Why AI Resume Scoring Matters

A typical engineering role at a mid-sized company generates 300–800 applicants. A senior recruiter, spending 30 seconds per resume, takes 2–6 hours just to produce a shortlist. That's before any actual evaluation happens.

The bigger problem isn't speed — it's consistency. Human reviewers are influenced by recency bias, formatting preferences, and credential anchoring (company names as proxies for ability). A Stanford grad with 2 years of experience gets through; a self-taught engineer with 7 years of open-source contributions doesn't. The signal-to-noise ratio is terrible.

Traditional ATS keyword matching doesn't fix this. It filters for resume-optimization skill, not engineering skill. You end up rejecting candidates who can do the job but never thought to spell out "TypeScript" instead of "TS".

What you actually want is a system that reads resumes the way a senior engineer would: understanding that "built a distributed cache with 99.9% uptime" tells you more than "proficient in Redis", and that breadth of contribution in open source matters as much as employer names.

Claude is well-suited for this. It reasons about context, understands technical depth versus surface familiarity, and can follow structured evaluation criteria. The trick is building an API around it that's reliable, fast, and integrated with your existing hiring stack.

Architecture Overview

The stack is intentionally minimal: Express.js for the API layer, Claude (via Anthropic's API) for reasoning, and PostgreSQL for caching results and tracking usage. No ML infrastructure, no model training, no GPU costs.

1. API Request — Client sends POST /api/v1/score-resume with resume text, job description, and required skills in JSON.
2. Auth & Rate Limiting — API key validated against the PostgreSQL api_keys table. Rate limit enforced per key (100 req/day on free tier).
3. Cache Check — SHA-256 hash of resume + job description checked against score_cache. A cache hit returns immediately without calling Claude.
4. Claude Evaluation — Resume, job description, and evaluation criteria sent to Claude with a structured prompt. The response is parsed as JSON.
5. Store & Return — Result cached in PostgreSQL, usage logged, structured score returned to the caller in under 3 seconds.

The key design decision is determinism via caching. The same resume against the same job description always returns the same score — no drift between calls. This matters when you're comparing candidates against each other or auditing hiring decisions.

The Scoring Endpoint

The Evaluation Prompt

The hardest part of building an AI resume scoring API isn't the Express route — it's the prompt. You need Claude to return consistent, structured JSON every time, with scores that are calibrated and defensible.

Here's the core evaluation prompt structure:

JavaScript
function buildScoringPrompt(resume, jobDescription, requiredSkills) {
  return `You are a senior engineering hiring manager evaluating a candidate.
Score the following resume for this role. Return ONLY valid JSON, no prose.

## Job Description
${jobDescription}

## Required Skills
${requiredSkills.join(', ')}

## Resume
${resume}

## Scoring Instructions
Evaluate on these dimensions (0-100 each):
- technical_depth: Evidence of deep technical knowledge, not just familiarity
- relevant_experience: Direct experience with the required stack and domain
- problem_complexity: Scale and complexity of problems the candidate has solved
- leadership_signal: Code reviews, mentoring, architectural decisions (0 if N/A)
- growth_trajectory: Rate of skill development and responsibility increase

## Required Output Format
{
  "fit_score": <0-100>,
  "strengths": [<2-4 specific evidence-backed strengths>],
  "gaps": [<1-3 specific gaps relative to the role, or empty array>],
  "skill_matches": {<skill>: <"strong"|"partial"|"missing"> for each required skill},
  "dimension_scores": {<dimension>: <0-100>},
  "hire_signal": <"strong_yes"|"yes"|"maybe"|"no">,
  "summary": <one-sentence assessment>
}`;
}
Why structured output matters

Instructing Claude to return only valid JSON (with no prose) is critical. Claude models follow this reliably when you're explicit. We also use JSON.parse() with a try/catch and a retry for malformed responses — in practice, malformed output happens less than 0.5% of the time with claude-3-5-sonnet.

The Express Route

JavaScript
const crypto = require('crypto');
const Anthropic = require('@anthropic-ai/sdk');

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

app.post('/api/v1/score-resume', authenticateApiKey, async (req, res) => {
  const { resume, job_description, required_skills = [] } = req.body;

  // Input validation
  if (!resume || !job_description) {
    return res.status(400).json({
      success: false,
      error: 'MISSING_FIELDS',
      message: 'resume and job_description are required'
    });
  }

  // Cache key: hash of resume + job description content
  const cacheKey = crypto
    .createHash('sha256')
    .update(resume + job_description + required_skills.join(','))
    .digest('hex');

  // Check cache first
  const cached = await pool.query(
    'SELECT result FROM score_cache WHERE cache_key = $1',
    [cacheKey]
  );
  if (cached.rowCount > 0) {
    return res.json({ success: true, cached: true, ...cached.rows[0].result });
  }

  const startMs = Date.now();

  // Call Claude for evaluation
  const message = await anthropic.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    messages: [{
      role: 'user',
      content: buildScoringPrompt(resume, job_description, required_skills)
    }]
  });

  // Parse JSON from Claude's response
  const rawText = message.content[0].text;
  const evaluation = JSON.parse(rawText);

  const result = { ...evaluation, processing_time_ms: Date.now() - startMs };

  // Cache the result
  await pool.query(
    `INSERT INTO score_cache (cache_key, result, created_at)
     VALUES ($1, $2, NOW())
     ON CONFLICT (cache_key) DO NOTHING`,
    [cacheKey, JSON.stringify(result)]
  );

  res.json({ success: true, cached: false, ...result });
});

API Response Format

Here's what a real response looks like for a mid-level backend engineer applying for a Node.js role:

{
  "success": true,
  "fit_score": 78,
  "hire_signal": "yes",
  "summary": "Strong backend generalist with deep Node.js experience; gaps in distributed systems at scale.",
  "strengths": [
    "4 years of production Node.js with demonstrated performance optimization (50ms → 8ms API response)",
    "Owned PostgreSQL schema design and migration strategy for 3 products",
    "Open source contributor: 340 GitHub stars on a REST framework utility"
  ],
  "gaps": [
    "No evidence of Kubernetes or container orchestration at scale",
    "Limited distributed systems design experience beyond single-node services"
  ],
  "skill_matches": {
    "Node.js": "strong",
    "PostgreSQL": "strong",
    "Kubernetes": "missing",
    "Redis": "partial"
  },
  "dimension_scores": {
    "technical_depth": 82,
    "relevant_experience": 79,
    "problem_complexity": 68,
    "leadership_signal": 71,
    "growth_trajectory": 84
  },
  "processing_time_ms": 2341,
  "cached": false
}

Database Schema

Two tables support the scoring API: api_keys for authentication and rate limiting, and score_cache for deterministic result caching.

SQL
-- API keys with usage tracking
CREATE TABLE api_keys (
  id SERIAL PRIMARY KEY,
  key_hash VARCHAR(64) UNIQUE NOT NULL,  -- SHA-256 of raw key
  name VARCHAR(255),                     -- e.g. "Acme Corp Production"
  email VARCHAR(255),
  tier VARCHAR(20) DEFAULT 'free',       -- free | pro
  daily_limit INTEGER DEFAULT 100,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Cached scores for determinism
CREATE TABLE score_cache (
  id SERIAL PRIMARY KEY,
  cache_key VARCHAR(64) UNIQUE NOT NULL, -- SHA-256 hash
  result JSONB NOT NULL,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Rate limiting: count calls per key per day
CREATE TABLE api_usage_logs (
  id SERIAL PRIMARY KEY,
  api_key_id INTEGER REFERENCES api_keys(id),
  endpoint VARCHAR(100),
  status_code INTEGER,
  response_time_ms INTEGER,
  created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_usage_key_day
  ON api_usage_logs (api_key_id, date_trunc('day', created_at));

Rate Limiting Without Redis

For a scoring API, you don't need Redis for rate limiting. A PostgreSQL query counting today's calls is fast enough — scoring calls are expensive (2–3 seconds each), so callers won't be hammering the endpoint at high enough frequency for a DB query to be a bottleneck.

JavaScript
async function checkRateLimit(apiKeyId, dailyLimit) {
  const { rows } = await pool.query(`
    SELECT COUNT(*)::int AS calls_today
    FROM api_usage_logs
    WHERE api_key_id = $1
      AND created_at >= date_trunc('day', NOW())
  `, [apiKeyId]);

  const callsToday = rows[0].calls_today;
  if (callsToday >= dailyLimit) {
    throw {
      status: 429,
      code: 'RATE_LIMIT_EXCEEDED',
      message: `Daily limit of ${dailyLimit} calls reached. Resets at midnight UTC.`
    };
  }
  return { callsToday, remaining: dailyLimit - callsToday };
}

Calling the API

Here's a complete curl example you can run right now against the live API:

Shell
curl -X POST https://stackwright.polsia.app/api/v1/score-resume \
  -H "Content-Type: application/json" \
  -H "X-API-Key: sk-sw-demo-stackwright2025" \
  -d '{
    "resume": "John Smith\n5 years backend engineering at Stripe and Plaid. Node.js, PostgreSQL, Redis, AWS Lambda. Led migration of payment processing service from monolith to microservices. Open source: github.com/jsmith/pg-migrate-cli (2.1k stars).",
    "job_description": "Senior backend engineer for fintech API team. Must have production Node.js experience, PostgreSQL, and payment processing domain knowledge.",
    "required_skills": ["Node.js", "PostgreSQL", "Redis", "payment processing"]
  }'
Demo key included

The key sk-sw-demo-stackwright2025 works on the live API at a rate of 10 calls/day. No signup required for initial testing.

Production Considerations

Cost per API call

A typical score-resume call sends ~800 tokens (resume + job description + prompt) and receives ~400 tokens (the JSON evaluation). Using claude-3-5-sonnet-20241022:

  • Input: 800 tokens × $3.00/MTok = ~$0.0024
  • Output: 400 tokens × $15.00/MTok = ~$0.006
  • Total: ~$0.008 per call (~1 cent)

With caching, repeat evaluations of the same resume against the same role are free. In a typical hiring workflow, you'll see 20–40% cache hit rates when multiple team members are reviewing the same candidates.
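The arithmetic above generalizes to a small estimator. A sketch (`estimateCostUsd` is a hypothetical helper; the per-MTok prices are the ones quoted above):

```javascript
// Rough per-call cost using the claude-3-5-sonnet prices quoted above:
// $3.00 per million input tokens, $15.00 per million output tokens.
function estimateCostUsd(inputTokens, outputTokens, cacheHitRate = 0) {
  const inputCost = (inputTokens / 1_000_000) * 3.0;
  const outputCost = (outputTokens / 1_000_000) * 15.0;
  // Cache hits cost nothing, so effective cost scales with the miss rate.
  return (inputCost + outputCost) * (1 - cacheHitRate);
}
```

With the article's numbers (800 input tokens, 400 output tokens) this comes to about $0.0084 per uncached call, and a 30% cache hit rate cuts the effective average to roughly $0.0059.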

Handling malformed Claude output

Claude returns valid JSON >99.5% of the time when you're explicit. For the rare failure, implement a simple retry with a short backoff and a fallback error response:

JavaScript
async function callClaudeWithRetry(prompt, maxRetries = 2) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const message = await anthropic.messages.create({
      model: 'claude-3-5-sonnet-20241022',
      max_tokens: 1024,
      messages: [{ role: 'user', content: prompt }]
    });

    try {
      return JSON.parse(message.content[0].text);
    } catch (e) {
      if (attempt === maxRetries - 1) {
        throw new Error('Claude returned non-JSON response');
      }
      // Wait 500ms before retrying
      await new Promise(r => setTimeout(r, 500));
    }
  }
}

ATS Integration

The most common integration pattern is a webhook receiver: your ATS posts a new-applicant event, your server calls /api/v1/score-resume, and posts the score back to the ATS candidate record. Ashby, Greenhouse, and Lever all support inbound webhooks for this pattern.
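That pattern can be sketched as a webhook handler. Everything vendor-facing here is illustrative: the event payload fields, the webhook path, and `atsClient.updateCandidate` are placeholders, since Ashby, Greenhouse, and Lever each document their own payload shapes and write-back APIs:

```javascript
// Map the scoring response onto the fields written back to the ATS record.
// Field names are hypothetical; real ATS custom-field APIs vary by vendor.
function toAtsFields(score) {
  return { ai_fit_score: score.fit_score, ai_hire_signal: score.hire_signal };
}

// Hypothetical webhook handler: the ATS posts a new-applicant event,
// we score the resume, then write the result back to the candidate record.
async function handleNewApplicant(req, res) {
  const { candidate_id, resume_text, job_description, required_skills } = req.body;

  // Score the resume via the API described in this article.
  const scoreRes = await fetch('https://stackwright.polsia.app/api/v1/score-resume', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-API-Key': process.env.SCORING_API_KEY
    },
    body: JSON.stringify({ resume: resume_text, job_description, required_skills })
  });
  const score = await scoreRes.json();

  // Write the score back; `atsClient` stands in for the vendor SDK.
  await atsClient.updateCandidate(candidate_id, toAtsFields(score));

  // Acknowledge quickly so the ATS doesn't retry the delivery.
  res.status(200).json({ received: true });
}

// Registered like any other route:
// app.post('/webhooks/ats/new-applicant', handleNewApplicant);
```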

Compliance note

If you're using AI scoring in hiring decisions, document your evaluation criteria and provide candidates with a path to appeal. EEOC guidelines apply to algorithmic screening the same as human screening. A score is a signal, not a decision.

Try It Live

Stackwright is the production implementation of everything described in this article. The API is live, the documentation has a browser-based test console, and the demo key below gives you 10 calls to try it right now.

Try the live API

Paste a real resume, a job description, and a list of required skills. Get a structured score back in under 3 seconds.

X-API-Key: sk-sw-demo-stackwright2025

What's Next

The scoring endpoint is the core. From here, the natural extensions are:

  • Interview question generation — Given the score and skill gaps, Claude can generate role-specific technical questions that probe identified weaknesses (POST /api/v1/generate-questions, already live in Stackwright).
  • Batch processing — Accept an array of resumes for a single job, return ranked candidates with comparative scoring.
  • Feedback loops — Record hiring outcomes and periodically re-tune your evaluation prompt against what actually worked.
  • Structured data extraction — Extract years of experience, tech stack, and company tenure into structured fields for ATS population.
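The batch-processing extension reduces to a ranking step over individual scores. A sketch under stated assumptions: `rankCandidates` is a hypothetical helper, and each element is assumed to carry the `fit_score` and `dimension_scores` fields returned by the scoring endpoint:

```javascript
// Rank scored candidates for one role: highest fit_score first,
// ties broken by technical_depth. Returns a new array with a 1-based rank.
function rankCandidates(results) {
  return [...results]
    .sort((a, b) =>
      b.fit_score - a.fit_score ||
      (b.dimension_scores?.technical_depth ?? 0) -
      (a.dimension_scores?.technical_depth ?? 0)
    )
    .map((r, i) => ({ rank: i + 1, ...r }));
}
```

The server-side batch endpoint would score each resume (hitting the cache where possible), then return `rankCandidates(results)` instead of a single score.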

The architecture is intentionally minimal so you can extend it. One Express route, one Anthropic call, one cache table. No ML infrastructure to maintain, no retraining cycles, no GPU bills.

Ready to integrate?

Start scoring resumes in minutes. Free tier ships immediately — no credit card. Pro starts at $49/mo for production scale.

Try the API Free → See Pricing →