Why AI Resume Scoring Matters
A typical engineering role at a mid-sized company generates 300–800 applicants. A senior recruiter, spending 30 seconds per resume, needs 2.5 to nearly 7 hours just to produce a shortlist. That's before any actual evaluation happens.
The bigger problem isn't speed — it's consistency. Human reviewers are influenced by recency bias, formatting preferences, and credential anchoring (company names as proxies for ability). A Stanford grad with 2 years of experience gets through; a self-taught engineer with 7 years of open-source contributions doesn't. The signal-to-noise ratio is terrible.
Traditional ATS keyword matching doesn't fix this. It filters for resume-optimization skills, not engineering skills. You end up rejecting candidates who know how to do the job but didn't think to write "TypeScript" instead of "TS".
What you actually want is a system that reads resumes the way a senior engineer would: understanding that "built a distributed cache with 99.9% uptime" tells you more than "proficient in Redis", and that breadth of contribution in open source matters as much as employer names.
Claude is well-suited for this. It reasons about context, understands technical depth versus surface familiarity, and can follow structured evaluation criteria. The trick is building an API around it that's reliable, fast, and integrated with your existing hiring stack.
Architecture Overview
The stack is intentionally minimal: Express.js for the API layer, Claude (via Anthropic's API) for reasoning, and PostgreSQL for caching results and tracking usage. No ML infrastructure, no model training, no GPU costs.
The request flow:

- Client sends POST /api/v1/score-resume with resume text, job description, and required skills in JSON.
- The API key is validated against the api_keys table. Rate limit enforced per key (100 req/day on free tier).
- A hash of the inputs is looked up in score_cache. Cache hit returns immediately without calling Claude.
- On a miss, Claude evaluates the resume and the result is cached before returning.

The key design decision is determinism via caching. The same resume against the same job description always returns the same score — no drift between calls. This matters when you're comparing candidates against each other or auditing hiring decisions.
The Scoring Endpoint
The Evaluation Prompt
The hardest part of building an AI resume scoring API isn't the Express route — it's the prompt. You need Claude to return consistent, structured JSON every time, with scores that are calibrated and defensible.
Here's the core evaluation prompt structure:
function buildScoringPrompt(resume, jobDescription, requiredSkills) {
  return `You are a senior engineering hiring manager evaluating a candidate.
Score the following resume for this role. Return ONLY valid JSON, no prose.

## Job Description
${jobDescription}

## Required Skills
${requiredSkills.join(', ')}

## Resume
${resume}

## Scoring Instructions
Evaluate on these dimensions (0-100 each):
- technical_depth: Evidence of deep technical knowledge, not just familiarity
- relevant_experience: Direct experience with the required stack and domain
- problem_complexity: Scale and complexity of problems the candidate has solved
- leadership_signal: Code reviews, mentoring, architectural decisions (0 if N/A)
- growth_trajectory: Rate of skill development and responsibility increase

## Required Output Format
{
  "fit_score": <0-100 overall fit score>,
  "strengths": [<2-4 specific evidence-backed strengths>],
  "gaps": [<1-3 specific gaps relative to the role, or empty array>],
  "skill_matches": {<skill>: <"strong"|"partial"|"missing"> for each required skill},
  "dimension_scores": {<dimension>: <0-100> for each dimension above},
  "hire_signal": <"strong_yes"|"yes"|"maybe"|"no">,
  "summary": <one-sentence plain-language assessment>
}`;
}

Instructing Claude to return only valid JSON (with no prose) is critical. Claude models follow this reliably when you're explicit. We also use JSON.parse() with a try/catch and a retry for malformed responses — in practice, malformed output happens less than 0.5% of the time with claude-3-5-sonnet.
The Express Route
const crypto = require('crypto');
const Anthropic = require('@anthropic-ai/sdk');
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
app.post('/api/v1/score-resume', authenticateApiKey, async (req, res) => {
  const { resume, job_description, required_skills = [] } = req.body;

  // Input validation
  if (!resume || !job_description) {
    return res.status(400).json({
      success: false,
      error: 'MISSING_FIELDS',
      message: 'resume and job_description are required'
    });
  }

  // Cache key: hash of resume + job description content
  const cacheKey = crypto
    .createHash('sha256')
    .update(resume + job_description + required_skills.join(','))
    .digest('hex');

  // Check cache first
  const cached = await pool.query(
    'SELECT result FROM score_cache WHERE cache_key = $1',
    [cacheKey]
  );
  if (cached.rowCount > 0) {
    return res.json({
      success: true,
      cached: true,
      ...cached.rows[0].result
    });
  }

  const startMs = Date.now();

  // Call Claude for evaluation
  const message = await anthropic.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    messages: [{
      role: 'user',
      content: buildScoringPrompt(resume, job_description, required_skills)
    }]
  });

  // Parse JSON from Claude's response; guard against rare malformed output
  const rawText = message.content[0].text;
  let evaluation;
  try {
    evaluation = JSON.parse(rawText);
  } catch (err) {
    return res.status(502).json({
      success: false,
      error: 'INVALID_MODEL_OUTPUT',
      message: 'Model returned non-JSON output; please retry'
    });
  }

  const result = {
    ...evaluation,
    processing_time_ms: Date.now() - startMs
  };

  // Cache the result
  await pool.query(
    `INSERT INTO score_cache (cache_key, result, created_at)
     VALUES ($1, $2, NOW())
     ON CONFLICT (cache_key) DO NOTHING`,
    [cacheKey, JSON.stringify(result)]
  );

  res.json({ success: true, cached: false, ...result });
});

API Response Format
Here's what a real response looks like for a mid-level backend engineer applying for a Node.js role:
{
  "success": true,
  "fit_score": 78,
  "hire_signal": "yes",
  "summary": "Strong backend generalist with deep Node.js experience; gaps in distributed systems at scale.",
  "strengths": [
    "4 years of production Node.js with demonstrated performance optimization (50ms → 8ms API response)",
    "Owned PostgreSQL schema design and migration strategy for 3 products",
    "Open source contributor: 340 GitHub stars on a REST framework utility"
  ],
  "gaps": [
    "No evidence of Kubernetes or container orchestration at scale",
    "Limited distributed systems design experience beyond single-node services"
  ],
  "skill_matches": {
    "Node.js": "strong",
    "PostgreSQL": "strong",
    "Kubernetes": "missing",
    "Redis": "partial"
  },
  "dimension_scores": {
    "technical_depth": 82,
    "relevant_experience": 79,
    "problem_complexity": 68,
    "leadership_signal": 71,
    "growth_trajectory": 84
  },
  "processing_time_ms": 2341,
  "cached": false
}
Database Schema
Two tables support the scoring API: api_keys for authentication and rate limiting, and score_cache for deterministic result caching.
-- API keys with usage tracking
CREATE TABLE api_keys (
  id SERIAL PRIMARY KEY,
  key_hash VARCHAR(64) UNIQUE NOT NULL,  -- SHA-256 of raw key
  name VARCHAR(255),                     -- e.g. "Acme Corp Production"
  email VARCHAR(255),
  tier VARCHAR(20) DEFAULT 'free',       -- free | pro
  daily_limit INTEGER DEFAULT 100,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Cached scores for determinism
CREATE TABLE score_cache (
  id SERIAL PRIMARY KEY,
  cache_key VARCHAR(64) UNIQUE NOT NULL, -- SHA-256 hash
  result JSONB NOT NULL,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Rate limiting: count calls per key per day
CREATE TABLE api_usage_logs (
  id SERIAL PRIMARY KEY,
  api_key_id INTEGER REFERENCES api_keys(id),
  endpoint VARCHAR(100),
  status_code INTEGER,
  response_time_ms INTEGER,
  created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_usage_key_day ON api_usage_logs
  (api_key_id, date_trunc('day', created_at));

Rate Limiting Without Redis
For a scoring API, you don't need Redis for rate limiting. A PostgreSQL query counting today's calls is fast enough — scoring calls are expensive (2–3 seconds each), so callers won't be hammering the endpoint at high enough frequency for a DB query to be a bottleneck.
async function checkRateLimit(apiKeyId, dailyLimit) {
  const { rows } = await pool.query(`
    SELECT COUNT(*)::int AS calls_today
    FROM api_usage_logs
    WHERE api_key_id = $1
      AND created_at >= date_trunc('day', NOW())
  `, [apiKeyId]);

  const callsToday = rows[0].calls_today;
  if (callsToday >= dailyLimit) {
    throw {
      status: 429,
      code: 'RATE_LIMIT_EXCEEDED',
      message: `Daily limit of ${dailyLimit} calls reached. Resets at midnight UTC.`
    };
  }
  return { callsToday, remaining: dailyLimit - callsToday };
}

Calling the API
Here's a complete curl example you can run right now against the live API:
curl -X POST https://stackwright.polsia.app/api/v1/score-resume \
  -H "Content-Type: application/json" \
  -H "X-API-Key: sk-sw-demo-stackwright2025" \
  -d '{
    "resume": "John Smith\n5 years backend engineering at Stripe and Plaid. Node.js, PostgreSQL, Redis, AWS Lambda. Led migration of payment processing service from monolith to microservices. Open source: github.com/jsmith/pg-migrate-cli (2.1k stars).",
    "job_description": "Senior backend engineer for fintech API team. Must have production Node.js experience, PostgreSQL, and payment processing domain knowledge.",
    "required_skills": ["Node.js", "PostgreSQL", "Redis", "payment processing"]
  }'

The key sk-sw-demo-stackwright2025 works against the live API with a limit of 10 calls/day. No signup required for initial testing.
Production Considerations
Cost per API call
A typical score-resume call sends ~800 tokens (resume + job description + prompt) and receives ~400 tokens (the JSON evaluation). Using claude-3-5-sonnet-20241022:
- Input: 800 tokens × $3.00/MTok = ~$0.0024
- Output: 400 tokens × $15.00/MTok = ~$0.006
- Total: ~$0.0084 per call (just under a cent)
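As a sanity check, the arithmetic above can be wrapped in a tiny helper. Prices are hardcoded from the rates quoted; adjust if Anthropic's pricing changes:

```javascript
// Back-of-envelope cost estimate per scoring call, using the
// per-million-token prices quoted above for claude-3-5-sonnet.
const INPUT_USD_PER_MTOK = 3.00;
const OUTPUT_USD_PER_MTOK = 15.00;

function costPerCall(inputTokens, outputTokens) {
  return (inputTokens / 1e6) * INPUT_USD_PER_MTOK
       + (outputTokens / 1e6) * OUTPUT_USD_PER_MTOK;
}

console.log(costPerCall(800, 400)); // ≈ 0.0084 USD per call
```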
With caching, repeat evaluations of the same resume against the same role are free. In a typical hiring workflow, you'll see 20–40% cache hit rates when multiple team members are reviewing the same candidates.
Handling malformed Claude output
Claude returns valid JSON >99.5% of the time when you're explicit. For the rare failure, implement a short retry loop with a fallback error response:
async function callClaudeWithRetry(prompt, maxRetries = 2) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const message = await anthropic.messages.create({
      model: 'claude-3-5-sonnet-20241022',
      max_tokens: 1024,
      messages: [{ role: 'user', content: prompt }]
    });
    try {
      return JSON.parse(message.content[0].text);
    } catch (e) {
      if (attempt === maxRetries - 1) throw new Error('Claude returned non-JSON response');
      // Wait 500ms before retrying
      await new Promise(r => setTimeout(r, 500));
    }
  }
}

ATS Integration
The most common integration pattern is a webhook receiver: your ATS posts a new-applicant event, your server calls /api/v1/score-resume, and posts the score back to the ATS candidate record. Ashby, Greenhouse, and Lever all support inbound webhooks for this pattern.
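A minimal sketch of that glue, with hypothetical event field names (resume_text, job.description, candidate_id); real ATS payloads and writeback APIs differ per vendor:

```javascript
// Map an ATS "new applicant" event to the score-resume request body.
// The event shape here is illustrative, not any vendor's actual schema.
function buildScoreRequest(event) {
  return {
    resume: event.resume_text,
    job_description: event.job.description,
    required_skills: event.job.required_skills || []
  };
}

// Called from your webhook route, e.g.:
//   app.post('/webhooks/new-applicant', (req, res) => {
//     res.status(202).json({ received: true });  // ack fast, score async
//     handleNewApplicant(req.body);
//   });
// fetchImpl is injectable so the function can be tested without a network.
async function handleNewApplicant(event, fetchImpl = fetch) {
  const response = await fetchImpl('https://stackwright.polsia.app/api/v1/score-resume', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-API-Key': process.env.STACKWRIGHT_API_KEY
    },
    body: JSON.stringify(buildScoreRequest(event))
  });
  return response.json(); // caller writes fit_score back to the ATS record
}
```

Acknowledging the webhook immediately and scoring asynchronously matters: most ATS webhook senders time out well before the 2–3 seconds a scoring call takes.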
If you're using AI scoring in hiring decisions, document your evaluation criteria and provide candidates with a path to appeal. EEOC guidelines apply to algorithmic screening the same as human screening. A score is a signal, not a decision.
Try It Live
Stackwright is the production implementation of everything described in this article. The API is live, the documentation has a browser-based test console, and the demo key below gives you 10 calls to try it right now.
Paste a real resume, a job description, and a list of required skills. Get a structured score back in under 3 seconds.
What's Next
The scoring endpoint is the core. From here, the natural extensions are:
- Interview question generation — Given the score and skill gaps, Claude can generate role-specific technical questions that probe identified weaknesses (POST /api/v1/generate-questions, already live in Stackwright).
- Batch processing — Accept an array of resumes for a single job, return ranked candidates with comparative scoring.
- Feedback loops — Record hiring outcomes and periodically re-tune your evaluation prompt against what actually worked.
- Structured data extraction — Extract years of experience, tech stack, and company tenure into structured fields for ATS population.
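For example, the batch extension could be a thin layer over the single-resume scorer. This is a sketch under the assumption that the route's Claude-plus-cache logic has been factored into a scoreResume function (a hypothetical name, not shown above):

```javascript
// Batch scoring sketch: score every resume against one job, then rank.
// `scoreResume(text, jobDescription, requiredSkills)` is assumed to wrap
// the single-resume logic from the scoring route.
async function scoreBatch(resumes, jobDescription, requiredSkills, scoreResume) {
  const scored = await Promise.all(
    resumes.map(async ({ name, text }) => ({
      candidate: name,
      ...(await scoreResume(text, jobDescription, requiredSkills))
    }))
  );
  // Highest fit_score first
  return scored.sort((a, b) => b.fit_score - a.fit_score);
}
```

Because results are cached per resume-plus-job hash, re-running a batch after adding one new applicant only pays for one Claude call.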
The architecture is intentionally minimal so you can extend it. One Express route, one Anthropic call, one cache table. No ML infrastructure to maintain, no retraining cycles, no GPU bills.
Ready to integrate?
Start scoring resumes in minutes. Free tier ships immediately — no credit card. Pro starts at $49/mo for production scale.