AI-Powered HR Analytics System
Achievement Log
Overview
Conversational analytics platform enabling Turkish-speaking HR teams to query MongoDB HR data in natural language. Built on Amazon Bedrock with Claude 3.5 Sonnet, the dual-engine system routes simple queries through a deterministic local engine and complex ones through LLM-generated MongoDB aggregation pipelines. Achieved 85% first-attempt success rate and reduced report generation from 2–3 days to under 8 seconds.
Core Technologies
Implementation & Architecture
Dual-Engine Query Routing Architecture
Simple Engine with 47 Turkish regex patterns (12 intent categories) runs first with slot-filling for dates and department names. On miss, Complex Engine uses asyncio.gather to parallelize schema fetching and prompt assembly before Bedrock invocation. PipelineValidator whitelists operators, validates field names against the live schema with fuzzy correction, and blocks JavaScript execution operators.
Turkish Language Intent Processing
Normalization pipeline: Turkish Unicode preservation, number word resolver ('iki yüz' → 200), locative case suffix stripper ('İstanbul'da' → 'İstanbul'), and colloquial abbreviation expander. 8 Turkish few-shot examples in the Bedrock system prompt. Rapidfuzz fuzzy matching for department name normalization against live MongoDB slugs.
CI/CD Pipeline & Container Registry
GitHub Actions with OIDC role assumption (no static IAM keys). Multi-stage Docker build (280 MB final image, 62% smaller than single-stage). ECR with git SHA tagging and 10-image lifecycle policy. App Runner zero-downtime rolling deployment on every main branch push.
Technical Skills
- Amazon Bedrock
- LLM Integration
- MongoDB
- FastAPI
- Docker
- Python
- Tableau
- Bash Scripting
Engineering Challenges
- →Query Hallucination on Undefined Fields — LLM generated plausible but non-existent field names, causing 19% first-attempt failures. Fixed by injecting the full live MongoDB schema into every prompt and adding difflib fuzzy-match auto-correction.
- →MongoDB Aggregation Timeouts — $lookup stages ran before $match, scanning full collections. Fixed by PipelineValidator always moving $match to position 0, plus compound indexes on the most-joined field combinations.
- →Token Cost Overrun in Month 1 — PromptBuilder injected all 7 collection schemas per query (+2,100 tokens) with unbounded conversation history. Fixed by injecting only relevant collection schemas and capping history at 5 exchanges.
- →Turkish Morphology Zero-Result Queries — Inflected department names (e.g., 'pazarlamada') didn't match MongoDB slugs. Fixed with a 380-term HR morphology dictionary plus rapidfuzz matching (threshold 80).
- →App Runner Concurrent Request Contention — 8 simultaneous analysts caused p99 to spike to 6+ seconds due to request queuing at max_concurrency=10. Fixed by increasing max_concurrency to 50, allowing the async event loop to handle concurrent Bedrock calls.
- →PipelineValidator False Rejections — $redact and $replaceRoot were blocked, but some HR reports required them. Fixed by reviewing rejected queries with the HR team and adding operators to the whitelist with targeted safety constraints.
Project Outcomes
- ✓Report generation time dropped from 2–3 days to under 8 seconds for 85% of HR analytical questions.
- ✓85% first-attempt success rate on complex Turkish multi-stage aggregation queries; 96% by second attempt.
- ✓70% reduction in monthly Bedrock costs through Redis caching, schema-aware prompt truncation, and the Simple Engine.
- ✓MongoDB p99 aggregation latency reduced 87% (4.2s → 320ms) via compound indexes and stage reordering.
- ✓8 concurrent analyst sessions handled without degradation after App Runner concurrency optimization.
- ✓Fully containerized CI/CD with OIDC-based GitHub Actions — zero static credentials, zero-downtime rolling deployments.