4Bell Technology

Staffing & Recruiting

Senior QA Engineer (R-1444)

1,200,000.00-1,500,000.00/A

Any Degree

IT (Information Technology)

Contract

Bangalore/Bengaluru

10-Jun-2026

Manual testing, Automation testing, Python, API testing, AI/ML, LLM, Regression testing, Red teaming, CI/CD experience, GitHub Actions/GitLab CI

Job Description

Key Responsibilities

Evaluation Infrastructure (Probabilistic Surface)
– Design and own the end-to-end evaluation framework for AI-native product features, including model outputs, retrieval-augmented generation pipelines, and clinical decision-support components.
– Build and operate LLM-as-judge pipelines: define judge prompts, measure inter-rater reliability, track judge drift over time, and establish confidence intervals for evaluation scores.
– Implement production sampling strategies to surface real-world failure modes not captured by offline evals.
– Red-team AI features systematically: design adversarial inputs, edge-case taxonomies, and regression suites that catch model regressions across releases.
– Instrument and maintain eval dashboards; define and track quality KPIs that distinguish signal from noise in probabilistic outputs.
– Apply appropriate statistical methods (distribution analysis, significance testing, regression) to characterise model behaviour and communicate results to engineering and product leadership.

Classical Quality Engineering (Deterministic Surface)
– Architect and own test strategy for APIs, authentication flows, billing, integrations, and platform performance.
– Build and maintain automation suites in Python; integrate with CI/CD pipelines (GitHub Actions, GitLab CI) to gate releases on deterministic quality criteria.
– Define and govern entry/exit criteria, quality gates, and release readiness assessments across the software development life cycle.
– Conduct API testing (Postman, pytest, or equivalent), along with regression, smoke, and performance testing as required.
– Log, triage, and drive root-cause analysis for production defects; implement preventive measures.


Required Qualifications & Skills

– Demonstrated experience building or significantly expanding a quality engineering or evaluation function — not inheriting one.
– Strong, production-grade Python: real code, not scripts. Able to build eval harnesses, data pipelines, and automation frameworks without scaffolding.
– Direct experience with eval systems for AI or ML products: offline evals, online sampling, LLM-as-judge, or equivalent.
– Applied statistics literacy: comfortable with distributions, confidence intervals, significance testing, and what they mean in the context of model evaluation.
– Genuine point of view on what to measure in an AI product — and the ability to defend that view under scrutiny.
– API testing proficiency: REST, authentication, contract testing, error handling.
– Version control and CI/CD experience: Git, GitHub Actions, or GitLab CI.