Senior ML Observability Engineer
Job details
- Location: Falls Church, Fairfax, or Arlington
- Work type: Onsite
- Clearance: Top Secret/SCI
- Posted: 2 days ago
- Apply on: myjobs.adp.com
About this role
Everforth ECS is seeking a Senior ML Observability Engineer to work in the National Capital Region, covering the Pentagon, Falls Church, and Fairfax. Please note: this position is contingent upon contract award.
The War Data Platform (WDP) is a key initiative within the U.S. Department of War (DoW) AI-First strategy introduced in early 2026. The WDP focuses on operational warfighting data and aims to accelerate the deployment of artificial intelligence (AI) on the battlefield. The WDP spans Unclassified, Secret, and Top Secret environments and supports collaboration among Combatant Commands, Joint Staff directorates, Senior Executive Service leaders, and operational analysts.
The Senior ML Observability Engineer architects and governs the instrumentation and telemetry infrastructure needed to ensure production AI and machine learning models deployed across WDP's multi-enclave environment perform reliably and securely at mission scale. This role is essential to maintaining real-time visibility into model behavior, pipeline execution, and cross-domain access interactions in direct support of Combatant Command and Joint Staff decision-making needs.
• Designs, implements, and governs observability and instrumentation architectures supporting AI and machine learning model-serving operations across Unclassified, Secret, and Top Secret enclaves within the War Data Platform (WDP) Core Integration enterprise.
• Develops semantic conventions, runtime instrumentation patterns, and telemetry pipelines that generate latency metrics, error signatures, throughput indicators, model-specific performance signals, and operational readiness measurements for deployed models and serving surfaces.
• Integrates observability capabilities into existing data pipelines, model-deployment workflows, API access patterns, and serving runtime frameworks to provide mission-relevant monitoring aligned with Combatant Command and Joint Staff decision-support needs.
• Configures and validates instrumentation using platforms such as OpenTelemetry, Prometheus, Grafana, Elastic, Splunk, Amazon CloudWatch, and service mesh telemetry components to deliver real-time visibility into model behavior, cross-domain access interactions, and pipeline execution characteristics.
• Conducts observability readiness reviews, supports test and evaluation gates, and collaborates with cybersecurity personnel to embed anomaly-detection signals aligned with Zero Trust and DoW cyber standards.
• Works with serving engineers, pipeline engineers, platform teams, and external provider integration engineers to maintain observability consistency across enclaves and resolve domain-specific telemetry constraints.
• Produces observability standards, instrumentation specifications, dashboards, alerting configurations, and performance analysis reports that strengthen reliability, accelerate incident response, and reinforce mission assurance for production model access across all security networks.
• Performs other duties as assigned.
