Leidos
Norfolk, VA; Jacksonville, FL; Bremerton, WA; San Diego, CA

SRE Metrics Analyst Intern

Hybrid | $48,100 – $86,950/yr | DoD security clearance | Posted Dec 26, 2025

About this role

Key Responsibilities:

Metrics Collection Framework:

  • Design and implement a comprehensive metrics collection framework that captures key performance indicators (KPIs) related to system reliability and operational efficiency.
  • Identify relevant metrics and establish methods for collecting, aggregating, and storing data from various sources, including monitoring tools, logs, and databases.

Data Analysis and Visualization:

  • Analyze collected metrics to identify trends, patterns, and anomalies that impact system reliability and performance.
  • Develop dashboards and visualizations to present data in a clear and actionable manner using tools such as Grafana, Kibana, or Tableau.
  • Ensure that stakeholders have access to real-time insights and reports that inform decision-making.

Reporting:

  • Create regular reports on system performance, reliability, incident response times, and other critical metrics for various stakeholders, including technical teams and management.
  • Provide insights and recommendations based on data analysis to drive continuous improvement initiatives.
  • Prepare and present findings to stakeholders, facilitating discussions on reliability goals and performance enhancements.

Collaboration with SRE Teams:

  • Work closely with SRE teams to identify their metric needs and ensure alignment with operational goals.
  • Collaborate with engineering and operations teams to ensure that metric collection is integrated into development and deployment processes.
  • Support incident response efforts by providing metrics that help identify root causes and areas for improvement.

Continuous Improvement:

  • Stay current with industry trends and best practices related to metrics collection, monitoring, and reporting within SRE and DevOps.
  • Continuously evaluate and enhance the metrics collection and reporting processes to improve data accuracy, relevance, and accessibility.
  • Foster a culture of data-driven decision-making within the SRE team and broader organization.

Key Qualifications:

  • Currently enrolled in a degree program in a related major, with a GPA of 3.0 or better
  • US citizenship required
  • Ability to obtain and maintain a DoD security clearance

Experience:

  • Experience in metrics collection, data analysis, or reporting, preferably in a Site Reliability Engineering or DevOps environment.
  • Proven experience working with monitoring and observability tools (e.g., Prometheus, Datadog, New Relic).

Technical Skills:

  • Strong understanding of key metrics used in site reliability engineering, including service level indicators (SLIs), service level objectives (SLOs), and service level agreements (SLAs).
  • Proficiency in data analysis tools and languages (e.g., SQL, Python, R) for data manipulation and reporting.
  • Experience with data visualization tools (e.g., Grafana, Kibana, Tableau) to create dashboards and reports.

Analytical Skills:

  • Strong analytical and problem-solving skills, with the ability to interpret complex data sets and provide actionable insights.
  • Ability to evaluate the relevance and accuracy of metrics and make recommendations for improvement.

Communication and Collaboration:

  • Excellent communication skills, both written and verbal, with the ability to present data and findings to technical and non-technical audiences.
  • Proven ability to work collaboratively with cross-functional teams and build strong relationships with stakeholders.

Preferred Qualifications:

  • Experience with cloud platforms (AWS, GCP, Azure) and their monitoring tools.
  • Familiarity with incident management processes and practices within an SRE context.
  • Knowledge of software development methodologies and best practices.

Key Metrics of Success:

  • Timely and accurate collection of key performance metrics with minimal data discrepancies.
  • Effective visualization and reporting of metrics that inform decision-making and drive improvements in reliability.
  • Positive feedback from stakeholders regarding the clarity and usefulness of reports and insights.
  • Continuous improvement in the SRE metrics collection and reporting processes, leading to better operational performance.

Why Join Us?

Be part of a dynamic and innovative team focused on enhancing the reliability and performance of critical systems. Play a key role in shaping the metrics strategy that drives operational excellence and continuous improvement. Work in an environment that values collaboration, professional development, and a commitment to quality. Contribute to the success of the organization by providing actionable insights that improve system reliability and performance.

Summary:

The SRE Metrics Analyst Intern is crucial for ensuring that the Site Reliability Engineering team has the data and insights needed to maintain and improve system reliability. This role requires a blend of technical expertise, analytical skills, and effective communication to drive data-driven decision-making and enhance operational performance. The ideal candidate will have a strong background in metrics collection, data analysis, and reporting, along with a passion for supporting the organization’s reliability goals.