
Software Engineer
About this role
TLDR
You'll be hands-on in improving the real-world behavior of our AI systems — tracing and fixing runtime issues, building agent simulators, designing LLM evals and QA tools, and interfacing with client data. This is a role for builders who like prompt-level debugging, LLM system testing, and building infrastructure that improves our AI agents’ performance.
What You’ll Do
You’ll work across our AI agent platform — writing prompts, debugging runtime issues, building agent simulation tooling, creating evals, interfacing with client data, and helping us monitor system behavior at scale. This is not a model training role — it's an applied systems position focused on behavior, infrastructure, and debugging real-world agents in production. You will be working at the forefront of agentic AI, where you’ll be pushing the boundaries of our agents’ capabilities.
Some examples of what you might work on:
- Trace and fix runtime bugs, then write regression tests.
- Design evaluation datasets to simulate realistic workflows or red-team our system.
- Build internal tooling for QA and agent simulation.
- Normalize and transform messy client data for system integration.
- Set up automated testing and latency-tracking infrastructure.
- Create dashboards and observability tooling for agentic system behavior.
- Expand on our existing eval & testing framework and agent simulation infrastructure.
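To make the eval and simulation work above concrete, here is a minimal sketch of the kind of harness you might build. All names here (`EvalCase`, `runEvals`, the stub agent) are illustrative, not our actual internal API:

```typescript
// Illustrative sketch: run an agent against a set of eval cases and
// record pass/fail plus latency. Real evals might use an LLM judge
// instead of a simple predicate.

type EvalCase = {
  name: string;
  input: string;
  // Predicate over the agent's output.
  expect: (output: string) => boolean;
};

type EvalResult = { name: string; passed: boolean; latencyMs: number };

async function runEvals(
  agent: (input: string) => Promise<string>,
  cases: EvalCase[],
): Promise<EvalResult[]> {
  const results: EvalResult[] = [];
  for (const c of cases) {
    const start = Date.now();
    const output = await agent(c.input);
    results.push({
      name: c.name,
      passed: c.expect(output),
      latencyMs: Date.now() - start,
    });
  }
  return results;
}

// Stub agent standing in for a real LLM-backed system.
const echoAgent = async (input: string) => `You said: ${input}`;

runEvals(echoAgent, [
  { name: "echoes input", input: "hello", expect: (o) => o.includes("hello") },
]).then((results) => {
  for (const r of results) console.log(`${r.name}: ${r.passed ? "PASS" : "FAIL"}`);
});
```

In practice the harness would also persist results over time so regressions in agent behavior show up in dashboards, which is where the observability tooling above comes in.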
Skills Required
Technical Skills
- Proficiency in TypeScript
- Strong generalist software engineering background
- Strong debugging skills. You can trace runtime failures, dig through logs, and pinpoint issues in async or multi-step agent systems.
- Data transformation and ingestion. You can build pipelines to normalize and convert unstructured data for use in AI systems.
- Strong understanding of system design, including distributed systems and reliability/performance tradeoffs
- Experience using modern AI coding tools (e.g. Cursor, GitHub Copilot, Claude)
- Excellent documentation and testing discipline
- Proficiency with Git
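As a toy example of the data-transformation skill above: normalizing messy client records into a consistent shape before they reach the AI system. The field names and schema here are made up for illustration:

```typescript
// Illustrative sketch: normalize raw client records with inconsistent
// field names and whitespace into a typed, validated shape.

type RawRecord = Record<string, unknown>;

type NormalizedRecord = {
  email: string;
  fullName: string;
};

// Coerce a messy value to a trimmed string, or null if unusable.
function cleanString(value: unknown): string | null {
  if (typeof value !== "string") return null;
  const trimmed = value.trim();
  return trimmed.length > 0 ? trimmed : null;
}

// Normalize one record; returns null when required fields are missing,
// so callers can drop (or quarantine) bad rows explicitly.
function normalize(raw: RawRecord): NormalizedRecord | null {
  const email = cleanString(raw["email"] ?? raw["Email"] ?? raw["e_mail"]);
  const first = cleanString(raw["firstName"] ?? raw["first_name"]);
  const last = cleanString(raw["lastName"] ?? raw["last_name"]);
  if (!email || !first || !last) return null;
  return { email: email.toLowerCase(), fullName: `${first} ${last}` };
}

const raw: RawRecord[] = [
  { Email: "  ADA@example.com ", first_name: "Ada", last_name: "Lovelace" },
  { e_mail: "", firstName: "Bob" }, // dropped: missing/empty fields
];
const normalized = raw
  .map(normalize)
  .filter((r): r is NormalizedRecord => r !== null);
console.log(normalized);
// → [{ email: "ada@example.com", fullName: "Ada Lovelace" }]
```

Returning `null` rather than throwing keeps the pipeline resilient to one bad row, which matters when ingesting large, messy client datasets.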
Soft Skills
- You care about improving agent behavior. You're motivated by making real-world production agents more reliable and capable.
- You’re high agency (AKA “agentic” ;)). You thrive with minimal structure, you’re internally motivated, and you proactively seek out ways to create value for your team.
- You don’t mind getting in the weeds. Improving agent performance requires diving deep into the details: identifying and understanding real-world edge cases, editing prompts to address them, and writing evals to cover them in the future. Sound exciting? You’ll thrive. Sound tedious? You won’t.
- You’re comfortable with ambiguity. You work well when specs are loose, or when the solution space spans prompts, code, and even a little RLHF.
- You learn fast and move fast. You can pattern-match from past systems work and adapt to LLM-specific edge cases quickly.
Experience & Who Should Apply
We're looking for engineers with 2-7 years of experience who have worked closely with LLMs or AI agents in production systems. This is not a model R&D role — it’s about applying AI to real-world use cases: debugging behavior, designing evals, and building the infrastructure to scale intelligent systems.
You might be a strong fit if:
- You've created internal tools or frameworks to support QA, evals, or agent simulation, and care about making complex systems observable and testable.
- You’ve contributed to fast-paced product cycles involving AI behavior, latency, and user experience, and you’re comfortable validating behavior by inspecting outputs, not just logs.
Nice to have:
- Experience with multi-agent systems, TTS/NLP pipelines, or structured output validation.
- Familiarity with testing frameworks, LangChain-style agent orchestration, or in-house eval harnesses.
- Experience with prompt engineering, LLM evals, and agent orchestration. You're comfortable writing and refining prompts and crafting evals.