🔒 Rules are LOCKED — every team is measured identically, exactly as the document demands ('it shouldn't be configurable at all'). The definitions and current values stay fully visible below.
How this page works. Edit a field and the formula, the diagram and the live preview update immediately with the draft (the preview is recomputed by the real engine on the real data; nothing is saved yet). Press Save to apply the settings everywhere: there is exactly one configuration, shared with the Overview. Defaults are exactly what the metrics document defines; any deviation from a document value is flagged "deviates from the standard document". Source: “[WIP]: Core Engineering metrics” in Confluence ↗
Standardization rules — Definition of Done & Epics (applies to every metric)
Jira and ADO will be the main systems we need to rely on in order to track the metrics. Before that is described, we need to define a certain set of rules for how we approach work classification, specifically on Jira.
Definition of Done — Several metrics depend on the way we interpret work to be done. Each team should have a clear DoD (Definition of Done) and that should translate naturally to Jira. Usually a good DoD should be "live and enabled", meaning the development reached the production environment and is enabled (feature toggle or AB test is switched on). Typical states on Jira that translate to work done: Done; Resolved; Closed.
⚠ "Release on Stage" shouldn't be treated as work done. "Release on Live" shouldn't be treated as work done if the flow itself already includes the states defined above.
Epics usage on Jira — Epics are usually used to track teams delivery, however some Product Owners are treating Epics as initiative aggregators to join all work done pre and post initiative delivery. This means that an Epic can be in progress while all work items below that Epic are done. This kind of approach can destroy how an engineering team tracks delivery oriented metrics. For such reasons, the tracking tool should have an option to exclude Epics from its metric compute logic (per project).
EVO workflow — real statuses & how the current rules treat them
Every team has its own status names; the document's standard names are only the defaults. Legend: green = time counts, grey = waiting (passive list), light = intake/new, ✓ = done.
Delivery & Velocity
“The Lead Time should measure the time between when the ticket was created until it is done. The purpose is to measure the full end-to-end customer experience.”
Full text from the document
The Lead Time should measure the time between when the ticket was created until it is done. The purpose is to measure the full end-to-end customer experience.
Rules:
• Done aligned with Team's DoD
• Waiting time should not be excluded. Waiting time can be:
– Non working days (Weekends, bank holidays and vacations)
– On Hold/Blocked states
• Present time in days
Open the source page in Confluence ↗
Formula — follows the fields below
lead = done_at − created_at // calendar days — weekends & holidays counted
done_at = moment of ENTERING a done status [Done, Resolved, Closed] (last such entry in window)
// time spent INSIDE the done status is NOT counted
cohort − issue types [Epic]
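The Lead Time formula above can be sketched as follows. This is a minimal illustration, not the engine's actual code: the issue shape (`type`, `created_at`, `transitions`) and the function names are hypothetical, and it assumes that every transition into a done status counts as an "entry".

```python
from datetime import datetime
from statistics import median

DONE_STATUSES = {"Done", "Resolved", "Closed"}  # document default
EXCLUDED_TYPES = {"Epic"}                       # document default

def lead_time_days(created_at, transitions, window_end):
    """transitions: list of (timestamp, to_status), sorted by time (assumed shape)."""
    entries = [ts for ts, status in transitions
               if status in DONE_STATUSES and ts <= window_end]
    if not entries:
        return None
    done_at = entries[-1]  # last entry into a done status inside the window
    # Plain calendar difference: weekends and holidays are counted.
    return (done_at - created_at).total_seconds() / 86400

def median_lead_time(issues, window_end):
    values = []
    for issue in issues:
        if issue["type"] in EXCLUDED_TYPES:
            continue  # Epics are removed from the cohort
        days = lead_time_days(issue["created_at"], issue["transitions"], window_end)
        if days is not None:
            values.append(days)
    return median(values) if values else None
```

Returning `None` for an empty cohort mirrors the "no data in this window" state of the preview.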
How it is measured — updates as you type
Live preview — real data, saved settings
no data in this window
Result with these settings: median — days over 0 tickets
Settings used by this metric — the diagram, formula and preview above follow them live; Save applies them everywhere
Done statuses (DoD) (document default): Done, Resolved, Closed. PDF p2: Done, Resolved, Closed. “Release on Stage/Live” must not be here unless the flow truly ends there.
Excluded issue types (document default): Epic. PDF p2: option to exclude Epics (per project).
Count weekends & non-working days (document default): on. Document standard: ON (“waiting time should not be excluded”). OFF = flagged deviation.
“The Cycle Time should measure the time from when the ticket is put in progress until it is done. The purpose is to measure the actual work that was done.”
Full text from the document
The Cycle Time should measure the time from when the ticket is put in progress until it is done. The purpose is to measure the actual work that was done. Active work is usually the term used for cycle time; however, the word active gets easily misinterpreted. Cycle time is supposed to include waiting time that many times appears after it started. Effective hours/days spent on concrete engineering work without any waiting time is typically something else that's not traditional cycle time (please check Active Work Time).
Rules:
• Same rules as Lead Time
• In Jira, active work usually starts when setting a ticket to the next states: In Progress / Any other?
• If the ticket goes back to New, the time it remains in New should not be included when the ticket goes past In Progress again.
Open the source page in Confluence ↗
Formula — follows the fields below
cycle = done_at − first(status of Jira category “In Progress”) − Σ(time in “new”-category statuses after start)
// done_at = ENTRY into a done status; time inside it not counted
// calendar days — weekends & holidays counted
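A minimal sketch of the Cycle Time formula, including the bounce-back rule. The status sets below are stand-ins for the Jira “In Progress” and “new” status categories, and the `(timestamp, to_status)` transition shape is an assumption made for illustration.

```python
from datetime import datetime

DONE_STATUSES = {"Done", "Resolved", "Closed"}
IN_PROGRESS_STATUSES = {"In Progress"}  # stand-in for the Jira "In Progress" category
NEW_STATUSES = {"New", "To Do"}         # stand-in for the "new" category

def cycle_time_days(transitions, subtract_bounce=True):
    """transitions: list of (timestamp, to_status), sorted by time (assumed shape)."""
    start = next((ts for ts, s in transitions if s in IN_PROGRESS_STATUSES), None)
    done_entries = [ts for ts, s in transitions if s in DONE_STATUSES]
    if start is None or not done_entries or done_entries[-1] < start:
        return None
    done_at = done_entries[-1]
    bounced = 0.0
    if subtract_bounce:
        # A status lasts from its own transition until the next transition.
        for (ts, status), (next_ts, _) in zip(transitions, transitions[1:]):
            if status in NEW_STATUSES and ts >= start and next_ts <= done_at:
                bounced += (next_ts - ts).total_seconds()
    return ((done_at - start).total_seconds() - bounced) / 86400
```

For a ticket that starts on day 0, bounces back to New for two days, and is done on day 10, this yields 8 days.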
How it is measured — updates as you type
Live preview — real data, saved settings
no data in this window
Result with these settings: median — days
Settings used by this metric — the diagram, formula and preview above follow them live; Save applies them everywhere
Cycle start statuses (document default): empty. Empty = canonical: any status of the Jira “In Progress” category. A list overrides (PDF: “In Progress / Any other?”).
Subtract time after a bounce back to New (document default): on. Document standard: ON.
“Active Work Time tracks only when the team is actively working, removing all waiting time from the cycle time. It accurately reflects the work required by the Engineering teams.”
Full text from the document
Active Work Time tracks only when the team is actively working, removing all waiting time from the cycle time. It accurately reflects the work required by the Engineering teams. Besides the waiting time already covered on the Cycle Time and Lead Time there are 2 other layers of waiting time that happen every day.
State transitions — All Jira statuses and how they are meant to be used either represent active time, passive time (wait time), or a mix of both. Examples:
• In Progress — mostly active
• Code Review — a mix of active and passive. Most of the time, moving a ticket for review means waiting on someone else to review a PR. The active work on that state is the actual time each engineer spent reviewing the PR.
• On Hold/Blocked — passive time as the team can't proceed
• Ready To Test — passive time; there is no active time here
• In Testing — similar to In Progress and is mostly active time
• Ready to merge — mostly passive; normally a residual active time
• Release on Stage / Release on Live — mix of active and passive
Working hours — The Engineering teams don't work 24/7 and per definition this metric already excludes non-working days and should also exclude non-working hours on working days, typically 8 or 9 hours of work. We are currently not tracking the time spent on each ticket in each working day. It's possible that someone is actually working on multiple tickets or even involved in meetings and other activities that aren't trackable or not worth it. So this is an attempt of standardizing 8/9 hours of work per day.
Rules:
• Done following the same rules as Lead Time and Cycle Time
• All waiting time defined in Lead Time and Cycle Time should be excluded
– Vacations can potentially benefit from an integration with Bamboo HR
– On Hold/Block assume such states need to be mapped
• Exclude the following states defined in the State Transitions section: On Hold/Blocked; Failed on QA; Ready To Test; Ready To Merge; Any other?
• Implement a logic to comply with the Working Hours section so that non-working hours are excluded. This doesn't need to be perfect, but at least on cycle times of multiple days, it will allow removing most of the time when people are not actually working.
• Present time in days
Open the source page in Confluence ↗
Formula — follows the fields below
active_hours = Σ_workdays min(hours in non-passive statuses, 8h)
shown = active_hours ÷ 24 // unit: calendar days — same scale as Lead/Cycle, active ≤ cycle holds
passive = [On Hold, Blocked, Failed on QA, Ready To Test, Ready To Merge]
workday = weekday ∧ ¬assignee-vacation (BambooHR) ∧ ¬country-holiday
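The Active Work Time formula above can be sketched as below. It assumes naive local datetimes, status intervals already limited to the cycle window, and a precomputed set of non-working dates (vacations and public holidays); the interval shape and function name are hypothetical.

```python
from datetime import datetime, date, time, timedelta

PASSIVE_STATUSES = {"On Hold", "Blocked", "Failed on QA", "Ready To Test", "Ready To Merge"}
HOURS_PER_DAY = 8  # document default ("typically 8 or 9 hours")

def active_work_days(intervals, non_working=frozenset()):
    """intervals: list of (start, end, status) inside the cycle window (assumed shape).
    non_working: dates to skip, e.g. assignee vacation or public holidays."""
    hours_by_day = {}
    for start, end, status in intervals:
        if status in PASSIVE_STATUSES:
            continue  # waiting time never counts
        cursor = start
        while cursor < end:
            # Split the interval at midnight so each calendar day is capped separately.
            next_midnight = datetime.combine(cursor.date() + timedelta(days=1), time.min)
            chunk_end = min(end, next_midnight)
            hours = (chunk_end - cursor).total_seconds() / 3600
            hours_by_day[cursor.date()] = hours_by_day.get(cursor.date(), 0.0) + hours
            cursor = chunk_end
    active_hours = sum(
        min(hours, HOURS_PER_DAY)
        for day, hours in hours_by_day.items()
        if day.weekday() < 5 and day not in non_working  # Monday..Friday only
    )
    return active_hours / 24  # same calendar-day scale as Lead and Cycle Time
```

A ticket that sits in In Progress from Monday 09:00 to Wednesday 09:00 accumulates 8 capped hours on each of three workdays, so it is shown as 24 ÷ 24 = 1 day.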
How it is measured — updates as you type
Live preview — real data, saved settings
no data in this window
Result with these settings: not computable in this window
• EVO's real workflow statuses under the current settings — time COUNTS in: —
• excluded as WAITING (passive list): —
• excluded as intake/bounce ("new"-category): —
• BambooHR live data feeding the vacation/holiday rule: 0 employees (0 linked to Jira by work email) · 0 approved vacation requests · 0 public-holiday days
Settings used by this metric — the diagram, formula and preview above follow them live; Save applies them everywhere
Passive (waiting) statuses (document default): On Hold, Blocked, Failed on QA, Ready To Test, Ready To Merge. The document’s exclude list + your team’s “Any other?” additions.
Working hours per day (document default): 8. PDF p4: “typically 8 or 9 hours”.
“The Time to First PR measures the time between when a ticket was put in progress until an associated PR is opened.”
Full text from the document
The Time to First PR measures the time between when a ticket was put in progress until an associated PR is opened. The intention is to measure how quickly engineers can turn a ticket into a reviewable implementation.
Rules:
• Same rules as Cycle Time
Open the source page in Confluence ↗
Formula — follows the fields below
ttfp = first(associated PR.created_at ≥ first(status of Jira category “In Progress”)) − first(status of Jira category “In Progress”) − Σ(time back in “new”-category statuses) // calendar days — weekends & holidays counted; same rules as Cycle Time
associated ⇔ PR title or source branch mentions the issue key
// only-earlier-PR tickets = anomaly bucket (excluded, disclosed) — spec M5 / T007
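A rough sketch of the association rule and the anomaly bucket. The PR fields (`title`, `source_branch`, `created_at`) are hypothetical names, case-insensitive key matching is an assumption, and the bounce-back subtraction from Cycle Time is left out for brevity.

```python
import re
from datetime import datetime

def mentions_key(text, issue_key):
    # Guard both sides so that "EVO-12" does not match inside "EVO-123" or "DEVO-12".
    pattern = r"(?<![A-Za-z0-9])" + re.escape(issue_key) + r"(?![0-9])"
    return re.search(pattern, text, re.IGNORECASE) is not None

def time_to_first_pr_days(issue_key, started_at, prs):
    """Returns (days, bucket); bucket is 'ok', 'anomaly' or 'no_pr'."""
    linked = [pr["created_at"] for pr in prs
              if mentions_key(pr["title"], issue_key)
              or mentions_key(pr["source_branch"], issue_key)]
    if not linked:
        return None, "no_pr"
    on_or_after_start = [ts for ts in linked if ts >= started_at]
    if not on_or_after_start:
        # Only PRs opened before the ticket started: excluded, but disclosed.
        return None, "anomaly"
    first_pr = min(on_or_after_start)
    return (first_pr - started_at).total_seconds() / 86400, "ok"
```

Returning the bucket next to the value lets a caller report how many tickets were excluded as anomalies instead of dropping them silently.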
How it is measured — updates as you type
Live preview — real data, saved settings
no data in this window
Result with these settings: no associated PRs in this window
“Lead Time for Changes measures the time it takes for a committed code change to successfully run in production.”
Full text from the document
Lead Time for Changes is a core DevOps Research and Assessment metric that measures the time it takes for a committed code change to successfully run in production. The intent is to measure the actual responsiveness of our delivery system, not just the active engineering effort.
Essentially the Lead Time for Changes will behave like the Cycle time, the only difference is that it starts to count when an associated commit is pushed instead of the ticket being set to In Progress.
Rules:
• Same rules as Cycle Time
(From the meeting: "the definition was not… the hours from pull request merge to deployment to production… it's the time of the first commit until that commit lands in production. It's different from the merge.")
Open the source page in Confluence ↗
Formula — follows the fields below
ltc = first(prod deploy after merge, result ∈ [succeeded]) − first_commit
prod ⇔ env name matches [production, prod, prd, live, stable] (lower tokens win first) // calendar days — weekends & holidays counted
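The "lower tokens win first" rule and the Lead Time for Changes formula can be illustrated as follows. Matching on token boundaries is an assumption (the page does not say how names are matched), and the deploy fields `env`, `result` and `finished_at` are hypothetical.

```python
import re
from datetime import datetime

PROD_TOKENS = ["production", "prod", "prd", "live", "stable"]
LOWER_TOKENS = ["pre-prod", "preprod", "pre-production", "staging", "stage", "canary",
                "alpha", "beta", "qa", "test", "dev", "sandbox", "uat", "feature"]

def _has_token(name, tokens):
    # A token must not be glued to other letters or digits ("latest" is not "test").
    return any(re.search(r"(?<![a-z0-9])" + re.escape(t) + r"(?![a-z0-9])", name)
               for t in tokens)

def classify_env(name):
    lowered = name.lower()
    if _has_token(lowered, LOWER_TOKENS):  # lower tiers are checked first
        return "lower"
    if _has_token(lowered, PROD_TOKENS):
        return "prod"
    return "unknown"

def lead_time_for_changes_days(first_commit_at, merged_at, deploys):
    candidates = [d["finished_at"] for d in deploys
                  if classify_env(d["env"]) == "prod"
                  and d["result"] == "succeeded"
                  and d["finished_at"] >= merged_at]
    if not candidates:
        return None
    return (min(candidates) - first_commit_at).total_seconds() / 86400
```

Checking the lower-tier list first is what keeps a name such as `pre-prod` or `prod-staging` from being counted as production just because it contains `prod`.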
How it is measured — updates as you type
Live preview — real data, saved settings
no data in this window
Result with these settings: unavailable (needs first commits + recognizable prod deploys)
Quality & Reliability
Prod / Pre-Prod Bug Rate
Prod Bug Rate (MUST HAVE) + Pre-Prod Bug Rate (NICE TO HAVE), p.6 ↗ “The Prod Bug Rate measures the number of bugs that were reported in the production environment, broken down per severity; Pre-Prod is the same for non-productive environments.”
Full text from the document
Prod Bug Rate — The Prod Bug Rate measures the number of bugs that were reported in the production environment. On top of it, it should also break down this number per severity. These bugs can be created by the Incident Management process or any team member. The important part is that it should account for production bugs.
This metric should also offer a way to look at the number of bugs in a rate fashion, like the average number of bugs per week.
Pre-Prod Bug Rate — This is essentially the same as the Prod Bug Rate but for all bugs found on non-productive environments.
Open the source page in Confluence ↗
Formula — follows the fields below
bugs = count(type ∈ [Bug, Incident], created in window); rate = bugs ÷ weeks
severity = first present of [customfield_10050, customfield_10511, customfield_10093]
env: prod labels [production, prod] · pre-prod [pre-prod, preprod, pre-production, staging, qa] · none → unknown
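A small sketch of the bug count, weekly rate, severity lookup and environment split. The issue shape (`type`, `created_at`, `labels`, `fields`) is hypothetical, and checking production labels before pre-prod labels is an assumption, since the page does not say what happens when an issue carries both.

```python
from collections import Counter
from datetime import datetime

BUG_TYPES = {"Bug", "Incident"}
SEVERITY_FIELDS = ["customfield_10050", "customfield_10511", "customfield_10093"]
PROD_LABELS = {"production", "prod"}
PREPROD_LABELS = {"pre-prod", "preprod", "pre-production", "staging", "qa"}

def severity_of(fields):
    # First field that is present and non-empty wins.
    return next((fields[f] for f in SEVERITY_FIELDS if fields.get(f)), None)

def env_of(labels):
    labels = {label.lower() for label in labels}
    if labels & PROD_LABELS:
        return "prod"
    if labels & PREPROD_LABELS:
        return "pre-prod"
    return "unknown"

def bug_rate(issues, window_start, window_end):
    bugs = [i for i in issues
            if i["type"] in BUG_TYPES and window_start <= i["created_at"] < window_end]
    weeks = (window_end - window_start).days / 7
    return {
        "total": len(bugs),
        "per_week": len(bugs) / weeks if weeks > 0 else None,
        "by_env": dict(Counter(env_of(b["labels"]) for b in bugs)),
        "by_severity": dict(Counter(severity_of(b["fields"]) for b in bugs)),
    }
```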
How it is measured — updates as you type
Live preview — real data, saved settings
no data in this window
Result with these settings: 0 bugs · 0/week · environment unknown (no env labels)
Settings used by this metric — the diagram, formula and preview above follow them live; Save applies them everywhere
Bug issue types (document default): Bug, Incident. PDF: “Status Bug, Incident”; also the unplanned-work types.
Severity fields (document default): customfield_10050, customfield_10511, customfield_10093. Jira custom field ids, first match wins.
Production env labels (document default): production, prod
Pre-prod env labels (document default): pre-prod, preprod, pre-production, staging, qa
“CFR measures the percentage of software deployments to production that result in a degraded service and require immediate remediation.”
Full text from the document
Change Failure Rate (CFR) is a core DORA metric that measures the percentage of software deployments to production that result in a degraded service, impairment, or outage and subsequently require immediate remediation. It evaluates the quality and stability of your delivery pipeline, ensuring that speed (measured by Lead Time) does not come at the expense of stability.
A deployment is a failure if it requires:
• An emergency rollback to a previous version.
• A hotfix or urgent "fix-forward" patch shipped outside the normal cadence, for example, a major incident (P0 or P1) triggered by the Incident Management process but only if the cause was effectively from the code that was shipped.
Rules:
• Change Failure Rate is a percentage metric calculated as the (number of failed deployments / total number of production deployments) x 100
• Failed deployments obtained as mentioned before
Open the source page in Confluence ↗
Formula — follows the fields below
cfr = failed ÷ eligible_production_deploys × 100 // eligible = result ∈ success ∪ failure; canceled/notDeployed excluded
failed ⇔ result ∈ [failed] ∨ manual flag // bridge until Incident Management
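The CFR formula can be sketched as below, assuming the input list already contains only production deployments. The `manual_failure` key is a hypothetical name for the manual flag mentioned in the formula.

```python
SUCCESS_RESULTS = {"succeeded"}  # document default
FAILURE_RESULTS = {"failed"}     # document default

def change_failure_rate(prod_deploys):
    """prod_deploys: list of dicts with 'result' and optional 'manual_failure' (assumed shape)."""
    # Canceled or not-deployed runs are neither successes nor failures: not eligible.
    eligible = [d for d in prod_deploys
                if d["result"] in SUCCESS_RESULTS | FAILURE_RESULTS]
    if not eligible:
        return None  # no recognizable production deployments
    failed = sum(1 for d in eligible
                 if d["result"] in FAILURE_RESULTS or d.get("manual_failure", False))
    return failed / len(eligible) * 100
```

With three succeeded, one failed and one canceled deployment, the canceled run is dropped and the result is 1 ÷ 4 × 100 = 25%.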
How it is measured — updates as you type
Live preview — real data, saved settings
no data in this window
Result with these settings: no recognizable production deployments
Settings used by this metric — the diagram, formula and preview above follow them live; Save applies them everywhere
Production environment tokens (document default): production, prod, prd, live, stable
Lower-tier tokens (matched first) (document default): pre-prod, preprod, pre-production, staging, stage, canary, alpha, beta, qa, test, dev, sandbox, uat, feature
Pipeline results counted as failure (document default): failed. ADO vocabulary: failed, partiallySucceeded, canceled…
Pipeline results counted as success (document default): succeeded
Flow & Predictability
“The Throughput should measure the number of completed tickets within a timeframe as well as the average number of completed tickets per week.”
Full text from the document
The Throughput should measure the number of completed tickets within a timeframe as well as the average number of completed tickets per week. Usually the latter is more useful to explain what's the usual flow of the team on a week basis.
Rules:
• If Epics are excluded from delivery, they should not count for the throughput.
(From the meeting: "you split this by agents and humans, which is pretty cool"; "having an average of how many tickets per week… we think more of weekly basis".)
Open the source page in Confluence ↗
Formula — follows the fields below
throughput = count(done in window) ; per_week = count ÷ weeks
split: assignee ∈ AI agent roster → agent, else human
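A minimal sketch of the throughput count with the agent/human split. The roster contents, the issue shape and the `window_days` parameter are illustrative assumptions; the Epic exclusion follows the rule quoted above.

```python
def throughput(done_issues, window_days, agent_roster, excluded_types=frozenset({"Epic"})):
    """done_issues: tickets completed in the window, each with 'type' and 'assignee'."""
    counted = [i for i in done_issues if i["type"] not in excluded_types]
    total = len(counted)
    agents = sum(1 for i in counted if i["assignee"] in agent_roster)
    weeks = window_days / 7
    return {
        "total": total,
        "per_week": total / weeks if weeks > 0 else None,
        "agents": agents,
        "humans": total - agents,  # anything not on the roster counts as human
    }
```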
How it is measured — updates as you type
Live preview — real data, saved settings
no data in this window
Result with these settings: 0 tickets · 0/week — agents 0 / humans 0
“The Deployment Frequency measures how often a team/project successfully releases code to the production environment — total plus Daily / Weekly / Monthly rates.”
Full text from the document
The Deployment Frequency measures how often a team/project successfully releases code to the production environment. It serves as a primary indicator of the team's overall throughput, agility and batch size. Like Throughput it should show the total number of successful production deployments for a given timeframe as well as a rate basis. There should be 3 types of rate:
• Daily
• Weekly
• Monthly
(From the meeting: "even if it said 50 deployments in those 30 days, it doesn't tell me what usually I think about, which is number of deploys per week or per day… per day, per week, per month — it's really interesting to have those numbers right there.")
Open the source page in Confluence ↗
Formula — follows the fields below
df = count(prod deploys, result ∈ [succeeded])
rates: ÷ days · ÷ weeks · ÷ (30-day months)
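The three rates follow directly from the count and the window length. A minimal sketch, where `window_days` is an assumed input and `days_per_month` corresponds to the "Days per month" setting:

```python
def deployment_frequency(successful_prod_deploys, window_days, days_per_month=30):
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    return {
        "total": successful_prod_deploys,
        "daily": successful_prod_deploys / window_days,
        "weekly": successful_prod_deploys / (window_days / 7),
        "monthly": successful_prod_deploys / (window_days / days_per_month),
    }
```

Using the meeting's example of 50 deployments in 30 days, this gives roughly 1.67 per day, 11.67 per week and 50 per month.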
How it is measured — updates as you type
Live preview — real data, saved settings
no data in this window
Result with these settings: no recognizable production deployments
Settings used by this metric — the diagram, formula and preview above follow them live; Save applies them everywhere
Days per month (Monthly rate) (document default): 30. The divisor for the Monthly rate (document asks for Daily / Weekly / Monthly).
“Measures how the flow of a team can be impacted by unexpected work; the classification depends on whether the team reserves maintenance capacity.”
Full text from the document
The Planned vs Unplanned Work measures how the flow of a team can be impacted by unexpected work. The distinction between planned and unplanned work separates value-generating progress from unpredictable operational friction. Unplanned work is any unexpected task, disruption, or emergency that breaks the current iteration or sprint focus. It represents the operational tax paid for system instability, technical debt, or shifting priorities.
Maintenance vs Unplanned Work — Usually, teams should reserve capacity for maintenance work (per sprint, per week), which means that if there is unplanned work, like an urgent issue to fix, this should be treated as planned work if the effort falls under the reserved capacity. If more capacity is needed for the same or other tasks, then it becomes Unplanned work.
Rules:
• If the team does not make any capacity planning for maintenance, any kind of deviation, anything that wasn't planned, should be interpreted as unplanned work (bugs, support requests, stakeholders requests, …). For this scenario, capture unplanned work by: Status Bug, Incident; Label: unplanned_work
• If the team reserves capacity for maintenance, then unplanned work should only be accountable after the reserved capacity has been maxed out. For this scenario, capture unplanned work by: Label: unplanned_work
Open the source page in Confluence ↗
Formula — follows the fields below
unplanned ⇔ type ∈ [Bug, Incident] ∨ label ∈ [unplanned_work]
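The classification rule and the effect of the "Team reserves maintenance capacity" setting can be sketched like this. The issue shape is hypothetical, and reporting the result as the unplanned share of all tickets is an assumption about what the percentage in the preview means.

```python
UNPLANNED_TYPES = {"Bug", "Incident"}  # document default
UNPLANNED_LABELS = {"unplanned_work"}  # document default

def is_unplanned(issue, reserves_capacity=False):
    has_label = bool(UNPLANNED_LABELS & set(issue["labels"]))
    if reserves_capacity:
        # With reserved maintenance capacity, only the explicit label counts.
        return has_label
    return issue["type"] in UNPLANNED_TYPES or has_label

def unplanned_share(issues, reserves_capacity=False):
    if not issues:
        return None
    unplanned = sum(1 for i in issues if is_unplanned(i, reserves_capacity))
    return unplanned / len(issues) * 100
```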
How it is measured — updates as you type
Live preview — real data, saved settings
no data in this window
Result with these settings: 0 planned / 0 unplanned = 0% (no-reserved-capacity)
Settings used by this metric — the diagram, formula and preview above follow them live; Save applies them everywhere
Team reserves maintenance capacity (document default): off. OFF: Bug/Incident OR label = unplanned. ON: only the label counts (after the reserve is exceeded).
Unplanned issue types (document default): Bug, Incident
Unplanned labels (document default): unplanned_work. PDF p8: unplanned_work; the meeting: “if you configure that label”.
PR Throughput + PR Acceptance Rate
PR Throughput + PR Acceptance Rate (NICE TO HAVE, AI DRIVEN), p.8 ↗ “PR Throughput = merged PRs within a timeframe plus the average merged per week; PR Acceptance Rate = merged vs abandoned.”
Full text from the document
PR Throughput — The PR Throughput should measure the number of merged PRs within a timeframe as well as the average number of merged PR per week.
PR Acceptance Rate — The PR Acceptance rate should measure the percentages of merged PRs vs the ones that were abandoned.
Open the source page in Confluence ↗
Formula — follows the fields below
pr_throughput = count(merged in window) ; per_week = merged ÷ weeks
acceptance = merged ÷ (merged + abandoned) × 100
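A short sketch of both PR formulas. The function and parameter names are illustrative; returning `None` when there are no merged or abandoned PRs is one way to produce the "—%" placeholder seen in the preview rather than dividing by zero.

```python
def pr_metrics(merged, abandoned, window_days):
    weeks = window_days / 7
    closed = merged + abandoned
    return {
        "merged": merged,
        "per_week": merged / weeks if weeks > 0 else None,
        # Open PRs are not part of the denominator, only merged and abandoned ones.
        "acceptance_pct": merged / closed * 100 if closed else None,
    }
```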
How it is measured — updates as you type
Live preview — real data, saved settings
no data in this window
Result with these settings: 0 merged · 0/week · 0 abandoned → acceptance —%
“The AI Participation Rate should measure the percentages of tickets done that had an AI Agent Assigned.”
Full text from the document
The AI Participation Rate should measure the percentages of tickets done that had an AI Agent Assigned.
Open the source page in Confluence ↗
Formula — follows the fields below
participation = count(done ∧ assignee ∈ AI agents) ÷ count(done) × 100
How it is measured — updates as you type
Live preview — real data, saved settings
no data in this window
Result with these settings: 0 of 0 done tickets = 0%
Source of every rule on this page: “[WIP]: Core Engineering metrics” — Confluence ↗ + the metrics meeting of 2026-06-08. Quotes are verbatim.