You measure performance variability, rather than just averages, by looking at standard deviation and percentile distributions instead of a single mean score. Standard deviation tells you how far an agent’s individual calls typically spread from their average. Percentile data shows you what a customer is actually likely to experience on any given call, including the calls that fall well outside that average.
Why the average alone is the wrong number
An average collapses every call an agent handled in a week into one number, and in doing so it erases the exact information a manager actually needs: how often, and how far, that agent’s performance strayed from their typical standard. Two agents can post the identical average QA score while one of them is remarkably consistent call after call and the other swings wildly between excellent and poor — the average makes them look identical when their actual reliability is nothing alike.
Standard deviation is the statistical tool built specifically to answer this. It measures how much individual values typically differ from the average — a small standard deviation means an agent’s calls cluster tightly around their typical performance, while a large standard deviation means their calls are scattered widely above and below it, even if the average itself looks fine.
What this looks like applied to a call center floor
Picture two agents who both average an 85 on QA scorecards over a month. Agent A’s individual call scores cluster tightly between 80 and 90 almost every time — a low standard deviation. Agent B swings between scores of 60 and 100 depending on the call — a high standard deviation, even though the average lands at the exact same 85. Agent A is the more reliable, predictable performer. A QA dashboard built only on averages would never reveal that difference, because it would show both agents as identical.
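The two-agent comparison above can be sketched in a few lines of Python. This is a minimal illustration, not a production QA report: the score lists are invented for the example, and `summarize` is a hypothetical helper name. It uses the sample standard deviation from the Python standard library, on the assumption that the scored calls are a sample of everything the agent handled.

```python
from statistics import mean, stdev

# Hypothetical QA scores, invented for illustration. Both lists average 85.
agent_a = [82, 88, 85, 84, 86, 87, 83, 85, 86, 84]
agent_b = [60, 100, 95, 70, 100, 65, 98, 72, 95, 95]


def summarize(scores):
    """Return the mean and sample standard deviation of a list of scores."""
    return {"mean": mean(scores), "stdev": stdev(scores)}


for name, scores in (("Agent A", agent_a), ("Agent B", agent_b)):
    stats = summarize(scores)
    print(f"{name}: mean={stats['mean']:.1f}, stdev={stats['stdev']:.1f}")
```

With these made-up numbers, both agents report a mean of 85, while Agent A’s standard deviation comes out near 2 points and Agent B’s near 16. The average column is identical and the spread column is not.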
Why percentiles matter alongside standard deviation when you measure performance variability
Standard deviation tells you how much variation exists. Percentile data tells you what that variation actually looks like in practice: for instance, the score that the worst 10% of an agent’s calls fall at or below, rather than a single summary of how widely scores spread around the mean. This is the same logic performance analysts in other technical fields use when they look past an average response time and check a high percentile such as the 95th or 99th, because the average alone hides how bad the worst-case experience gets for the people who land in it.
Applied to a call center, the question worth asking isn’t just “what’s this agent’s average call quality” — it’s “what does a customer landing in this agent’s worst 10% of calls actually experience,” because that’s the experience driving complaints, escalations, and churn, regardless of how solid the average looks on a monthly report. This same blind spot is what we describe on our page about what causes inconsistent agent performance day to day.
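As a rough sketch of the "worst 10% of calls" question, the snippet below reads off the 10th-percentile score for each agent. The data is the same kind of invented sample as before, `percentile_score` is a hypothetical helper, and the interpolation method ("inclusive") is one reasonable choice among several, not the only valid one.

```python
from statistics import quantiles

# Hypothetical QA scores, invented for illustration. Both lists average 85.
agent_a = [82, 88, 85, 84, 86, 87, 83, 85, 86, 84]
agent_b = [60, 100, 95, 70, 100, 65, 98, 72, 95, 95]


def percentile_score(scores, pct):
    """Return the score at percentile `pct` (1 to 99), linearly interpolated."""
    # n=100 yields 99 cut points; index pct - 1 is the pct-th percentile.
    cut_points = quantiles(scores, n=100, method="inclusive")
    return cut_points[pct - 1]


for name, scores in (("Agent A", agent_a), ("Agent B", agent_b)):
    p10 = percentile_score(scores, 10)
    print(f"{name}: 10th percentile score = {p10:.1f}")
```

On this sample, Agent A’s 10th-percentile score sits in the low 80s while Agent B’s sits in the mid 60s. That second number is closer to what a customer on one of Agent B’s bad calls actually gets, and the shared average of 85 gives no hint of it.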
What this means for how you track performance
If variability itself is the thing damaging both performance and customer experience, and not just low averages, then a measurement system built only around averages is structurally blind to the actual problem. Tracking standard deviation alongside the mean, and reviewing percentile distributions rather than single scores, gives leadership visibility into something a plain average can never show: which agents are consistently good, which are consistently struggling, and which are unpredictable. That unpredictability is the same kind described in our page on emotional contagion among call center agents. The distinction is foundational to how ORS™ approaches measurement, building on the same accumulation described in our page on operational dysregulation load.
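One way to picture that three-way split is a simple rule that looks at spread first and average second. The sketch below is illustrative only: the thresholds (`TARGET_MEAN`, `MAX_SPREAD`), the `classify` function, and the score lists are all assumptions made up for this example, and it is not a description of how ORS™ itself classifies agents.

```python
from statistics import mean, stdev

# Hypothetical thresholds; a real team would set these from its own data.
TARGET_MEAN = 80  # minimum acceptable average QA score
MAX_SPREAD = 5    # largest standard deviation still counted as "consistent"


def classify(scores, target_mean=TARGET_MEAN, max_spread=MAX_SPREAD):
    """Label an agent by spread first, then by average."""
    if stdev(scores) > max_spread:
        return "unpredictable"
    if mean(scores) >= target_mean:
        return "consistently good"
    return "consistently struggling"


# Invented sample scores for three hypothetical agents.
agents = {
    "Agent A": [82, 88, 85, 84, 86, 87, 83, 85, 86, 84],
    "Agent B": [60, 100, 95, 70, 100, 65, 98, 72, 95, 95],
    "Agent C": [68, 72, 70, 69, 71, 70, 73, 67, 70, 70],
}

for name, scores in agents.items():
    print(f"{name}: {classify(scores)}")
```

Checking spread before the average is a deliberate choice in this sketch: an agent with a wide spread is flagged as unpredictable even when the average clears the target, which is exactly the case a mean-only report would miss.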