Moderation Quality · Weekly RCA Report
W14 −1.63pp

Overall moderation accuracy dropped 1.63pp: here's why

Shift-share decomposition of W14 (Apr 4–10) vs W13 (Mar 28–Apr 3), isolating rate effect, weight effect, and their interaction. EMEA drives 128% of the decline; APAC offsets 31%.

Global W14: 84.28%, ▼ 1.63pp from 85.91% (100% of decline)
EMEA (Wt 32.4%): 79.65%, ▼ 6.48pp (128% of global decline)
APAC (Wt 47.8%): 88.61%, ▲ 0.92pp (offsets 31% of the decline)
AMS (Wt 19.7%): 81.37%, ▼ 0.12pp (1% of global decline)
Overview (−1.63pp): Methodology, decomposition & fuzzy
Hub × Type (129%): EMEA Appeal alone = 50.9%
EMEA Markets (110%): MENA1 leads at 38.4%
GCP Split (105%): EMEA non-GCP exceeds total
Top Projects (top 10): GB-MNL #1 at 25.1%
Actions (7): P0–P2 prioritized items
00

Methodology & summary

W14 (Apr 4–10) vs W13 (Mar 28–Apr 3). Each segment's total contribution is decomposed into three additive components. Positive % of Δ = contributed to the decline; negative = offset.

Notation: GWt = global weight, Acc = accuracy; the suffix marks the week.
Rate effect = GWt_W13 × (Acc_W14 − Acc_W13): pure accuracy change at prior weight
Weight effect = (GWt_W14 − GWt_W13) × (Acc_W13 − GlobalAcc_W13): mix shift relative to the W13 global mean
Interaction = (GWt_W14 − GWt_W13) × (Acc_W14 − Acc_W13): joint change
Total rate effect: −2.64pp (161% of decline)
Total weight effect: +0.76pp (offsets 47%)
Total interaction: +0.24pp (offsets 15%)
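The three formulas can be sketched in Python as follows. This is a minimal illustration, not the report's actual pipeline: the Segment class and the shift_share function are hypothetical names, weights are passed as fractions, and the inputs are the rounded EMEA Appeal figures from the Hub × project type table, so the results differ slightly from the report's unrounded values.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    acc_w13: float  # accuracy in W13, in percent
    acc_w14: float  # accuracy in W14, in percent
    gwt_w13: float  # global weight in W13, as a fraction (0.191 = 19.1%)
    gwt_w14: float  # global weight in W14, as a fraction


def shift_share(seg: Segment, global_acc_w13: float) -> dict:
    """Split one segment's contribution (in pp) into rate, weight, interaction."""
    d_acc = seg.acc_w14 - seg.acc_w13
    d_wt = seg.gwt_w14 - seg.gwt_w13
    rate = seg.gwt_w13 * d_acc                       # accuracy change at prior weight
    weight = d_wt * (seg.acc_w13 - global_acc_w13)   # mix shift vs. W13 global mean
    interaction = d_wt * d_acc                       # joint change
    return {
        "rate": rate,
        "weight": weight,
        "interaction": interaction,
        "total": rate + weight + interaction,
    }


# Rounded EMEA Appeal inputs from the Hub x project type table.
emea_appeal = Segment("EMEA Appeal", acc_w13=82.7, acc_w14=76.6,
                      gwt_w13=0.191, gwt_w14=0.156)
parts = shift_share(emea_appeal, global_acc_w13=85.91)
for key, value in parts.items():
    print(f"{key:12s} {value:+.3f}pp")
```

Summed over every segment, the weight changes cancel out, so the segment totals add up to the global accuracy change.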
Within-segment accuracy fell sharply: here's why this matters
Had the mix not shifted, the rate effect alone would have produced a 2.64pp decline. The actual −1.63pp is about 1.0pp better because the weight (+0.76pp) and interaction (+0.24pp) effects both worked in the global number's favor.
APAC was the safety net: here's how
APAC (88.6% accuracy, above the global mean) grew from 47.2% to 47.8% of mix while improving 0.92pp, and in total offset 31% of the decline (about +0.5pp). Without APAC's contribution, the headline would read roughly −2.1pp instead of −1.63pp.
How to read this decomposition

Interpreting the three effects

Rate effect (161%) tells us accuracy degradation within segments — holding mix constant — more than fully explains the decline. This is the "quality got worse" signal.

Weight effect (offset 47%) means the mix actually shifted favorably: segments with above-average accuracy gained share. Without this +0.76pp, the decline would have been ~2.4pp instead of 1.63pp.

Interaction (offset 15%) captures the joint effect — segments that lost accuracy also tended to shrink in weight, providing a small additional buffer.

The sum: −2.64 + 0.76 + 0.24 = −1.64pp on the rounded figures, which matches the observed −1.63pp global decline to within rounding.
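The "% of decline" figures follow from dividing each component by the global change. A minimal sketch of that sign convention, using the rounded totals quoted above; share_of_decline is an illustrative helper name, and because the inputs are rounded the rate share can land about a point away from the 161% shown.

```python
GLOBAL_DELTA_PP = -1.63  # observed W14 vs W13 change in global accuracy

# Rounded component totals from the summary above, in pp.
effects = {"rate": -2.64, "weight": 0.76, "interaction": 0.24}


def share_of_decline(contribution_pp: float,
                     total_delta_pp: float = GLOBAL_DELTA_PP) -> float:
    """Percent of the decline: positive = contributed, negative = offset."""
    return 100.0 * contribution_pp / total_delta_pp


for name, pp in effects.items():
    print(f"{name:12s} {pp:+.2f}pp  {share_of_decline(pp):+.1f}% of decline")

rounded_sum = sum(effects.values())
print(f"sum of rounded components: {rounded_sum:+.2f}pp")
```

A negative share means the component pushed accuracy up and therefore offset part of the decline.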

01

Fuzzy rate impact

Fuzzy cases are counted as errors in OEA. The global fuzzy rate rose from 2.00% to 2.35%, an increase the table below reports as +0.36pp (the rounded rates differ by 0.35pp). That increase directly cost 0.36pp of accuracy, explaining ~22% of the total decline. The remaining 78% is genuine accuracy degradation.

Fuzzy rate increase: −0.36pp (21.8% of total decline)
Non-fuzzy accuracy decline: −1.28pp (78.2% of total decline)
Three hubs, three completely different stories — here's the punchline
AMS: decline is 100% fuzzy — real quality held steady. APAC: powered through the biggest fuzzy headwind with +1.41pp genuine improvement. EMEA: 96% of the −6.48pp drop is real accuracy errors, not borderline ambiguity.
| Hub | FR W14 | FR W13 | Δ FR | Acc Δ total | Fuzzy explains | Non-fuzzy Δ | Verdict |
|---|---|---|---|---|---|---|---|
| AMS | 1.76% | 1.57% | +0.19pp | −0.12pp | −0.19pp | +0.06pp | Entire decline is fuzzy-driven. Non-fuzzy accuracy actually improved. |
| APAC | 2.08% | 1.59% | +0.49pp | +0.92pp | −0.49pp | +1.41pp | Fuzzy headwind absorbed; non-fuzzy quality improved strongly (+1.41pp). |
| EMEA | 3.12% | 2.86% | +0.26pp | −6.48pp | −0.26pp | −6.21pp | 96% of EMEA's decline is non-fuzzy. Fuzzy is a minor factor here. |
| Global | 2.35% | 2.00% | +0.36pp | −1.63pp | −0.36pp | −1.28pp | Fuzzy = 22%, non-fuzzy = 78% |
Key insight: The three hubs tell very different stories. AMS's small decline is 100% fuzzy — actual quality held steady. APAC powered through a large fuzzy increase with even larger genuine improvement. EMEA's massive drop is overwhelmingly real accuracy errors — fuzzy rate barely moved. This confirms EMEA's issue is fundamentally about moderation quality, not borderline-case ambiguity.
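The fuzzy split in the table can be reproduced with simple arithmetic. The sketch below assumes, as the table does, that each additional point of fuzzy rate costs one point of accuracy; split_fuzzy is a hypothetical helper, and the inputs are the rounded hub figures, so results can differ from the table by about 0.01pp.

```python
# hub: (fuzzy rate W14 %, fuzzy rate W13 %, total accuracy change in pp)
hubs = {
    "AMS": (1.76, 1.57, -0.12),
    "APAC": (2.08, 1.59, 0.92),
    "EMEA": (3.12, 2.86, -6.48),
}


def split_fuzzy(fr_w14: float, fr_w13: float, acc_delta: float):
    """Return (fuzzy part, non-fuzzy part) of an accuracy change, in pp."""
    fuzzy_part = -(fr_w14 - fr_w13)          # fuzzy cases count as errors
    non_fuzzy_part = acc_delta - fuzzy_part  # whatever fuzzy does not explain
    return fuzzy_part, non_fuzzy_part


results = {hub: split_fuzzy(*row) for hub, row in hubs.items()}
for hub, (fuzzy_part, non_fuzzy_part) in results.items():
    print(f"{hub:5s} fuzzy {fuzzy_part:+.2f}pp  non-fuzzy {non_fuzzy_part:+.2f}pp")
```

A positive non-fuzzy part alongside a negative headline, as for AMS, indicates that the decline comes from borderline cases rather than from clear-cut errors.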
AMS — decline is 100% fuzzy-driven

AMS: a fuzzy story, not a quality story

AMS accuracy fell just −0.12pp, and the entire decline is explained by the fuzzy rate increase (+0.19pp). Once fuzzy is stripped out, AMS non-fuzzy accuracy actually improved by +0.06pp.

This means AMS's labeling quality is holding steady or improving — the headline number is being dragged by borderline cases being reclassified or new ambiguous content types entering the pipeline.

Action: Consider fuzzy calibration or policy clarification for the specific content types driving the 0.19pp fuzzy increase. This is a recoverable loss.

APAC — strong quality masked by fuzzy headwind

APAC: quality is better than the headline suggests

APAC's reported accuracy improved +0.92pp, but the underlying non-fuzzy improvement is actually +1.41pp — being partially masked by a +0.49pp fuzzy rate increase (the largest of any hub).

APAC absorbed the biggest fuzzy headwind and still delivered the best headline improvement. However, the fuzzy trend (+0.49pp WoW) needs monitoring — if it continues, it will eventually overwhelm the quality gains.

Action: Investigate whether policy updates or new content types in APAC are driving the fuzzy surge. The quality fundamentals are strong, but the fuzzy trajectory is concerning.

EMEA — fuzzy is a rounding error; the problem is real

EMEA: genuine moderation quality crisis

EMEA's fuzzy rate only increased +0.26pp, explaining just 4% of its massive −6.48pp accuracy decline. The remaining −6.21pp is pure non-fuzzy accuracy degradation.

This definitively rules out "borderline cases" as an explanation for EMEA's performance. The problem is fundamentally about labeler accuracy, policy interpretation, or operational execution — not content ambiguity.

EMEA also has the highest absolute fuzzy rate (3.12% vs 2.08% APAC, 1.76% AMS), suggesting a structural baseline of ambiguity in its content mix, but the week-over-week change is small.

01

Hub × project type

EMEA's three project types account for 129% of the decline. APAC General Recall is the largest single offset (−30.6%).

EMEA Appeal alone explains half the global decline — here's the shape
Appeal accuracy collapsed 82.7% → 76.6% (−6.06pp) while still carrying 15.6% of global weight. The rate effect (−1.156pp) is the single largest driver in the entire decomposition. Even the weight shrinkage couldn't offset it.
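| Hub | Type | Acc W14 | Acc W13 | Δ Acc (pp) | GWt W14 | GWt W13 | Rate (pp) | Weight (pp) | Inter (pp) | Total (pp) | % of Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EMEA | Appeal | 76.6% | 82.7% | −6.06 | 15.6% | 19.1% | −1.156 | +0.113 | +0.212 | −0.831 | 50.9% |
| EMEA | General Recall | 84.2% | 91.6% | −7.37 | 12.1% | 10.0% | −0.734 | +0.119 | −0.156 | −0.771 | 47.1% |
| EMEA | Analytics Appeal | 77.7% | 89.7% | −12.01 | 4.7% | 3.2% | −0.387 | +0.055 | −0.174 | −0.505 | 30.9% |
| AMS | General Recall | 84.4% | 85.8% | −1.37 | 14.3% | 9.5% | −0.130 | −0.005 | −0.066 | −0.200 | 12.2% |
| APAC | Appeal | 85.4% | 85.7% | −0.27 | 15.9% | 18.8% | −0.052 | +0.006 | +0.008 | −0.037 | 2.3% |
| Negative subtotal | | | | | | | | | | −2.345 | 143.4% |
| AMS | Appeal | 72.9% | 78.7% | −5.86 | 5.0% | 10.5% | −0.614 | +0.394 | +0.320 | +0.100 | −6.1% |
| AMS | Analytics Appeal | 79.0% | 62.7% | +16.35 | 0.5% | 0.6% | +0.102 | +0.038 | −0.027 | +0.113 | −6.9% |
| APAC | General Recall | 90.1% | 88.6% | +1.49 | 27.5% | 24.1% | +0.359 | +0.091 | +0.051 | +0.501 | −30.6% |
| Positive subtotal | | | | | | | | | | +0.714 | −43.7% |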
AMS Appeal — accuracy did fall (rate = −0.61pp), but its accuracy is well below the global mean, so the weight halving from 10.5% → 5.0% was net positive for the global number (+0.39pp weight effect), flipping total contribution to +0.10pp.
EMEA Appeal deep dive — why −6.06pp accuracy drop?

EMEA Appeal: rate effect dominance

The −1.156pp rate effect is the single largest driver in this decomposition. EMEA Appeal dropped from 82.7% to 76.6%, a −6.06pp swing, while still carrying 15.6% global weight.

The weight did shrink (19.1% → 15.6%), which partially offset the damage (+0.113pp weight effect, +0.212pp interaction), but the sheer magnitude of the accuracy collapse overwhelms both offsets.

Key question: Is this driven by specific BPO sites, policy updates, or labeler calibration drift? See the "Top Projects" tab for project-level decomposition.

APAC General Recall — why it's the biggest offset

APAC GR: the stabilizer

APAC General Recall improved from 88.6% to 90.1% (+1.49pp) while also gaining weight (24.1% → 27.5%). This is the ideal scenario: an above-average segment both improves and grows.

All three effects are positive: rate (+0.359pp), weight (+0.091pp), interaction (+0.051pp), summing to +0.501pp — the single largest offset at −30.6% of the decline.

02

EMEA market breakdown

MENA1 + EN + SSA + DE + MENA2 = 110% of global decline, almost entirely rate-driven. SSA is a compounding case — weight grew while accuracy fell.

5 markets drive 110% of the global decline — here's the pattern
Every EMEA market dropped accuracy this week, but the damage is concentrated: MENA1 alone is 38.4%. The pattern is almost entirely rate-driven (labeler quality), not mix-shift. Only SSA compounds all three effects — weight grew into a below-mean, declining segment.
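| Market | Acc W14 | Acc W13 | Δ Acc (pp) | GWt W14 | GWt W13 | Rate (pp) | Weight (pp) | Inter (pp) | Total (pp) | % of Δ |
|---|---|---|---|---|---|---|---|---|---|---|
| MENA1 | 80.5% | 90.4% | −9.89 | 6.26% | 6.46% | −0.639 | −0.009 | +0.020 | −0.628 | 38.4% |
| EN (GB) | 78.8% | 88.5% | −9.67 | 3.56% | 4.18% | −0.404 | −0.016 | +0.061 | −0.360 | 22.0% |
| SSA | 75.9% | 84.2% | −8.22 | 3.65% | 2.46% | −0.202 | −0.021 | −0.099 | −0.321 | 19.7% |
| DE | 77.4% | 86.8% | −9.42 | 2.93% | 2.90% | −0.273 | +0.000 | −0.003 | −0.276 | 16.9% |
| MENA2 | 75.8% | 80.9% | −5.08 | 4.28% | 4.37% | −0.222 | +0.005 | +0.005 | −0.213 | 13.0% |
| IT | 84.2% | 93.2% | −9.07 | 2.43% | 2.27% | −0.205 | +0.012 | −0.015 | −0.208 | 12.7% |
| IL | 67.2% | 83.8% | −16.55 | 0.36% | 0.43% | −0.071 | +0.001 | +0.011 | −0.059 | 3.6% |
| UA | 74.3% | 77.3% | −3.04 | 1.08% | 1.04% | −0.032 | −0.003 | −0.001 | −0.036 | 2.2% |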
SSA is the only top market where all three effects are negative — weight expanded (2.46%→3.65%), accuracy sits below the global mean, and accuracy also fell. A triple headwind worth investigating.
MENA1 deep dive — largest market contributor at 38.4%

MENA1: pure rate problem

MENA1 dropped from 90.4% to 80.5% (−9.89pp) while maintaining roughly stable weight (6.46% → 6.26%). The rate effect (−0.639pp) almost entirely explains its contribution.

This is a nearly pure accuracy regression — no confounding mix shifts. The investigation should focus on what changed in MENA1 labeling quality, policy interpretation, or task distribution during W14.

SSA triple headwind — all three effects negative

SSA: compounding failure mode

SSA is the only one of the five largest EMEA market contributors where rate, weight, and interaction are all negative.

Rate (−0.202pp): accuracy fell from 84.2% to 75.9%, a −8.22pp drop.

Weight (−0.021pp): SSA's weight grew from 2.46% to 3.65%, but since SSA accuracy (84.2%) was below the W13 global mean (85.9%), this expansion hurts.

Interaction (−0.099pp): the weight grew AND accuracy fell simultaneously — the worst combination.

Key question: Was the SSA weight increase intentional (ramp-up)? If so, quality support did not scale with volume.
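Compounding cases like this can be screened for with a sign test over the three effects. The sketch below copies the rounded rate, weight, and interaction values from the market table; is_triple_negative is an illustrative helper name. The test looks at signs only, so very small segments can also qualify and the totals should be checked for materiality.

```python
# (market, rate, weight, interaction) in pp, copied from the market table.
markets = [
    ("MENA1", -0.639, -0.009, +0.020),
    ("EN (GB)", -0.404, -0.016, +0.061),
    ("SSA", -0.202, -0.021, -0.099),
    ("DE", -0.273, +0.000, -0.003),
    ("MENA2", -0.222, +0.005, +0.005),
    ("IT", -0.205, +0.012, -0.015),
    ("IL", -0.071, +0.001, +0.011),
    ("UA", -0.032, -0.003, -0.001),
]


def is_triple_negative(rate: float, weight: float, interaction: float) -> bool:
    """True when all three shift-share effects pull accuracy down."""
    return rate < 0 and weight < 0 and interaction < 0


triple = [
    (name, rate + weight + interaction)
    for name, rate, weight, interaction in markets
    if is_triple_negative(rate, weight, interaction)
]
for name, total in triple:
    print(f"{name}: all three effects negative, total {total:+.3f}pp")
```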

IL — steepest single-market accuracy drop (−16.55pp)

IL: low weight limits global impact

IL has the most dramatic accuracy decline of any market (83.8% → 67.2%, −16.55pp), but its small weight (0.36%) limits global impact to just −0.059pp (3.6% of decline).

Still worth flagging: a 16.5pp drop likely indicates a systemic issue — new policy, labeler turnover, or task type change — that could worsen if IL weight increases.
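The rate effect scales linearly with prior weight, which is why the same accuracy drop would matter far more in a larger segment. A minimal sketch under that formula; the observed inputs come from the market table, while the larger weights are hypothetical values chosen only to illustrate the scaling.

```python
IL_ACC_DELTA_PP = -16.55  # IL accuracy change W13 -> W14, from the market table
IL_WEIGHT_W13 = 0.0043    # 0.43% global weight in W13, as a fraction


def rate_effect(weight_fraction: float, acc_delta_pp: float) -> float:
    """Rate effect in pp: prior weight times the accuracy change."""
    return weight_fraction * acc_delta_pp


observed = rate_effect(IL_WEIGHT_W13, IL_ACC_DELTA_PP)
print(f"observed weight 0.43%: {observed:+.3f}pp")

# Hypothetical prior weights, not taken from the report.
scenarios = {w: rate_effect(w, IL_ACC_DELTA_PP) for w in (0.01, 0.02, 0.05)}
for weight, effect in scenarios.items():
    print(f"hypothetical weight {weight:.0%}: {effect:+.3f}pp")
```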

03

Hub × GCP / non-GCP

EMEA non-GCP alone accounts for 105% of the global decline, almost purely through rate effect. EMEA GCP's weight surge (0.85%→2.98%) into a low-accuracy segment cost another −0.38pp.

EMEA non-GCP single-handedly exceeds the entire decline — here's why
At 29.4% global weight and a −5.8pp accuracy drop, EMEA non-GCP generates −1.72pp contribution (105.2% of total). Meanwhile EMEA GCP weight tripled into a 74%-accuracy segment — a weight trap costing −0.38pp even though its rate effect is tiny.
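| Hub | Type | Acc W14 | Acc W13 | GWt W14 | GWt W13 | Rate (pp) | Weight (pp) | Inter (pp) | Total (pp) | % of Δ |
|---|---|---|---|---|---|---|---|---|---|---|
| EMEA | non-GCP | 80.7% | 86.5% | 29.4% | 31.4% | −1.824 | −0.011 | +0.115 | −1.720 | 105.2% |
| EMEA | GCP | 69.8% | 74.0% | 2.98% | 0.85% | −0.036 | −0.254 | −0.090 | −0.379 | 23.2% |
| AMS | non-GCP | 81.9% | 83.0% | 10.9% | 10.9% | −0.128 | −0.000 | −0.000 | −0.128 | 7.8% |
| APAC | GCP | 82.4% | 85.1% | 2.69% | 3.32% | −0.090 | +0.005 | +0.017 | −0.067 | 4.1% |
| Negative subtotal | | | | | | | | | −2.295 | 140.4% |
| AMS | GCP | 80.8% | 79.8% | 8.80% | 9.66% | +0.097 | +0.053 | −0.009 | +0.141 | −8.7% |
| APAC | non-GCP | 89.0% | 87.9% | 45.2% | 43.8% | +0.477 | +0.027 | +0.015 | +0.519 | −31.7% |
| Positive subtotal | | | | | | | | | +0.661 | −40.4% |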
EMEA GCP weight effect (−0.25pp) — weight tripled from 0.85% to 2.98%, but this segment's accuracy (74.0%) is far below the global mean (85.9%). Expanding a below-mean segment hurts the global number even at unchanged accuracy.
EMEA GCP weight surge — weight-driven damage

EMEA GCP: the weight trap

Unlike most segments where rate effect dominates, EMEA GCP's primary damage vector is the weight effect (−0.254pp). Weight tripled from 0.85% → 2.98%, but GCP accuracy (74.0%) is 11.9pp below the global mean.

The rate effect is tiny (−0.036pp) because the starting weight was so small. But the interaction (−0.090pp) compounds things: weight grew while accuracy also fell (74.0% → 69.8%).

Key question: Is this a deliberate GCP ramp-up or an allocation error? If it's a ramp-up, quality support needs to precede or accompany the volume increase.
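The sign of the weight effect depends on two things: whether the segment grew or shrank, and whether its W13 accuracy sat above or below the global mean. The sketch below contrasts EMEA GCP (grew, below mean) with AMS Appeal from the Hub × project type table (shrank, below mean). weight_effect is an illustrative helper, and the inputs are rounded, so the results differ slightly from the report.

```python
GLOBAL_ACC_W13 = 85.91  # W13 global accuracy, in percent


def weight_effect(gwt_w13: float, gwt_w14: float, acc_w13: float,
                  global_acc_w13: float = GLOBAL_ACC_W13) -> float:
    """Weight effect in pp; weights are fractions, accuracies are percents."""
    return (gwt_w14 - gwt_w13) * (acc_w13 - global_acc_w13)


# EMEA GCP: weight grew 0.85% -> 2.98% in a segment below the global mean.
emea_gcp = weight_effect(0.0085, 0.0298, 74.0)

# AMS Appeal: weight shrank 10.5% -> 5.0% in a segment below the global mean.
ams_appeal = weight_effect(0.105, 0.050, 78.7)

print(f"EMEA GCP   weight effect: {emea_gcp:+.3f}pp")
print(f"AMS Appeal weight effect: {ams_appeal:+.3f}pp")
```

Growing a below-mean segment lowers the global number even if its own accuracy is unchanged, while shrinking one raises it.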

APAC non-GCP — strongest offset segment

APAC non-GCP: the anchor

APAC non-GCP is the largest single segment by weight (45.2%) and improved from 87.9% → 89.0%. All three effects are positive, delivering +0.519pp total offset (−31.7% of decline).

This segment single-handedly prevented the global decline from being ~2.15pp instead of 1.63pp. Maintaining APAC non-GCP stability is critical to holding the floor.

04

EMEA — top 10 individual projects (shift-share)

Top 10 projects by negative contribution. GCP-appeal-GB-en-ALR-MNL leads at 25.1%: weight surged about 6x while accuracy crashed from 100% to 69.9%. Three of the top five are Appeal or AA projects whose weight expanded while W14 accuracy sat below the global mean.

Top 3 projects drive 67% of the global decline — here's the pattern
GB-ALR-MNL (25.1%): weight surged 6x into crashing accuracy. MENA2-CAS (22.0%): weight quadrupled into a chronically below-mean segment. MENA1-ANK (19.7%): pure accuracy regression, no mix excuse. The common thread: weight expansion without quality support.
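| Project | Type | Acc W14 | Acc W13 | GWt W14 | GWt W13 | Rate (pp) | Weight (pp) | Inter (pp) | Total (pp) | % of Δ |
|---|---|---|---|---|---|---|---|---|---|---|
| GCP-appeal-GB-en-ALR-MNL | Appeal | 69.9% | 100.0% | 2.25% | 0.36% | −0.108 | +0.266 | −0.568 | −0.410 | 25.1% |
| AA-MENA2-ar-T&S-CAS | AA | 67.1% | 73.5% | 2.25% | 0.51% | −0.033 | −0.215 | −0.112 | −0.360 | 22.0% |
| GR General-MENA1-ku-CNX-ANK | GR | 84.4% | 96.8% | 2.59% | 2.61% | −0.322 | −0.002 | +0.002 | −0.321 | 19.7% |
| appeal-KE/TZ/UG-sw-TP-NBO | Appeal | 69.4% | 81.4% | 1.12% | 0.77% | −0.092 | −0.016 | −0.043 | −0.151 | 9.2% |
| GCP-GR General-GB-en-TP-ALB | GR | 58.9% | 92.7% | 0.06% | 1.40% | −0.473 | −0.091 | +0.451 | −0.113 | 6.9% |
| GCP-appeal-IT-it-TP-BRV | Appeal | 84.7% | 92.3% | 0.87% | 1.60% | −0.122 | −0.046 | +0.055 | −0.113 | 6.9% |
| appeal-MENA1-other-TP-MAK | Appeal | 63.4% | 96.1% | 0.22% | 0.53% | −0.174 | −0.032 | +0.101 | −0.104 | 6.4% |
| GCP-GR General-DE-de-TLS-LEJ | GR | 75.4% | 85.4% | 1.00% | 0.46% | −0.046 | −0.003 | −0.054 | −0.102 | 6.3% |
| appeal-MENA1-ar-CNX-IBD | Appeal | 74.2% | 78.9% | 1.48% | 1.17% | −0.056 | −0.021 | −0.014 | −0.091 | 5.6% |
| GR General-MENA1-ar-TP-MAK | GR | N/A | 100% | 0.00% | 0.53% | −0.534 | −0.075 | +0.534 | −0.075 | 4.6% |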
Weight expansion is the recurring theme: 5 of 10 projects saw weight increase. When that expansion targets below-mean or declining-accuracy segments, the interaction effect compounds the damage. Only GR General-MENA1-ku-CNX-ANK is a pure rate story (stable weight, −12.4pp accuracy drop).
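The weight-direction count and the top-3 share can be recomputed directly from the table rows. A minimal sketch using the rounded weights and totals from the top-10 table; the variable names are illustrative.

```python
GLOBAL_DELTA_PP = -1.63

# (project, GWt W14 %, GWt W13 %, total contribution in pp) from the top-10 table.
projects = [
    ("GCP-appeal-GB-en-ALR-MNL", 2.25, 0.36, -0.410),
    ("AA-MENA2-ar-T&S-CAS", 2.25, 0.51, -0.360),
    ("GR General-MENA1-ku-CNX-ANK", 2.59, 2.61, -0.321),
    ("appeal-KE/TZ/UG-sw-TP-NBO", 1.12, 0.77, -0.151),
    ("GCP-GR General-GB-en-TP-ALB", 0.06, 1.40, -0.113),
    ("GCP-appeal-IT-it-TP-BRV", 0.87, 1.60, -0.113),
    ("appeal-MENA1-other-TP-MAK", 0.22, 0.53, -0.104),
    ("GCP-GR General-DE-de-TLS-LEJ", 1.00, 0.46, -0.102),
    ("appeal-MENA1-ar-CNX-IBD", 1.48, 1.17, -0.091),
    ("GR General-MENA1-ar-TP-MAK", 0.00, 0.53, -0.075),
]

# Projects whose global weight was higher in W14 than in W13.
grew = [name for name, w14, w13, _ in projects if w14 > w13]

# Share of the global decline explained by the three largest contributors.
top3_share = 100 * sum(total for *_, total in projects[:3]) / GLOBAL_DELTA_PP

print(f"{len(grew)} of {len(projects)} projects gained weight:")
for name in grew:
    print(f"  {name}")
print(f"top 3 share of decline: {top3_share:.1f}%")
```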
GCP-appeal-GB-en-ALR-MNL — #1 contributor at 25.1%, here's the mechanism

GB MNL: the weight surge trap

This project's weight surged 6.25x (0.36% → 2.25%) while accuracy crashed from 100% → 69.9%. The interaction effect (−0.568pp) is the largest single component — weight grew dramatically while accuracy fell dramatically.

The weight effect is actually positive (+0.266pp) because the project was above the global mean in W13 (100% vs 85.9%). But the interaction overwhelms it: expanding into what became a low-accuracy segment is a compounding failure.

Key question: Was this a deliberate ramp-up of a previously small project? If so, quality controls didn't scale with volume.

AA-MENA2-ar-T&S-CAS — weight quadrupled into a below-mean segment

MENA2 CAS: weight-driven damage

Weight grew from 0.51% → 2.25% (4.4x) while accuracy was already below the global mean (73.5%) and fell further to 67.1%. The weight effect alone (−0.215pp) is the largest component — this is a mix-shift problem, not primarily a rate problem.

All three effects are negative: rate (−0.033), weight (−0.215), interaction (−0.112). A triple headwind totaling −0.360pp (22.0% of decline).

Action: Validate whether this weight increase was intentional. Expanding a chronically below-mean segment without quality uplift compounds the global decline.

GR General-MENA1-ku-CNX-ANK — pure rate collapse, no mix excuse

MENA1 ANK: classic accuracy regression

This is the cleanest rate-driven case in the top 10: weight barely moved (2.61% → 2.59%), so the rate effect (−0.322pp) almost entirely explains the −0.321pp total contribution.

Accuracy dropped from 96.8% → 84.4% (−12.4pp) — a steep fall from a high base. No mix-shift or weight excuses here; something changed in execution quality.

Action: Investigate what changed for Kurdish-language GR in MENA1 during W14 — policy update, new labeler cohort, or calibration drift.

Recommended actions
1. EMEA Appeal quality RCA: focus on EN, MENA1, DE BPO sites where rate effect is the dominant driver.
2. DE-LEJ site investigation: two projects with catastrophic accuracy (0.0% and N/A), possible vendor execution failure.
3. Four projects went to N/A (zero weight) in W14: confirm whether this is sampling shortfall or project suspension.
4. SSA weight expansion: the only top-five market with a triple negative (rate, weight, and interaction all negative); validate whether the volume increase is intentional.
5. EMEA GCP weight surge (0.85% → 2.98%) into a 74%-accuracy segment: check whether this is a ramp-up or reallocation, and whether quality support is in place.
6. APAC fuzzy rate jumped +0.49pp (largest increase): investigate whether policy updates or new content types are driving borderline cases. Non-fuzzy quality is strong, but the fuzzy trend needs monitoring.
7. AMS decline is entirely fuzzy-driven: non-fuzzy accuracy actually improved. Consider whether fuzzy calibration or policy clarification could recover the 0.19pp loss.
Priority matrix — impact vs effort

Triage prioritization

P0 (immediate): DE-LEJ site — likely site-level outage/failure, accounts for 63.7% of decline from just two projects. Quick root cause identification could recover the most impact.

P1 (this week): Four N/A projects — verify if delivery gaps are fixable. If unplanned, restoring these could offset 167% of the decline (they overlap with rate-driven decline).

P1 (this week): MENA1 accuracy regression — 38.4% of decline, pure rate effect. Check if a policy update or labeler calibration issue occurred during W14.

P2 (track): SSA triple headwind and EMEA GCP weight surge — these are structural issues that need monitoring over W15–W16 to determine if they're transient or persistent.

P2 (track): APAC fuzzy rate surge (+0.49pp) — quality fundamentals are strong but the fuzzy trajectory needs monitoring. AMS fuzzy calibration is a quick-win candidate.
