Moderation Quality · Weekly RCA Report

Overall moderation accuracy dropped −1.63pp — here's why

Shift-share decomposition of W14 (Apr 4–10) vs W13 (Mar 28–Apr 3), isolating rate effect, weight effect, and their interaction. EMEA drives 128% of the decline; APAC offsets 31%.

Global W14

84.28%

▼ 1.63pp

from 85.91% · 100% of decline

EMEA · Wt 32.4%

79.65%

▼ 6.48pp

128% of global decline

APAC · Wt 47.8%

88.61%

▲ 0.92pp

−31% (offsets the decline)

AMS · Wt 19.7%

81.37%

▼ 0.12pp

1% of global decline

◈ Overview −1.63pp

Methodology, decomposition & fuzzy

▦ Hub × Type 129%

EMEA Appeal alone = 50.9%

◉ EMEA Markets 110%

MENA1 leads at 38.4%

◆ GCP Split 105%

EMEA non-GCP exceeds total

★ Top Projects TOP 10

GB-MNL #1 at 25.1%

⚡ Actions 7

P0–P2 prioritized items

Methodology & summary

W14 (Apr 4–10) vs W13 (Mar 28–Apr 3). Each segment's total contribution is decomposed into three additive components. Positive % of Δ = contributed to the decline; negative = offset.

Rate effect = GWt_W13 × (Acc_W14 − Acc_W13) — pure accuracy change at prior weight
Weight effect = (GWt_W14 − GWt_W13) × (Acc_W13 − Global Acc_W13) — mix shift relative to global mean
Interaction = (GWt_W14 − GWt_W13) × (Acc_W14 − Acc_W13) — joint change
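
The three formulas can be sketched in Python as follows. This is a minimal illustration rather than the report's actual pipeline: the names `Effects` and `shift_share` and their arguments are hypothetical, and the inputs are the rounded EMEA Appeal values from the Hub × project type table, so the results differ slightly from the published row.

```python
from typing import NamedTuple


class Effects(NamedTuple):
    rate: float
    weight: float
    interaction: float

    @property
    def total(self) -> float:
        return self.rate + self.weight + self.interaction


def shift_share(w13: float, w14: float, acc13: float, acc14: float,
                global_acc13: float) -> Effects:
    """Decompose one segment's contribution (in pp) to the global accuracy change.

    Weights are fractions of global volume (0.191 for 19.1%);
    accuracies are in percent (82.7 for 82.7%).
    """
    d_w = w14 - w13
    d_acc = acc14 - acc13
    rate = w13 * d_acc                      # accuracy change at prior weight
    weight = d_w * (acc13 - global_acc13)   # mix shift relative to global mean
    interaction = d_w * d_acc               # joint change
    return Effects(rate, weight, interaction)


# EMEA Appeal with rounded inputs from the report (global W13 accuracy 85.91%).
emea_appeal = shift_share(w13=0.191, w14=0.156, acc13=82.7, acc14=76.6,
                          global_acc13=85.91)
print(emea_appeal, round(emea_appeal.total, 3))
```

Keeping weights as fractions and accuracies in percent means every effect comes out directly in percentage points, the unit used throughout the tables.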

−2.64pp

Total rate effect

161% of decline

+0.76pp

Total weight effect

Offset 47%

+0.24pp

Total interaction

Offset 15%

⚠

Quality degraded across the board — here's why this matters

On its own, the rate effect would have produced a 2.64pp decline. The actual −1.63pp is smaller only because the mix shifted favorably: the weight effect (+0.76pp) and interaction (+0.24pp) together recovered about 1pp.

◆

APAC's growth was the safety net — here's how

APAC (88.6% accuracy, above the global mean) grew from 47.2% to 47.8% of mix while its accuracy rose +0.92pp. Together these offset about 31% of the decline (roughly +0.5pp). Without APAC's contribution, the headline would read about −2.1pp instead of −1.63pp.

▶ How to read this decomposition

Interpreting the three effects

Rate effect (161%) tells us accuracy degradation within segments — holding mix constant — more than fully explains the decline. This is the "quality got worse" signal.

Weight effect (offset 47%) means the mix actually shifted favorably: segments with above-average accuracy gained share. Without this, the decline would have been ~2.4pp (−1.63 − 0.76) instead of 1.63pp.

Interaction (offset 15%) captures the joint effect — segments that lost accuracy also tended to shrink in weight, providing a small additional buffer.

The sum: −2.64 + 0.76 + 0.24 = −1.64pp, which matches the observed −1.63pp global decline to within rounding of the three components.
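
The additivity of the decomposition can be checked numerically. The sketch below is an illustration, not the report's own code: it applies the three formulas to the six Hub × GCP / non-GCP segments, which together cover essentially the whole mix. Inputs are the rounded table values, so the sum lands near the observed −1.63pp rather than exactly on it. The variable names are hypothetical, and the per-component totals at this coarser cut need not equal the headline rate, weight, and interaction figures.

```python
# (hub, split, acc_w14, acc_w13, gwt_w14, gwt_w13); all values in percent,
# copied from the Hub x GCP / non-GCP table.
SEGMENTS = [
    ("EMEA", "non-GCP", 80.7, 86.5, 29.4, 31.4),
    ("EMEA", "GCP",     69.8, 74.0, 2.98, 0.85),
    ("AMS",  "non-GCP", 81.9, 83.0, 10.9, 10.9),
    ("APAC", "GCP",     82.4, 85.1, 2.69, 3.32),
    ("AMS",  "GCP",     80.8, 79.8, 8.80, 9.66),
    ("APAC", "non-GCP", 89.0, 87.9, 45.2, 43.8),
]
GLOBAL_ACC_W13 = 85.91  # percent

totals = {"rate": 0.0, "weight": 0.0, "interaction": 0.0}
for _hub, _split, acc14, acc13, w14, w13 in SEGMENTS:
    d_w = (w14 - w13) / 100.0   # weight change as a fraction
    d_acc = acc14 - acc13       # accuracy change in pp
    totals["rate"] += (w13 / 100.0) * d_acc
    totals["weight"] += d_w * (acc13 - GLOBAL_ACC_W13)
    totals["interaction"] += d_w * d_acc

overall = sum(totals.values())
print(round(overall, 2))
```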

Fuzzy rate impact

Fuzzy cases are counted as errors in OEA. Global fuzzy rate rose from 2.00% to 2.35%, an increase of about +0.36pp before rounding. That increase directly cost 0.36pp of accuracy, explaining ~22% of the total decline. The remaining ~78% is genuine accuracy degradation.

−0.36pp

Fuzzy rate increase

21.8% of total decline

−1.28pp

Non-fuzzy accuracy decline

78.2% of total decline

★

Three hubs, three completely different stories — here's the punchline

AMS: decline is 100% fuzzy — real quality held steady. APAC: powered through the biggest fuzzy headwind with +1.41pp genuine improvement. EMEA: 96% of the −6.48pp drop is real accuracy errors, not borderline ambiguity.

Hub	FR W14	FR W13	Δ FR	Acc Δ total	Fuzzy explains	Non-fuzzy Δ	Verdict
AMS	1.76%	1.57%	+0.19pp	−0.12pp	−0.19pp	+0.06pp	Entire decline is fuzzy-driven. Non-fuzzy accuracy actually improved.
APAC	2.08%	1.59%	+0.49pp	+0.92pp	−0.49pp	+1.41pp	Fuzzy headwind absorbed — non-fuzzy quality improved strongly (+1.41pp).
EMEA	3.12%	2.86%	+0.26pp	−6.48pp	−0.26pp	−6.21pp	96% of EMEA's decline is non-fuzzy. Fuzzy is a minor factor here.
Global	2.35%	2.00%	+0.36pp	−1.63pp	−0.36pp	−1.28pp	Fuzzy = 22%, non-fuzzy = 78%

Key insight: The three hubs tell very different stories. AMS's small decline is 100% fuzzy — actual quality held steady. APAC powered through a large fuzzy increase with even larger genuine improvement. EMEA's massive drop is overwhelmingly real accuracy errors — fuzzy rate barely moved. This confirms EMEA's issue is fundamentally about moderation quality, not borderline-case ambiguity.
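
The fuzzy split in the hub table follows from simple arithmetic: because fuzzy cases count as errors, a rise in fuzzy rate lowers accuracy by the same number of points, and the remainder is the non-fuzzy change. A minimal sketch, assuming that one-for-one relationship and using the rounded hub values from the table (`fuzzy_split` and `HUBS` are hypothetical names):

```python
def fuzzy_split(fr_w14: float, fr_w13: float, acc_delta: float) -> dict:
    """Split an accuracy change (pp) into fuzzy-driven and non-fuzzy parts.

    Assumes each fuzzy case is scored as an error, so a +x pp fuzzy-rate
    change costs exactly x pp of accuracy.
    """
    fuzzy_part = -(fr_w14 - fr_w13)          # pp of accuracy lost to fuzzy
    non_fuzzy_part = acc_delta - fuzzy_part  # what is left is genuine change
    return {"fuzzy": fuzzy_part, "non_fuzzy": non_fuzzy_part}


# hub -> (fuzzy rate W14, fuzzy rate W13, total accuracy change), from the table
HUBS = {
    "AMS":  (1.76, 1.57, -0.12),
    "APAC": (2.08, 1.59, +0.92),
    "EMEA": (3.12, 2.86, -6.48),
}

for hub, (fr14, fr13, acc_delta) in HUBS.items():
    parts = fuzzy_split(fr14, fr13, acc_delta)
    print(hub, {k: round(v, 2) for k, v in parts.items()})
```

Because the inputs are already rounded, the non-fuzzy results can differ from the table by 0.01pp.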

▶ AMS — decline is 100% fuzzy-driven

AMS: a fuzzy story, not a quality story

AMS accuracy fell just −0.12pp, and the entire decline is explained by the fuzzy rate increase (+0.19pp). Once fuzzy is stripped out, AMS non-fuzzy accuracy actually improved by +0.06pp.

This means AMS's labeling quality is holding steady or improving — the headline number is being dragged by borderline cases being reclassified or new ambiguous content types entering the pipeline.

Action: Consider fuzzy calibration or policy clarification for the specific content types driving the 0.19pp fuzzy increase. This is a recoverable loss.

▶ APAC — strong quality masked by fuzzy headwind

APAC: quality is better than the headline suggests

APAC's reported accuracy improved +0.92pp, but the underlying non-fuzzy improvement is actually +1.41pp — being partially masked by a +0.49pp fuzzy rate increase (the largest of any hub).

APAC absorbed the biggest fuzzy headwind and still delivered the best headline improvement. However, the fuzzy trend (+0.49pp WoW) needs monitoring — if it continues, it will eventually overwhelm the quality gains.

Action: Investigate whether policy updates or new content types in APAC are driving the fuzzy surge. The quality fundamentals are strong, but the fuzzy trajectory is concerning.

▶ EMEA — fuzzy is a rounding error; the problem is real

EMEA: genuine moderation quality crisis

EMEA's fuzzy rate only increased +0.26pp, explaining just 4% of its massive −6.48pp accuracy decline. The remaining −6.21pp is pure non-fuzzy accuracy degradation.

This definitively rules out "borderline cases" as an explanation for EMEA's performance. The problem is fundamentally about labeler accuracy, policy interpretation, or operational execution — not content ambiguity.

EMEA also has the highest absolute fuzzy rate (3.12% vs 2.08% APAC, 1.76% AMS), suggesting a structural baseline of ambiguity in its content mix, but the week-over-week change is small.

Hub × project type

EMEA's three project types account for 129% of the decline. APAC General Recall is the largest single offset (−30.6%).

▼

EMEA Appeal alone explains half the global decline — here's the shape

Appeal accuracy collapsed 82.7% → 76.6% (−6.06pp) while still carrying 15.6% of global weight. The rate effect (−1.156pp) is the single largest driver in the entire decomposition. Even the weight shrinkage couldn't offset it.

Hub	Type	Acc W14	Acc W13	Δ Acc	GWt W14	GWt W13	Rate	Weight	Inter	Total	% of Δ
EMEA	Appeal	76.6%	82.7%	−6.06	15.6%	19.1%	−1.156	+0.113	+0.212	−0.831	50.9%
EMEA	General Recall	84.2%	91.6%	−7.37	12.1%	10.0%	−0.734	+0.119	−0.156	−0.771	47.1%
EMEA	Analytics Appeal	77.7%	89.7%	−12.01	4.7%	3.2%	−0.387	+0.055	−0.174	−0.505	30.9%
AMS	General Recall	84.4%	85.8%	−1.37	14.3%	9.5%	−0.130	−0.005	−0.066	−0.200	12.2%
APAC	Appeal	85.4%	85.7%	−0.27	15.9%	18.8%	−0.052	+0.006	+0.008	−0.037	2.3%
Negative subtotal										−2.345	143.4%
AMS	Appeal	72.9%	78.7%	−5.86	5.0%	10.5%	−0.614	+0.394	+0.320	+0.100	−6.1%
AMS	Analytics Appeal	79.0%	62.7%	+16.35	0.5%	0.6%	+0.102	+0.038	−0.027	+0.113	−6.9%
APAC	General Recall	90.1%	88.6%	+1.49	27.5%	24.1%	+0.359	+0.091	+0.051	+0.501	−30.6%
Positive subtotal										+0.714	−43.7%

AMS Appeal: accuracy did fall (rate = −0.61pp), but its accuracy is well below the global mean, so the weight halving from 10.5% → 5.0% was net positive for the global number (+0.39pp weight effect, +0.32pp interaction), flipping the total contribution to +0.10pp.

▶ EMEA Appeal deep dive — why −6.06pp accuracy drop?

EMEA Appeal: rate effect dominance

The −1.156pp rate effect is the single largest driver in this decomposition. EMEA Appeal dropped from 82.7% to 76.6%, a −6.06pp swing, while still carrying 15.6% global weight.

The weight did shrink (19.1% → 15.6%), which partially offset the damage (+0.113pp weight effect, +0.212pp interaction), but the sheer magnitude of the accuracy collapse overwhelms both offsets.

Key question: Is this driven by specific BPO sites, policy updates, or labeler calibration drift? See the "Top Projects" tab for project-level decomposition.

▶ APAC General Recall — why it's the biggest offset

APAC GR: the stabilizer

APAC General Recall improved from 88.6% to 90.1% (+1.49pp) while also gaining weight (24.1% → 27.5%). This is the ideal scenario: an above-average segment both improves and grows.

All three effects are positive: rate (+0.359pp), weight (+0.091pp), interaction (+0.051pp), summing to +0.501pp — the single largest offset at −30.6% of the decline.

EMEA market breakdown

MENA1 + EN + SSA + DE + MENA2 = 110% of global decline, almost entirely rate-driven. SSA is a compounding case — weight grew while accuracy fell.

⚠

5 markets drive 110% of the global decline — here's the pattern

Every EMEA market listed below dropped accuracy this week, but the damage is concentrated: MENA1 alone is 38.4%. The pattern is almost entirely rate-driven (labeler quality), not mix-shift. Among the top five, only SSA compounds all three effects: its weight grew into a below-mean, declining segment.

Market	Acc W14	Acc W13	Δ Acc	GWt W14	GWt W13	Rate	Weight	Inter	Total	% of Δ
MENA1	80.5%	90.4%	−9.89	6.26%	6.46%	−0.639	−0.009	+0.020	−0.628	38.4%
EN (GB)	78.8%	88.5%	−9.67	3.56%	4.18%	−0.404	−0.016	+0.061	−0.360	22.0%
SSA	75.9%	84.2%	−8.22	3.65%	2.46%	−0.202	−0.021	−0.099	−0.321	19.7%
DE	77.4%	86.8%	−9.42	2.93%	2.90%	−0.273	+0.000	−0.003	−0.276	16.9%
MENA2	75.8%	80.9%	−5.08	4.28%	4.37%	−0.222	+0.005	+0.005	−0.213	13.0%
IT	84.2%	93.2%	−9.07	2.43%	2.27%	−0.205	+0.012	−0.015	−0.208	12.7%
IL	67.2%	83.8%	−16.55	0.36%	0.43%	−0.071	+0.001	+0.011	−0.059	3.6%
UA	74.3%	77.3%	−3.04	1.08%	1.04%	−0.032	−0.003	−0.001	−0.036	2.2%

SSA is the only top-five market where all three effects are negative: weight expanded (2.46%→3.65%), accuracy sits below the global mean, and accuracy also fell. UA shows the same sign pattern, but its total is only −0.036pp. SSA's triple headwind is worth investigating.
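
The triple-headwind pattern can be screened for mechanically. The sketch below is illustrative only: it flags markets whose rate, weight, and interaction effects are all negative and whose summed contribution is at or below a cutoff. The −0.1pp cutoff and the names `MARKETS` and `triple_headwinds` are assumptions made for the example, not part of the report's method.

```python
# (market, rate, weight, interaction) in pp, copied from the market table.
MARKETS = [
    ("MENA1",   -0.639, -0.009, +0.020),
    ("EN (GB)", -0.404, -0.016, +0.061),
    ("SSA",     -0.202, -0.021, -0.099),
    ("DE",      -0.273, +0.000, -0.003),
    ("MENA2",   -0.222, +0.005, +0.005),
    ("IT",      -0.205, +0.012, -0.015),
    ("IL",      -0.071, +0.001, +0.011),
    ("UA",      -0.032, -0.003, -0.001),
]


def triple_headwinds(rows, min_total=-0.1):
    """Return markets where all three effects are negative and the summed
    contribution is at or below min_total (a hypothetical cutoff)."""
    flagged = []
    for name, rate, weight, inter in rows:
        total = rate + weight + inter
        if rate < 0 and weight < 0 and inter < 0 and total <= min_total:
            flagged.append((name, round(total, 3)))
    return flagged


print(triple_headwinds(MARKETS))                # material cases only
print(triple_headwinds(MARKETS, min_total=0))   # every triple-negative market
```

With the default cutoff only SSA is flagged; relaxing the cutoff to 0 also surfaces UA, whose contribution is too small to matter globally.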

▶ MENA1 deep dive — largest market contributor at 38.4%

MENA1: pure rate problem

MENA1 dropped from 90.4% to 80.5% (−9.89pp) while maintaining roughly stable weight (6.46% → 6.26%). The rate effect (−0.639pp) almost entirely explains its contribution.

This is a nearly pure accuracy regression — no confounding mix shifts. The investigation should focus on what changed in MENA1 labeling quality, policy interpretation, or task distribution during W14.

▶ SSA triple headwind — all three effects negative

SSA: compounding failure mode

SSA stands out among the top markets: rate, weight, and interaction are all negative.

Rate (−0.202pp): accuracy fell from 84.2% to 75.9%, a −8.22pp drop.

Weight (−0.021pp): SSA's weight grew from 2.46% to 3.65%, but since SSA accuracy (84.2%) was below the W13 global mean (85.9%), this expansion hurts.

Interaction (−0.099pp): the weight grew AND accuracy fell simultaneously — the worst combination.

Key question: Was the SSA weight increase intentional (ramp-up)? If so, quality support did not scale with volume.

▶ IL — steepest single-market accuracy drop (−16.55pp)

IL: low weight limits global impact

IL has the most dramatic accuracy decline of any market (83.8% → 67.2%, −16.55pp), but its small weight (0.36%) limits global impact to just −0.059pp (3.6% of decline).

Still worth flagging: a 16.5pp drop likely indicates a systemic issue — new policy, labeler turnover, or task type change — that could worsen if IL weight increases.

Hub × GCP / non-GCP

EMEA non-GCP alone accounts for 105% of the global decline, almost purely through rate effect. EMEA GCP's weight surge (0.85%→2.98%) into a low-accuracy segment cost another −0.38pp.

■

EMEA non-GCP single-handedly exceeds the entire decline — here's why

At 29.4% global weight and a −5.8pp accuracy drop, EMEA non-GCP generates −1.72pp contribution (105.2% of total). Meanwhile EMEA GCP weight tripled into a 74%-accuracy segment — a weight trap costing −0.38pp even though its rate effect is tiny.

Hub	Type	Acc W14	Acc W13	GWt W14	GWt W13	Rate	Weight	Inter	Total	% of Δ
EMEA	non-GCP	80.7%	86.5%	29.4%	31.4%	−1.824	−0.011	+0.115	−1.720	105.2%
EMEA	GCP	69.8%	74.0%	2.98%	0.85%	−0.036	−0.254	−0.090	−0.379	23.2%
AMS	non-GCP	81.9%	83.0%	10.9%	10.9%	−0.128	−0.000	−0.000	−0.128	7.8%
APAC	GCP	82.4%	85.1%	2.69%	3.32%	−0.090	+0.005	+0.017	−0.067	4.1%
Negative subtotal									−2.295	140.4%
AMS	GCP	80.8%	79.8%	8.80%	9.66%	+0.097	+0.053	−0.009	+0.141	−8.7%
APAC	non-GCP	89.0%	87.9%	45.2%	43.8%	+0.477	+0.027	+0.015	+0.519	−31.7%
Positive subtotal									+0.661	−40.4%

EMEA GCP weight effect (−0.25pp) — weight tripled from 0.85% to 2.98%, but this segment's accuracy (74.0%) is far below the global mean (85.9%). Expanding a below-mean segment hurts the global number even at unchanged accuracy.

▶ EMEA GCP weight surge — weight-driven damage

EMEA GCP: the weight trap

Unlike most segments where rate effect dominates, EMEA GCP's primary damage vector is the weight effect (−0.254pp). Weight tripled from 0.85% → 2.98%, but GCP accuracy (74.0%) is 11.9pp below the global mean.

The rate effect is tiny (−0.036pp) because the starting weight was so small. But the interaction (−0.090pp) compounds things: weight grew while accuracy also fell (74.0% → 69.8%).

Key question: Is this a deliberate GCP ramp-up or an allocation error? If it's a ramp-up, quality support needs to precede or accompany the volume increase.

▶ APAC non-GCP — strongest offset segment

APAC non-GCP: the anchor

APAC non-GCP is the largest single segment by weight (45.2%) and improved from 87.9% → 89.0%. All three effects are positive, delivering +0.519pp total offset (−31.7% of decline).

This segment single-handedly prevented the global decline from being ~2.15pp instead of 1.63pp. Maintaining APAC non-GCP stability is critical to holding the floor.
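
The counterfactual figure is the observed change minus the segment's total contribution. A small sketch of that arithmetic, using the figures from the table above (`without_segment` is a hypothetical helper):

```python
def without_segment(global_delta: float, segment_total: float) -> float:
    """Global change (pp) if one segment's contribution were removed."""
    return global_delta - segment_total


GLOBAL_DELTA = -1.63          # observed W14 vs W13 change, pp
APAC_NON_GCP_TOTAL = +0.519   # total contribution from the table, pp

print(round(without_segment(GLOBAL_DELTA, APAC_NON_GCP_TOTAL), 2))
```

This treats contributions as simply additive; it does not re-normalize the remaining segments' weights, so it is a rough bound rather than a full re-simulation.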

EMEA — top 10 individual projects (shift-share)

Top 10 projects by negative contribution. GCP-appeal-GB-en-ALR-MNL leads at 25.1%: weight surged 6x while accuracy crashed from 100% → 69.9%. Three of the top five are Appeal or AA projects whose weight expanded while accuracy fell.

▼

Top 3 projects drive 67% of the global decline — here's the pattern

GB-ALR-MNL (25.1%): weight surged 6x into crashing accuracy. MENA2-CAS (22.0%): weight quadrupled into a chronically below-mean segment. MENA1-ANK (19.7%): pure accuracy regression, no mix excuse. The common thread in the first two: weight expansion without quality support.

Project	Type	Acc W14	Acc W13	GWt W14	GWt W13	Rate	Weight	Inter	Total	% of Δ
GCP-appeal-GB-en-ALR-MNL	Appeal	69.9%	100.0%	2.25%	0.36%	−0.108	+0.266	−0.568	−0.410	25.1%
AA-MENA2-ar-T&S-CAS	AA	67.1%	73.5%	2.25%	0.51%	−0.033	−0.215	−0.112	−0.360	22.0%
GR General-MENA1-ku-CNX-ANK	GR	84.4%	96.8%	2.59%	2.61%	−0.322	−0.002	+0.002	−0.321	19.7%
appeal-KE/TZ/UG-sw-TP-NBO	Appeal	69.4%	81.4%	1.12%	0.77%	−0.092	−0.016	−0.043	−0.151	9.2%
GCP-GR General-GB-en-TP-ALB	GR	58.9%	92.7%	0.06%	1.40%	−0.473	−0.091	+0.451	−0.113	6.9%
GCP-appeal-IT-it-TP-BRV	Appeal	84.7%	92.3%	0.87%	1.60%	−0.122	−0.046	+0.055	−0.113	6.9%
appeal-MENA1-other-TP-MAK	Appeal	63.4%	96.1%	0.22%	0.53%	−0.174	−0.032	+0.101	−0.104	6.4%
GCP-GR General-DE-de-TLS-LEJ	GR	75.4%	85.4%	1.00%	0.46%	−0.046	−0.003	−0.054	−0.102	6.3%
appeal-MENA1-ar-CNX-IBD	Appeal	74.2%	78.9%	1.48%	1.17%	−0.056	−0.021	−0.014	−0.091	5.6%
GR General-MENA1-ar-TP-MAK	GR	N/A	100%	0.00%	0.53%	−0.534	−0.075	+0.534	−0.075	4.6%

Weight expansion is the recurring theme: 5 of 10 projects saw weight increase (MNL, CAS, NBO, LEJ, IBD). When that expansion targets below-mean or declining-accuracy segments, the weight and interaction effects compound the damage. Only GR-MENA1-ku-CNX-ANK is a pure rate story (stable weight, −12.4pp accuracy drop).
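
One way to make the distinction between a "rate story" and a "weight story" systematic is to label each project by its largest-magnitude component. A minimal sketch using the top three rows of the table; the name `dominant_effect` is hypothetical and the labelling rule is an assumption for illustration, not the report's method:

```python
# (project, rate, weight, interaction) in pp, from the top-10 table.
TOP3 = [
    ("GCP-appeal-GB-en-ALR-MNL",    -0.108, +0.266, -0.568),
    ("AA-MENA2-ar-T&S-CAS",         -0.033, -0.215, -0.112),
    ("GR General-MENA1-ku-CNX-ANK", -0.322, -0.002, +0.002),
]


def dominant_effect(rate: float, weight: float, interaction: float) -> str:
    """Name the component with the largest absolute size."""
    effects = {"rate": rate, "weight": weight, "interaction": interaction}
    return max(effects, key=lambda name: abs(effects[name]))


for project, rate, weight, inter in TOP3:
    print(f"{project}: {dominant_effect(rate, weight, inter)}")
```

For these three rows the labels come out as interaction, weight, and rate respectively, which lines up with the per-project notes that follow.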

▶ GCP-appeal-GB-en-ALR-MNL — #1 contributor at 25.1%, here's the mechanism

GB MNL: the weight surge trap

This project's weight surged 6.25x (0.36% → 2.25%) while accuracy crashed from 100% → 69.9%. The interaction effect (−0.568pp) is the largest single component — weight grew dramatically while accuracy fell dramatically.

The weight effect is actually positive (+0.266pp) because the project was above the global mean in W13 (100% vs 85.9%). But the interaction overwhelms it: expanding into what became a low-accuracy segment is a compounding failure.

Key question: Was this a deliberate ramp-up of a previously small project? If so, quality controls didn't scale with volume.

▶ AA-MENA2-ar-T&S-CAS — weight quadrupled into a below-mean segment

MENA2 CAS: weight-driven damage

Weight grew from 0.51% → 2.25% (4.4x) while accuracy was already below the global mean (73.5%) and fell further to 67.1%. The weight effect alone (−0.215pp) is the largest component — this is a mix-shift problem, not primarily a rate problem.

All three effects are negative: rate (−0.033), weight (−0.215), interaction (−0.112). A triple headwind totaling −0.360pp (22.0% of decline).

Action: Validate whether this weight increase was intentional. Expanding a chronically below-mean segment without quality uplift compounds the global decline.

▶ GR General-MENA1-ku-CNX-ANK — pure rate collapse, no mix excuse

MENA1 ANK: classic accuracy regression

This is the cleanest rate-driven case in the top 10: weight barely moved (2.61% → 2.59%), so the rate effect (−0.322pp) almost entirely explains the −0.321pp total contribution.

Accuracy dropped from 96.8% → 84.4% (−12.4pp) — a steep fall from a high base. No mix-shift or weight excuses here; something changed in execution quality.

Action: Investigate what changed for Kurdish-language GR in MENA1 during W14 — policy update, new labeler cohort, or calibration drift.

Recommended actions

1. EMEA Appeal quality RCA: focus on EN, MENA1, DE BPO sites where rate effect is the dominant driver.

2. DE-LEJ site investigation: two projects with catastrophic accuracy (0.0% and N/A), possible vendor execution failure.

3. Four projects went to N/A (zero weight) in W14: confirm whether this is sampling shortfall or project suspension.

4. SSA weight expansion: the only top-five market with a triple-negative (rate + weight + interaction all negative); validate if the volume increase is intentional.

5. EMEA GCP weight surge (0.85% → 2.98%) into a 74%-accuracy segment: check if this is a ramp-up or reallocation, and whether quality support is in place.

6. APAC fuzzy rate jumped +0.49pp (largest increase): investigate whether policy updates or new content types are driving borderline cases. Non-fuzzy quality is strong, but the fuzzy trend needs monitoring.

7. AMS decline is entirely fuzzy-driven; non-fuzzy accuracy actually improved. Consider whether fuzzy calibration or policy clarification could recover the 0.19pp loss.

▶ Priority matrix — impact vs effort

Triage prioritization

P0 (immediate): DE-LEJ site — likely site-level outage/failure, accounts for 63.7% of decline from just two projects. Quick root cause identification could recover the most impact.

P1 (this week): Four N/A projects — verify if delivery gaps are fixable. If unplanned, restoring these could offset 167% of the decline (they overlap with rate-driven decline).

P1 (this week): MENA1 accuracy regression — 38.4% of decline, pure rate effect. Check if a policy update or labeler calibration issue occurred during W14.

P2 (track): SSA triple headwind and EMEA GCP weight surge — these are structural issues that need monitoring over W15–W16 to determine if they're transient or persistent.

P2 (track): APAC fuzzy rate surge (+0.49pp) — quality fundamentals are strong but the fuzzy trajectory needs monitoring. AMS fuzzy calibration is a quick-win candidate.