
Moderation Quality · Weekly RCA Report

OMA held flat at −0.04pp — but underneath, the mix is highly turbulent

Shift-share decomposition of W16 (Apr 18–24) vs W15 (Apr 11–17). Global moved 86.04%→86.00%, yet ID and LATAM each contributed ~800% of Δ to the drag, offset by Adult Sexualized Behaviors and Tobacco recovering. The headline calm masks high single-policy and single-market volatility.

Global OMA W16

86.00%

▼ 0.04pp

from 86.04% · ≈ flat

ID · Wt 8.85%

86.81%

▼ 3.40pp

842% of global Δ

LATAM · Wt 9.24%

82.51%

▼ 3.03pp

806% of global Δ

A.S.B + Tobacco offsets

+0.77pp

▲ combined

−2189% offset

Methodology & headline summary

W16 (Apr 18–24) vs W15 (Apr 11–17). Each segment's contribution to OMA (Overall Moderation Accuracy) is decomposed into rate, weight, and interaction effects. Because the global Δ is tiny (−0.04pp), individual segments' % of Δ figures can balloon; use absolute pp contributions to gauge true scale.

Rate effect = GWt_W15 × (Acc_W16 − Acc_W15) — pure accuracy change at prior weight
Weight effect = (GWt_W16 − GWt_W15) × (Acc_W15 − Global Acc_W15) — mix shift relative to global mean
Interaction = (GWt_W16 − GWt_W15) × (Acc_W16 − Acc_W15) — joint change
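
The three formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the report's actual pipeline: the `ShiftShare` class and `shift_share` function are hypothetical names, weights are passed as fractions, and the inputs are the rounded ID values from the market table.

```python
from dataclasses import dataclass


@dataclass
class ShiftShare:
    rate: float         # prior weight x accuracy change
    weight: float       # weight change x (prior accuracy - prior global accuracy)
    interaction: float  # weight change x accuracy change

    @property
    def total(self) -> float:
        return self.rate + self.weight + self.interaction


def shift_share(w_prev, w_curr, acc_prev, acc_curr, global_acc_prev):
    """Weights are fractions (0.087 = 8.70%); accuracies are in percent.

    Returns the three effects in percentage points (pp).
    """
    d_w = w_curr - w_prev
    d_acc = acc_curr - acc_prev
    return ShiftShare(
        rate=w_prev * d_acc,
        weight=d_w * (acc_prev - global_acc_prev),
        interaction=d_w * d_acc,
    )


# ID row from the market table: W15 -> W16, global W15 accuracy 86.04%.
id_row = shift_share(0.0870, 0.0885, 90.22, 86.81, 86.04)
print(f"rate={id_row.rate:+.3f} weight={id_row.weight:+.3f} "
      f"inter={id_row.interaction:+.3f} total={id_row.total:+.3f}")
```

With rounded inputs the result agrees with the table's ID row to within about 0.001pp. For one segment the three terms add up to w_W16 × Acc_W16 − w_W15 × Acc_W15 minus Δw × Global Acc_W15; the last term cancels across segments when weights sum to 100% in both weeks, which is why segment totals add up to the global Δ.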

⚙

The headline is misleading — the underlying mix is highly turbulent

86.04% → 86.00% looks like a non-event. But: Violent Behaviors fell −19.32pp, Personal Information - High Risk fell −31.01pp, Disparaging Religion fell −75.96pp. They were offset by equally severe gains: Adult Sexualized Behaviors recovering +3.87pp on heavy weight, Tobacco +2.16pp continuing its rebound, Invasive Cosmetic +22.73pp. Net ≈ 0.

⚠

Geographic drag concentrated in ID + LATAM = 1648% of global Δ

ID (−3.40pp accuracy, 842% of Δ) and LATAM (−3.03pp, 806% of Δ) each contributed more than 8× the global decline. SSA, PH, MENA1, BD, JP, and TR each add 200%+. ID has now declined in two of the last three weeks: W13→W14 −3.16pp, W14→W15 +0.58pp, W15→W16 −3.40pp.

▲

Adult Sexualized Behaviors + Tobacco delivered +0.77pp combined offset — what saved the headline

Adult Sexualized Behaviors (+3.87pp accuracy on 5.0% weight, contribution +0.43pp) and Tobacco & Nicotine (+2.16pp on 10.81% weight, contribution +0.33pp) together offset 2189% of the global decline. MENA2 recovered +6.84pp accuracy regionally (+0.30pp). Backing all three out of the headline gives roughly −1.1pp (−0.04 − 0.77 − 0.30); the policy and market cuts overlap, so treat this as a rough upper bound on the drag.

▶ Why "% of Δ" looks extreme this week

Small denominator, large numerator

Global Δ = −0.04pp. When a segment contributes −0.30pp (a normal magnitude), it's ~750% of the global change. This is mathematically correct but visually scary.

The right interpretation: treat absolute pp contributions as the signal. Anything > 0.10pp is materially large in absolute terms — and the W16 table has 10+ such items on each side, indicating high underlying volatility.

If next week one of the offsets fails to repeat (e.g., Tobacco continues recovering but Adult Sexualized Behaviors regresses), the headline could swing 1–2pp easily. The current calm is fragile.
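
The small-denominator effect is easy to reproduce. The sketch below uses a hypothetical helper, `pct_of_delta`, and compares the same −0.30pp segment contribution against this week's −0.04pp global change and against the −1.63pp global change from the W14 report.

```python
def pct_of_delta(contribution_pp: float, global_delta_pp: float) -> float:
    """Segment contribution as a percentage of the global change."""
    return 100.0 * contribution_pp / global_delta_pp


segment = -0.30  # pp, the example magnitude used above

for label, global_delta in [("W16", -0.04), ("W14", -1.63)]:
    print(f"{label}: {pct_of_delta(segment, global_delta):.0f}% of global delta")
```

The contribution is identical in both cases; only the denominator changes, taking the share from about 750% to about 18%.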

▶ Data integrity flag — several policies report 0% accuracy

0%-accuracy policies need verification

Four policies show 0% accuracy in both W15 and W16 yet still contribute meaningfully to the global delta via weight changes:

Animal Abuse & Graphic Content: 0% → 0%, weight 0.13% → 0.39% (contribution −0.225pp)
Youth Physical Abuse, Assault & Neglect: 0% → 0%, weight 0.22% → 0.38%
Graphic Content: 0% → 0%, weight grew
Adult Sexual Abuse: 0% → 0%, weight 0.41% → 0.50%

A persistent 0% on a non-trivial sample is implausible as a true accuracy figure. Likely causes: data filter excluding all "approve" cases for these policies, sampling artifact, or definitional change. Verify before treating these as real signal.
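
A segment stuck at 0% accuracy has no rate or interaction effect (its accuracy change is zero), so its whole contribution is the weight effect. The sketch below flags such rows and computes that effect. The row layout and the `weight_effect` helper are illustrative assumptions, and only the three policies whose weights are stated above are included.

```python
# (policy, acc_w15, acc_w16, wt_w15, wt_w16); accuracies in %, weights in %.
# Only rows whose weights are stated in the list above are included.
rows = [
    ("Animal Abuse & Graphic Content", 0.0, 0.0, 0.13, 0.39),
    ("Youth Physical Abuse, Assault & Neglect", 0.0, 0.0, 0.22, 0.38),
    ("Adult Sexual Abuse", 0.0, 0.0, 0.41, 0.50),
]
GLOBAL_ACC_W15 = 86.04


def weight_effect(wt_prev, wt_curr, acc_prev, global_acc_prev):
    """Weight effect in pp; weights are given in percent, hence the / 100."""
    return (wt_curr - wt_prev) / 100.0 * (acc_prev - global_acc_prev)


flagged = {}
for name, a0, a1, w0, w1 in rows:
    if a0 == 0.0 and a1 == 0.0:  # persistent 0% accuracy: treat as suspect
        flagged[name] = weight_effect(w0, w1, a0, GLOBAL_ACC_W15)

for name, effect in flagged.items():
    print(f"{name}: weight effect {effect:+.3f}pp (verify before use)")
```

For Animal Abuse & Graphic Content the rounded weights give a value within 0.002pp of the −0.225pp quoted above.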

By market — top contributors

⚠

12 markets dragging at 100%+ each — but 8 markets offset more than the entire decline

Drag side: ID 842%, LATAM 806%, SSA 440%, PH 398%, MENA1 347%, BD 294%, JP 252%, TR 241%. Offset side: MENA2 +6.84pp recovery, VN grew on improvement, BR +3.43pp, IT +5.37pp.

Market	Acc W15	Acc W16	Δ Acc	Wt W15	Wt W16	Rate	Weight	Inter	Total	% of Δ
ID	90.22%	86.81%	−3.40	8.70%	8.85%	−0.296	+0.006	−0.005	−0.295	842.3%
LATAM	85.54%	82.51%	−3.03	8.79%	9.24%	−0.266	−0.002	−0.014	−0.282	805.8%
SSA	79.43%	75.32%	−4.11	3.37%	3.52%	−0.139	−0.010	−0.006	−0.154	439.8%
PH	87.45%	83.99%	−3.45	4.15%	3.95%	−0.143	−0.003	+0.007	−0.139	397.7%
MENA1	86.79%	84.98%	−1.81	5.60%	7.50%	−0.101	+0.014	−0.034	−0.122	347.2%
BD	86.85%	84.98%	−1.87	5.28%	5.68%	−0.099	+0.003	−0.007	−0.103	294.1%
JP	92.01%	90.48%	−1.53	2.28%	1.07%	−0.035	−0.072	+0.018	−0.088	252.1%
TR	88.70%	84.36%	−4.34	1.97%	1.92%	−0.085	−0.001	+0.002	−0.084	241.0%
ES	89.25%	82.39%	−6.87	1.08%	1.00%	−0.074	−0.003	+0.006	−0.071	203.6%
MX	82.78%	81.87%	−0.91	4.65%	5.26%	−0.042	−0.020	−0.006	−0.068	193.8%
Top-10 negative subtotal									−1.408	4017.4%
MENA2	77.79%	84.63%	+6.84	4.39%	4.00%	+0.300	+0.032	−0.027	+0.305	−870.2%
VN	90.16%	91.69%	+1.53	6.84%	7.73%	+0.105	+0.037	+0.014	+0.155	−443.5%
BR	83.73%	87.16%	+3.43	4.46%	4.75%	+0.153	−0.007	+0.005	+0.150	−429.1%
IT	86.41%	91.78%	+5.37	2.47%	2.39%	+0.133	+0.000	−0.005	+0.128	−364.9%
MY	83.36%	90.47%	+7.11	1.67%	1.52%	+0.119	−0.004	−0.007	+0.108	−309.6%
Top-5 positive subtotal									+0.846	−2417.3%

JP's weight collapse (2.28% → 1.07%, −1.21pp) produces the largest weight effect among the draggers. Despite the small accuracy decline (−1.53pp), the weight effect (−0.072pp) is unusually large because JP's W15 accuracy (92.01%) was well above the global mean (86.04%): shrinking it removes a high-quality contributor from the mix. MENA1's weight moved more in absolute terms (+1.90pp), but its accuracy sat close to the mean, so its weight effect is only +0.014pp.
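
The JP observation can be checked directly against the Weight column of the table. The sketch below copies that column for the ten dragging markets, picks the largest in magnitude, and recomputes JP's value from its own row. Variable names are illustrative.

```python
# Weight-effect column (pp) for the ten dragging markets in the table above.
weight_effects = {
    "ID": +0.006, "LATAM": -0.002, "SSA": -0.010, "PH": -0.003,
    "MENA1": +0.014, "BD": +0.003, "JP": -0.072, "TR": -0.001,
    "ES": -0.003, "MX": -0.020,
}
GLOBAL_ACC_W15 = 86.04

largest = max(weight_effects, key=lambda m: abs(weight_effects[m]))

# Recompute JP's weight effect from its row: weights 2.28% -> 1.07%, W15 accuracy 92.01%.
jp_weight_effect = (1.07 - 2.28) / 100.0 * (92.01 - GLOBAL_ACC_W15)
print(largest, f"{jp_weight_effect:+.3f}pp")
```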

▶ ID #1 dragger: two declines in three weeks

ID: a recurring pattern, not a one-off

ID OMA accuracy fell from 90.22% → 86.81% in W16 (−3.40pp). This is the second large decline in three week-over-week moves: W13 92.79% → W14 89.63% → W15 90.22% → W16 86.81%. Cumulative drop: −5.97pp from the W13 baseline.

The market is also gaining global share (8.70% → 8.85%) while accuracy worsens — the interaction effect is small but negative. Suggests either Indonesia-specific moderation quality is degrading, or the additional volume is concentrated in harder-to-judge content.

Action: Request structured Indonesia retrospective. The trend is now clear enough to need a dedicated investigation.
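
The ID trajectory can be recomputed from the four accuracy values quoted above. This is a small sketch on the rounded figures, so individual deltas may differ from the reported ones by 0.01pp.

```python
# ID accuracy by week, as quoted above (percent).
id_acc = {"W13": 92.79, "W14": 89.63, "W15": 90.22, "W16": 86.81}

weeks = list(id_acc)
wow = {
    f"{prev}->{curr}": round(id_acc[curr] - id_acc[prev], 2)
    for prev, curr in zip(weeks, weeks[1:])
}
declines = [step for step, delta in wow.items() if delta < 0]
cumulative = round(id_acc["W16"] - id_acc["W13"], 2)

print(wow)
print(f"{len(declines)} declines in {len(wow)} moves, cumulative {cumulative:+.2f}pp")
```

The series shows two large declines separated by a small gain in W15, which is the pattern the retrospective should explain.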

▶ LATAM #2 dragger — what's behind the −3.03pp accuracy drop

LATAM: pure rate-effect dominance

LATAM accuracy fell from 85.54% → 82.51%, a 3.03pp drop. Weight grew slightly (8.79% → 9.24%) which marginally amplified damage via interaction (−0.014pp).

The rate component (−0.266pp) is by far the largest driver. Investigate whether a regional policy change, language model update, or sampling shift hit the LATAM portfolio specifically in W16.

▶ MENA2 #1 offset — +6.84pp recovery, is it durable?

MENA2: bounce-back from a chronic underperformer

MENA2 accuracy jumped 77.79% → 84.63% (+6.84pp). Weight contracted slightly (4.39% → 4.00%), so this is overwhelmingly a rate story.

MENA2 has been a problem region, and this single-week recovery is the largest positive market contribution in the W16 table (+0.305pp). MY's accuracy gain is larger (+7.11pp) but sits on a 1.5% weight. Whether the recovery is durable depends on whether the W14–W15 issue was a one-off (sample anomaly, transient labeling problem) or whether deeper calibration work lifted the floor.

Confirm with the regional team whether structural changes were made.

By policy title — top contributors

⚠

Multiple policies dropped 10pp or more in accuracy this week, the largest by 76pp

Severe single-policy drops include Disparaging Religion (−75.96pp, 89.07%→13.11%), Light Body Exposure (−36.83pp), Personal Information - High Risk (−31.01pp), NSA Exceptions - Mature (−22.85pp), Suicide & NSSI (−21.99pp), Violent Behaviors (−19.32pp). Even with small weights, these aggregate fast.

▲

Tobacco & Nicotine — 2nd-heaviest policy (10.81%) acted as a stabilizer for the second straight week

Tobacco accuracy continued recovering: 79.04% → 81.20% (+2.16pp) and its share contracted 12.23% → 10.81% (−1.42pp). Because Tobacco accuracy is well below the global mean (−7.0pp from 86.04%), shrinking its weight is a strong net positive. Combined contribution: +0.334pp (offsetting 952% of the global Δ).

▶ Tobacco & Nicotine — deep dive into a sustained recovery

Tobacco's outsized impact

At 10.81% of W16 sample weight, Tobacco & Nicotine is the 2nd-largest single policy (after Youth Regulated Goods at 12.29%). Its accuracy moves the global needle directly.

Multi-week trajectory: clear recovery from W14 trough

W13: 84.66% acc, 13.23% wt — recent peak
W14: 76.07% acc, 11.06% wt — −8.59pp single-week collapse
W15: 79.04% acc, 12.23% wt — partial recovery (+2.97pp)
W16: 81.20% acc, 10.81% wt — continued recovery (+2.16pp)

Tobacco quality has rebounded ~5.13pp from the W14 trough, but is still 3.46pp below its W13 baseline. The trajectory is clearly positive.

Shift-share decomposition (W16 vs W15)

Rate effect: +0.264pp — accuracy gain at prior weight
Weight effect: +0.099pp — shrinking a below-mean segment helps
Interaction: −0.031pp — small, accuracy ↑ while weight ↓
Total: +0.334pp (≈ −952% of the global Δ)

What to watch

Sample volume: 1,878 → 1,537 cases (−18%). Some of the weight contraction may reflect a sampling change. Verify the methodology hasn't changed.

Below-mean accuracy persistence: at 81.20%, Tobacco is still 4.80pp below the global mean. If volume rebounds before quality recovers further, the helpful weight-effect direction will reverse — Tobacco could flip back to a major drag.
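
The reversal risk can be illustrated with the weight-effect formula alone. In the sketch below the first call uses the observed W15 → W16 values; the second is a hypothetical W16 → W17 scenario in which Tobacco's share returns to 12.23% while its accuracy stays at 81.20%. The W17 weight is an assumed value, not a forecast.

```python
def weight_effect(wt_prev, wt_curr, acc_prev, global_acc_prev):
    """Weight effect in pp; weights in percent, accuracies in percent."""
    return (wt_curr - wt_prev) / 100.0 * (acc_prev - global_acc_prev)


# Observed W15 -> W16: share shrank while Tobacco sat below the global mean.
observed = weight_effect(12.23, 10.81, 79.04, 86.04)

# Hypothetical W16 -> W17: share returns to 12.23% with accuracy still at 81.20%
# against the W16 global mean of 86.00%. The W17 weight is an assumed value.
hypothetical = weight_effect(10.81, 12.23, 81.20, 86.00)

print(f"observed {observed:+.3f}pp, hypothetical rebound {hypothetical:+.3f}pp")
```

The sign flips because the same below-mean segment is now gaining share instead of losing it.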

Action: Lock in the recovery. Confirm whether the W14 trough was an isolated event and whether the two-week rebound (W15 and W16) has structural support rather than being regression to the mean.

Policy	Acc W15	Acc W16	Δ Acc	Wt W15	Wt W16	Rate	Weight	Inter	Total	% of Δ
Violent Behaviors	76.78%	57.46%	−19.32	1.47%	1.77%	−0.284	−0.029	−0.058	−0.371	1059.4%
Gambling - Depiction and Promotion	69.68%	59.89%	−9.79	1.51%	2.07%	−0.148	−0.092	−0.054	−0.293	836.9%
Dangerous Trends - Serious Harm	68.09%	63.14%	−4.95	4.83%	4.95%	−0.239	−0.022	−0.006	−0.266	758.7%
Personal Information - High Risk	84.12%	53.12%	−31.01	0.67%	0.71%	−0.208	−0.001	−0.011	−0.220	627.4%
Youth Non-Sexualized Nudity	76.77%	74.61%	−2.16	4.86%	5.60%	−0.105	−0.069	−0.016	−0.189	540.2%
Youth Body Exposure - Light (4-17)	40.38%	37.08%	−3.30	0.67%	0.98%	−0.022	−0.146	−0.010	−0.178	507.6%
Youth Regulated Goods and Services	73.69%	72.65%	−1.04	12.10%	12.29%	−0.126	−0.023	−0.002	−0.151	430.1%
Light Body Exposure	70.00%	33.17%	−36.83	0.08%	0.30%	−0.029	−0.036	−0.082	−0.147	419.9%
High Risk Driving	64.91%	60.73%	−4.18	2.33%	2.53%	−0.097	−0.041	−0.008	−0.147	419.7%
Regulated Goods - Marketing/Trade	47.96%	48.53%	+0.57	1.44%	1.80%	+0.008	−0.135	+0.002	−0.129	367.7%
Top-10 negative subtotal									−2.090	5967.7%
Adult Sexualized Behaviors	54.88%	58.75%	+3.87	5.77%	5.00%	+0.224	+0.239	−0.030	+0.433	−1236.1%
Tobacco and Nicotine ★ 2nd heaviest policy	79.04%	81.20%	+2.16	12.23%	10.81%	+0.264	+0.099	−0.031	+0.334	−952.5%
Invasive Cosmetic Procedures	65.14%	87.86%	+22.73	1.30%	2.26%	+0.295	−0.201	+0.219	+0.313	−894.6%
Combat sports, Extreme Sports & Stunts	75.02%	82.54%	+7.51	4.04%	4.28%	+0.304	−0.026	+0.018	+0.296	−844.0%
Moderate Bullying	48.14%	50.83%	+2.69	2.26%	1.62%	+0.061	+0.247	−0.017	+0.290	−826.3%
Top-5 positive subtotal									+1.666	−4753.5%

Severe single-policy regressions: Disparaging Religion 89.07%→13.11% (−75.96pp); Suicide & NSSI 57.37%→35.38% (−21.99pp); NSA Exceptions - Mature 53.62%→30.76% (−22.85pp); Adult Sexual Solicitation 57.81%→46.33% (−11.47pp). These weren't in the top-10 by total contribution because their weights are tiny (<1%), but the rate magnitudes warrant individual investigation. Several policies report 0% accuracy in both weeks (Animal Abuse, Youth Physical Abuse, Graphic Content, Adult Sexual Abuse) — likely a data integrity issue, not real signal.

▶ Violent Behaviors #1 — −19.32pp drop on growing weight

Violent Behaviors: triple-negative, all three effects against

Accuracy collapsed 76.78% → 57.46% (−19.32pp). Weight grew (1.47% → 1.77%), so the additional volume entered a now-failing segment — interaction effect (−0.058pp) compounds the damage.

This is one of the largest reputational-risk policy categories. A 19pp accuracy drop combined with growing volume is a serious signal — escalate immediately.

▶ Personal Information - High Risk — −31pp single week

Personal Info High Risk: catastrophic single-week drop

Accuracy fell 84.12% → 53.12% (−31.01pp) on stable weight (0.67% → 0.71%). The rate effect (−0.208pp) accounts for nearly all of this row's −0.220pp contribution.

A 31pp drop on a privacy-related, high-stakes policy is alarming. Possible drivers: policy interpretation change, new content vector (e.g., new types of doxxing patterns), or model/labeler retraining gone wrong. Investigate before W17.

▶ Adult Sexualized Behaviors — +0.43pp top offset, what drove it

A.S.B: the largest single offset

Adult Sexualized Behaviors recovered 54.88% → 58.75% (+3.87pp). Weight contracted 5.77% → 5.00% (−0.77pp). Both effects are favorable: rate (+0.224pp) and weight (+0.239pp) — shrinking a below-mean segment helps.

This single policy contributed +0.433pp — by itself, more than 12× the global Δ in the offsetting direction. Worth understanding what drove the accuracy jump (calibration, content shift, sampling) since A.S.B is a chronic problem area.

▶ Disparaging Religion — 89.07% → 13.11% (−75.96pp)

Disparaging Religion: most severe rate drop

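This policy collapsed by 75.96pp on a tiny sample weight (~0.08–0.13%). Global impact is "only" −0.099pp, but the rate magnitude is unprecedented. The quoted 246% share does not reconcile with that figure: against the roughly −0.035pp global Δ implied by the tables, −0.099pp is about 283%, so one of the two numbers needs rechecking.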

Almost certainly a sample/policy/labeling artifact — a 76pp single-week swing is implausible as a true accuracy change. Verify the W16 sample is representative; if it is, escalate as a critical operational failure.

Recommended actions

1. Don't celebrate the −0.04pp headline. The mix is unstable: 10+ policies dropped 10pp or more this week (Disparaging Religion by 76pp), balanced by equally large gainers. If one of the gainers fails to repeat next week, the headline could swing 1–2pp.

2. Investigate Violent Behaviors (−19.32pp accuracy, weight growing): triple-negative on a high-reputational-risk category. Escalate to policy ops.

3. Personal Information - High Risk dropped 31pp: privacy-sensitive, suspicious magnitude. Audit sample composition and labeler agreement before W17.

4. ID + LATAM combined drag of 1648%: both regions saw 3+pp accuracy drops. Region-level RCA is needed to determine whether the cause is shared (model update, content shift) or independent.

5. ID has declined in two of the last three weeks (W13 92.79% → W16 86.81%, cumulative −5.97pp). This is no longer a single-week event; request a structured Indonesia retrospective.

6. Verify the extreme swings are real, not artifacts. Disparaging Religion (−76pp), Invasive Cosmetic (+23pp), MENA2 region (+6.84pp), Adult Fetish & Kinks (+32pp): these magnitudes invite sampling/labeling scrutiny before being trusted as signal.

7. Data integrity: 4 policies report 0% accuracy in both weeks (Animal Abuse & Graphic Content, Youth Physical Abuse, Graphic Content, Adult Sexual Abuse) yet still drag the global figure via weight changes. Likely a data filter or definitional issue; fix before treating as RCA signal.

▶ Priority matrix — what to triage first

Triage prioritization

P0 (immediate, integrity risk): Personal Information - High Risk (−31pp), Violent Behaviors (−19.32pp). Both are reputational categories with material accuracy regression on growing or stable weight.

P0 (data integrity): Verify Disparaging Religion (−76pp), 0%-accuracy policies, and other extreme single-policy swings are not sample/labeling artifacts. Swings of this size are more likely measurement issues than real changes.

P1 (regional): ID + LATAM joint investigation. If the cause is shared (e.g., a regional model rollout), one fix solves both. Otherwise treat as independent.

P1 (trend): ID has declined in two of the last three weeks. Even if the W16 drop proves to be an isolated event, the trend itself warrants attention.

P2 (lock in gains): Tobacco & Nicotine recovery (two consecutive weeks of gains) and Adult Sexualized Behaviors offset. Confirm structural drivers, not just regression to the mean.

P3 (signal hygiene): Stop using single-week % of Δ as the primary metric when the headline is near-flat. When |global Δ| < 0.1pp, report absolute pp contributions instead.
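
The P3 rule could be applied at report-generation time with a small formatting helper. The sketch below is one possible shape for it: `describe_contribution` and its `min_delta_pp` parameter are hypothetical names, with the default taken from the 0.1pp threshold above.

```python
def describe_contribution(contribution_pp, global_delta_pp, min_delta_pp=0.10):
    """Format a segment contribution, falling back to absolute pp when the
    global change is too small for a percentage share to be meaningful.

    min_delta_pp mirrors the 0.1pp threshold suggested above.
    """
    if abs(global_delta_pp) < min_delta_pp:
        return f"{contribution_pp:+.3f}pp"
    share = 100.0 * contribution_pp / global_delta_pp
    return f"{contribution_pp:+.3f}pp ({share:.1f}% of Δ)"


print(describe_contribution(-0.295, -0.04))   # W16: ID against a near-flat headline
print(describe_contribution(-0.628, -1.63))   # W14: MENA1 against a real decline
```

Shares computed from the rounded global Δ can differ slightly from the table values, which use unrounded inputs.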

Global W14

84.28%

▼ 1.63pp

from 85.91% · 100% of decline

EMEA · Wt 32.4%

79.65%

▼ 6.48pp

128% of global decline

APAC · Wt 47.8%

88.61%

▲ 0.92pp

−31% offset the decline

AMS · Wt 19.7%

81.37%

▼ 0.12pp

1% of global decline

Methodology & summary

W14 (Apr 4–10) vs W13 (Mar 28–Apr 3). Each segment's total contribution is decomposed into three additive components. Positive % of Δ = contributed to the decline; negative = offset.

Rate effect = GWt_W13 × (Acc_W14 − Acc_W13) — pure accuracy change at prior weight
Weight effect = (GWt_W14 − GWt_W13) × (Acc_W13 − Global Acc_W13) — mix shift relative to global mean
Interaction = (GWt_W14 − GWt_W13) × (Acc_W14 − Acc_W13) — joint change

−2.64pp

Total rate effect

161% of decline

+0.76pp

Total weight effect

Offset 47%

+0.24pp

Total interaction

Offset 15%

⚠

Quality degraded across the board — here's why this matters

The rate effect alone (−2.64pp) would have produced a 2.64pp decline had the mix not shifted. The actual −1.63pp is smaller only because of favorable weight rebalancing (+0.76pp) and interaction (+0.24pp).

◆

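APAC was the safety net: here's how

APAC (88.6% accuracy, above the global mean) grew from 47.2% → 47.8% of mix and improved +0.92pp, offsetting about 31% of the decline (roughly +0.50pp). Most of that comes from APAC General Recall's accuracy gain rather than from the 0.6pp growth in share. Without APAC's offset, the headline would read about −2.1pp instead of −1.63pp.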

▶ How to read this decomposition

Interpreting the three effects

Rate effect (161%) tells us accuracy degradation within segments — holding mix constant — more than fully explains the decline. This is the "quality got worse" signal.

Weight effect (offset 47%) means the mix actually shifted favorably: segments with above-average accuracy gained share. Without this, the decline would have been ~2.4pp instead of 1.63pp.

Interaction (offset 15%) captures the joint effect — segments that lost accuracy also tended to shrink in weight, providing a small additional buffer.

The sum: −2.64 + 0.76 + 0.24 = −1.64pp, which matches the observed −1.63pp global decline to within rounding.
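
The arithmetic behind the three tiles can be replayed in a few lines. The sketch uses only the rounded values shown above, so the sum lands within 0.01pp of the observed change, and the last line removes the weight effect to show the counterfactual.

```python
rate, weight, interaction = -2.64, 0.76, 0.24   # pp, from the summary tiles
observed_delta = -1.63                           # pp, global W14 vs W13

total = rate + weight + interaction
shares = {
    name: 100.0 * value / observed_delta
    for name, value in [("rate", rate), ("weight", weight), ("interaction", interaction)]
}

print(f"sum of effects: {total:+.2f}pp (observed {observed_delta:+.2f}pp)")
for name, share in shares.items():
    print(f"{name}: {share:+.0f}% of the decline")

# Counterfactual with the weight effect removed.
print(f"without weight effect: {observed_delta - weight:+.2f}pp")
```

A positive share means the effect added to the decline; a negative share means it offset it.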

Fuzzy rate impact

−0.36pp

Fuzzy rate increase

21.8% of total decline

−1.28pp

Non-fuzzy accuracy decline

78.2% of total decline

★

Fuzzy rate rose +0.36pp, but three hubs tell completely different stories

AMS: decline is 100% fuzzy — real quality held steady. APAC: powered through the biggest fuzzy headwind (+0.49pp) with +1.41pp genuine improvement. EMEA: 96% of the −6.48pp drop is real accuracy errors, not borderline ambiguity.

Hub	FR W14	FR W13	Δ FR	Acc Δ total	Fuzzy explains	Non-fuzzy Δ	Verdict
AMS	1.76%	1.57%	+0.19pp	−0.12pp	−0.19pp	+0.06pp	Entire decline is fuzzy-driven. Non-fuzzy accuracy actually improved.
APAC	2.08%	1.59%	+0.49pp	+0.92pp	−0.49pp	+1.41pp	Fuzzy headwind absorbed — non-fuzzy quality improved strongly (+1.41pp).
EMEA	3.12%	2.86%	+0.26pp	−6.48pp	−0.26pp	−6.21pp	96% of EMEA's decline is non-fuzzy. Fuzzy is a minor factor here.
Global	2.35%	2.00%	+0.36pp	−1.63pp	−0.36pp	−1.28pp	Fuzzy = 22%, non-fuzzy = 78%

Key insight: The three hubs tell very different stories. AMS's small decline is 100% fuzzy — actual quality held steady. APAC powered through a large fuzzy increase with even larger genuine improvement. EMEA's massive drop is overwhelmingly real accuracy errors — fuzzy rate barely moved. This confirms EMEA's issue is fundamentally about moderation quality, not borderline-case ambiguity.
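
The fuzzy/non-fuzzy split in the table can be approximated as below. The sketch assumes, as the table's "Fuzzy explains" column suggests, that each percentage point of fuzzy-rate increase costs one percentage point of accuracy. That rule is inferred from the table rather than stated by the report, and `split_fuzzy` is a hypothetical helper.

```python
# (hub, fuzzy rate W14 %, fuzzy rate W13 %, total accuracy change pp), from the table above.
hubs = [
    ("AMS", 1.76, 1.57, -0.12),
    ("APAC", 2.08, 1.59, +0.92),
    ("EMEA", 3.12, 2.86, -6.48),
]


def split_fuzzy(fr_curr, fr_prev, acc_delta):
    """Assumes each pp of fuzzy-rate increase costs one pp of accuracy."""
    fuzzy_part = -(fr_curr - fr_prev)
    return fuzzy_part, acc_delta - fuzzy_part


result = {hub: split_fuzzy(cur, prev, d) for hub, cur, prev, d in hubs}
for hub, (fuzzy, non_fuzzy) in result.items():
    print(f"{hub}: fuzzy {fuzzy:+.2f}pp, non-fuzzy {non_fuzzy:+.2f}pp")
```

Because the inputs are rounded, the non-fuzzy values can differ from the table by about 0.01pp.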

▶ AMS — decline is 100% fuzzy-driven

AMS: a fuzzy story, not a quality story

AMS accuracy fell just −0.12pp, and the entire decline is explained by the fuzzy rate increase (+0.19pp). Once fuzzy is stripped out, AMS non-fuzzy accuracy actually improved by +0.06pp.

This means AMS's labeling quality is holding steady or improving — the headline number is being dragged by borderline cases being reclassified or new ambiguous content types entering the pipeline.

Action: Consider fuzzy calibration or policy clarification for the specific content types driving the 0.19pp fuzzy increase. This is a recoverable loss.

▶ APAC — strong quality masked by fuzzy headwind

APAC: quality is better than the headline suggests

APAC's reported accuracy improved +0.92pp, but the underlying non-fuzzy improvement is actually +1.41pp — being partially masked by a +0.49pp fuzzy rate increase (the largest of any hub).

APAC absorbed the biggest fuzzy headwind and still delivered the best headline improvement. However, the fuzzy trend (+0.49pp WoW) needs monitoring — if it continues, it will eventually overwhelm the quality gains.

Action: Investigate whether policy updates or new content types in APAC are driving the fuzzy surge. The quality fundamentals are strong, but the fuzzy trajectory is concerning.

▶ EMEA — fuzzy is a rounding error; the problem is real

EMEA: genuine moderation quality crisis

EMEA's fuzzy rate only increased +0.26pp, explaining just 4% of its massive −6.48pp accuracy decline. The remaining −6.21pp is pure non-fuzzy accuracy degradation.

This definitively rules out "borderline cases" as an explanation for EMEA's performance. The problem is fundamentally about labeler accuracy, policy interpretation, or operational execution — not content ambiguity.

EMEA also has the highest absolute fuzzy rate (3.12% vs 2.08% APAC, 1.76% AMS), suggesting a structural baseline of ambiguity in its content mix, but the week-over-week change is small.

Hub × project type

▼

EMEA's three project types account for 129% of the decline

EMEA Appeal alone is 50.9%: accuracy collapsed 82.7% → 76.6% (−6.06pp) while still carrying 15.6% of global weight. APAC General Recall is the largest single offset (−30.6%), improving to 90.1% while gaining share.

Hub	Type	Acc W14	Acc W13	Δ Acc	GWt W14	GWt W13	Rate	Weight	Inter	Total	% of Δ
EMEA	Appeal	76.6%	82.7%	−6.06	15.6%	19.1%	−1.156	+0.113	+0.212	−0.831	50.9%
EMEA	General Recall	84.2%	91.6%	−7.37	12.1%	10.0%	−0.734	+0.119	−0.156	−0.771	47.1%
EMEA	Analytics Appeal	77.7%	89.7%	−12.01	4.7%	3.2%	−0.387	+0.055	−0.174	−0.505	30.9%
AMS	General Recall	84.4%	85.8%	−1.37	14.3%	9.5%	−0.130	−0.005	−0.066	−0.200	12.2%
APAC	Appeal	85.4%	85.7%	−0.27	15.9%	18.8%	−0.052	+0.006	+0.008	−0.037	2.3%
Negative subtotal										−2.345	143.4%
AMS	Appeal	72.9%	78.7%	−5.86	5.0%	10.5%	−0.614	+0.394	+0.320	+0.100	−6.1%
AMS	Analytics Appeal	79.0%	62.7%	+16.35	0.5%	0.6%	+0.102	+0.038	−0.027	+0.113	−6.9%
APAC	General Recall	90.1%	88.6%	+1.49	27.5%	24.1%	+0.359	+0.091	+0.051	+0.501	−30.6%
Positive subtotal										+0.714	−43.7%

AMS Appeal — accuracy did fall (rate = −0.61pp), but its accuracy is well below the global mean, so the weight halving from 10.5% → 5.0% was net positive for the global number (+0.39pp weight effect), flipping total contribution to +0.10pp.
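
The AMS Appeal sign flip follows directly from the three formulas. The sketch below recomputes the row from its rounded table values, with weights expressed as fractions; small differences from the table come from rounding.

```python
GLOBAL_ACC_W13 = 85.91  # percent

# AMS Appeal row: weights as fractions, accuracies in percent.
w_prev, w_curr = 0.105, 0.050
acc_prev, acc_curr = 78.7, 72.9

d_w = w_curr - w_prev
d_acc = acc_curr - acc_prev

rate = w_prev * d_acc                         # accuracy fell at the old weight
weight = d_w * (acc_prev - GLOBAL_ACC_W13)    # below-mean segment shrank
interaction = d_w * d_acc                     # both changes negative, product positive
total = rate + weight + interaction

print(f"rate={rate:+.2f} weight={weight:+.2f} inter={interaction:+.2f} total={total:+.2f}")
```

The rate term is clearly negative, yet the weight and interaction terms together outweigh it, so the segment's net contribution is positive.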

▶ EMEA Appeal deep dive — why −6.06pp accuracy drop?

EMEA Appeal: rate effect dominance

The −1.156pp rate effect is the single largest driver in this decomposition. EMEA Appeal dropped from 82.7% to 76.6%, a −6.06pp swing, while still carrying 15.6% global weight.

The weight did shrink (19.1% → 15.6%), which partially offset the damage (+0.113pp weight effect, +0.212pp interaction), but the sheer magnitude of the accuracy collapse overwhelms both offsets.

Key question: Is this driven by specific BPO sites, policy updates, or labeler calibration drift? See the "Top Projects" tab for project-level decomposition.

▶ APAC General Recall — why it's the biggest offset

APAC GR: the stabilizer

APAC General Recall improved from 88.6% to 90.1% (+1.49pp) while also gaining weight (24.1% → 27.5%). This is the ideal scenario: an above-average segment both improves and grows.

All three effects are positive: rate (+0.359pp), weight (+0.091pp), interaction (+0.051pp), summing to +0.501pp — the single largest offset at −30.6% of the decline.

EMEA market breakdown

⚠

5 markets drive 110% of the global decline — almost entirely rate-driven

MENA1 + EN + SSA + DE + MENA2. The damage is concentrated: MENA1 alone is 38.4%. Only SSA compounds all three effects — weight grew into a below-mean, declining segment.

Market	Acc W14	Acc W13	Δ Acc	GWt W14	GWt W13	Rate	Weight	Inter	Total	% of Δ
MENA1	80.5%	90.4%	−9.89	6.26%	6.46%	−0.639	−0.009	+0.020	−0.628	38.4%
EN (GB)	78.8%	88.5%	−9.67	3.56%	4.18%	−0.404	−0.016	+0.061	−0.360	22.0%
SSA	75.9%	84.2%	−8.22	3.65%	2.46%	−0.202	−0.021	−0.099	−0.321	19.7%
DE	77.4%	86.8%	−9.42	2.93%	2.90%	−0.273	+0.000	−0.003	−0.276	16.9%
MENA2	75.8%	80.9%	−5.08	4.28%	4.37%	−0.222	+0.005	+0.005	−0.213	13.0%
IT	84.2%	93.2%	−9.07	2.43%	2.27%	−0.205	+0.012	−0.015	−0.208	12.7%
IL	67.2%	83.8%	−16.55	0.36%	0.43%	−0.071	+0.001	+0.011	−0.059	3.6%
UA	74.3%	77.3%	−3.04	1.08%	1.04%	−0.032	−0.003	−0.001	−0.036	2.2%

SSA is the only top-5 market where all three effects are negative: weight expanded (2.46%→3.65%), accuracy sat below the global mean, and accuracy also fell. UA shows the same sign pattern at a much smaller scale (−0.036pp total). A triple headwind worth investigating.

▶ MENA1 deep dive — largest market contributor at 38.4%

MENA1: pure rate problem

MENA1 dropped from 90.4% to 80.5% (−9.89pp) while maintaining roughly stable weight (6.46% → 6.26%). The rate effect (−0.639pp) almost entirely explains its contribution.

This is a nearly pure accuracy regression — no confounding mix shifts. The investigation should focus on what changed in MENA1 labeling quality, policy interpretation, or task distribution during W14.

▶ SSA triple headwind — all three effects negative

SSA: compounding failure mode

SSA is the only top-5 EMEA market where rate, weight, and interaction are all negative.

Rate (−0.202pp): accuracy fell from 84.2% to 75.9%, a −8.22pp drop.

Weight (−0.021pp): SSA's weight grew from 2.46% to 3.65%, but since SSA accuracy (84.2%) was below the W13 global mean (85.9%), this expansion hurts.

Interaction (−0.099pp): the weight grew AND accuracy fell simultaneously — the worst combination.

Key question: Was the SSA weight increase intentional (ramp-up)? If so, quality support did not scale with volume.
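
A triple-negative check can be run mechanically over the EMEA market table. The sketch below copies the Rate, Weight, and Inter columns for the eight listed markets and keeps the rows where all three are negative.

```python
# (market, rate, weight, interaction) in pp, from the EMEA market table above.
rows = [
    ("MENA1", -0.639, -0.009, +0.020),
    ("EN (GB)", -0.404, -0.016, +0.061),
    ("SSA", -0.202, -0.021, -0.099),
    ("DE", -0.273, +0.000, -0.003),
    ("MENA2", -0.222, +0.005, +0.005),
    ("IT", -0.205, +0.012, -0.015),
    ("IL", -0.071, +0.001, +0.011),
    ("UA", -0.032, -0.003, -0.001),
]

triple_negative = [
    (market, round(rate + weight + inter, 3))
    for market, rate, weight, inter in rows
    if rate < 0 and weight < 0 and inter < 0
]
print(triple_negative)
```

On these figures the filter returns SSA and also UA. UA's three components are all negative but sum to only about −0.036pp, so SSA remains the case that matters.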

▶ IL — steepest single-market accuracy drop (−16.55pp)

IL: low weight limits global impact

IL has the most dramatic accuracy decline of any market (83.8% → 67.2%, −16.55pp), but its small weight (0.36%) limits global impact to just −0.059pp (3.6% of decline).

Still worth flagging: a 16.5pp drop likely indicates a systemic issue — new policy, labeler turnover, or task type change — that could worsen if IL weight increases.

EMEA — top 10 individual projects (shift-share)

▼

Top 3 projects drive 67% of the global decline

GB-ALR-MNL (25.1%): weight surged 6x into crashing accuracy. MENA2-CAS (22.0%): weight quadrupled into a chronically below-mean segment. MENA1-ANK (19.7%): pure accuracy regression. The common thread in the first two: weight expansion without quality support.

Project	Type	Acc W14	Acc W13	GWt W14	GWt W13	Rate	Weight	Inter	Total	% of Δ
GCP-TT-Video appeal-GB-en-ALR-MNL	Appeal	69.9%	100.0%	2.25%	0.36%	−0.108	+0.266	−0.568	−0.410	25.1%
TT-Video-Analytics Appeal-MENA2-ar-T&S-CAS	Analytics Appeal	67.1%	73.5%	2.25%	0.51%	−0.033	−0.215	−0.112	−0.360	22.0%
TT-Video-General Recall General-MENA1-ku-CNX-ANK	General Recall	84.4%	96.8%	2.59%	2.61%	−0.322	−0.002	+0.002	−0.321	19.7%
TT-Video appeal-KE/TZ/UG-sw-TP-NBO	Appeal	69.4%	81.4%	1.12%	0.77%	−0.092	−0.016	−0.043	−0.151	9.2%
GCP-TT-Video-General Recall General-GB-en-TP-ALB	General Recall	58.9%	92.7%	0.06%	1.40%	−0.473	−0.091	+0.451	−0.113	6.9%
GCP-TT-Video appeal-IT-it-TP-BRV	Appeal	84.7%	92.3%	0.87%	1.60%	−0.122	−0.046	+0.055	−0.113	6.9%
TT-Video appeal-MENA1-other-TP-MAK	Appeal	63.4%	96.1%	0.22%	0.53%	−0.174	−0.032	+0.101	−0.104	6.4%
GCP-TT-Video-General Recall General-DE-de-TLS-LEJ	General Recall	75.4%	85.4%	1.00%	0.46%	−0.046	−0.003	−0.054	−0.102	6.3%
TT-Video appeal-MENA1-ar-CNX-IBD	Appeal	74.2%	78.9%	1.48%	1.17%	−0.056	−0.021	−0.014	−0.091	5.6%
TT-Video-General Recall General-MENA1-ar-TP-MAK	General Recall	N/A	100%	0.00%	0.53%	−0.534	−0.075	+0.534	−0.075	4.6%

Weight expansion is the recurring theme: 5 of 10 projects saw weight increase (MNL, CAS, NBO, LEJ, IBD). When that expansion targets below-mean or declining-accuracy segments, the interaction effect compounds the damage. Only GR-MENA1-ku-CNX-ANK is a pure rate story (stable weight, −12.4pp accuracy drop).

▶ GCP-TT-Video appeal-GB-en-ALR-MNL — #1 contributor at 25.1%, here's the mechanism

GB MNL: the weight surge trap

This project's weight surged 6.25x (0.36% → 2.25%) while accuracy crashed from 100% → 69.9%. The interaction effect (−0.568pp) is the largest single component — weight grew dramatically while accuracy fell dramatically.

The weight effect is actually positive (+0.266pp) because the project was above the global mean in W13 (100% vs 85.9%). But the interaction overwhelms it: expanding into what became a low-accuracy segment is a compounding failure.

Key question: Was this a deliberate ramp-up of a previously small project? If so, quality controls didn't scale with volume.
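
The mechanism can be traced by recomputing the three effects for this project from its table row. The sketch uses the rounded weights and accuracies, with weights as fractions, and reports which effect dominates.

```python
GLOBAL_ACC_W13 = 85.91  # percent

# GCP-TT-Video appeal-GB-en-ALR-MNL: weights as fractions, accuracies in percent.
w_prev, w_curr = 0.0036, 0.0225
acc_prev, acc_curr = 100.0, 69.9

d_w, d_acc = w_curr - w_prev, acc_curr - acc_prev
effects = {
    "rate": w_prev * d_acc,
    "weight": d_w * (acc_prev - GLOBAL_ACC_W13),
    "interaction": d_w * d_acc,
}
dominant = max(effects, key=lambda k: abs(effects[k]))

for name, value in effects.items():
    print(f"{name}: {value:+.3f}pp")
print(f"dominant: {dominant}, total {sum(effects.values()):+.3f}pp")
```

The rate effect is small because the project's W13 weight was small; the damage shows up in the interaction term, which multiplies the weight increase by the accuracy drop.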

▶ AA-MENA2-ar-T&S-CAS — weight quadrupled into a below-mean segment

MENA2 CAS: weight-driven damage

Weight grew from 0.51% → 2.25% (4.4x) while accuracy was already below the global mean (73.5%) and fell further to 67.1%. The weight effect alone (−0.215pp) is the largest component — this is a mix-shift problem, not primarily a rate problem.

All three effects are negative: rate (−0.033), weight (−0.215), interaction (−0.112). A triple headwind totaling −0.360pp (22.0% of decline).

Action: Validate whether this weight increase was intentional. Expanding a chronically below-mean segment without quality uplift compounds the global decline.

▶ TT-Video-General Recall General-MENA1-ku-CNX-ANK — pure rate collapse, no mix excuse

MENA1 ANK: classic accuracy regression

This is the cleanest rate-driven case in the top 10: weight barely moved (2.61% → 2.59%), so the rate effect (−0.322pp) almost entirely explains the −0.321pp total contribution.

Accuracy dropped from 96.8% → 84.4% (−12.4pp) — a steep fall from a high base. No mix-shift or weight excuses here; something changed in execution quality.

Action: Investigate what changed for Kurdish-language GR in MENA1 during W14 — policy update, new labeler cohort, or calibration drift.

Recommended actions

1. EMEA Appeal quality RCA: focus on EN, MENA1, DE BPO sites where rate effect is the dominant driver.

2. DE-LEJ site investigation: two projects with catastrophic accuracy (0.0% and N/A), possible vendor execution failure.

3. Four projects went to N/A (zero weight) in W14. Confirm whether this is a sampling shortfall or project suspension.

4. SSA weight expansion: the only top-5 market with a triple-negative (rate, weight, and interaction all negative). Validate whether the volume increase is intentional.

5. EMEA GCP weight surge (0.85% → 2.98%) into a 74%-accuracy segment: check whether this is a ramp-up or reallocation, and whether quality support is in place.

6. APAC fuzzy rate jumped +0.49pp (largest increase): investigate whether policy updates or new content types are driving borderline cases. Non-fuzzy quality is strong, but the fuzzy trend needs monitoring.

7. AMS decline is entirely fuzzy-driven; non-fuzzy accuracy actually improved. Consider whether fuzzy calibration or policy clarification could recover the 0.19pp loss.

▶ Priority matrix — impact vs effort

Triage prioritization

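P0 (immediate): DE-LEJ site, a likely site-level outage/failure. The 63.7% share attributed here to two DE-LEJ projects is not reproduced by the project table, where the one listed DE-LEJ project contributes 6.3% and DE as a whole contributes 16.9%; reconcile the figure before sizing the recovery. Quick root cause identification is still the first step.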

P1 (this week): Four N/A projects — verify if delivery gaps are fixable. If unplanned, restoring these could offset 167% of the decline (they overlap with rate-driven decline).

P1 (this week): MENA1 accuracy regression — 38.4% of decline, pure rate effect. Check if a policy update or labeler calibration issue occurred during W14.

P2 (track): SSA triple headwind and EMEA GCP weight surge — these are structural issues that need monitoring over W15–W16 to determine if they're transient or persistent.

P2 (track): APAC fuzzy rate surge (+0.49pp) — quality fundamentals are strong but the fuzzy trajectory needs monitoring. AMS fuzzy calibration is a quick-win candidate.

Global OMA W15

86.04%

▲ +1.92pp

from W14 84.12% · biggest weekly gain in 4 weeks

vs W14 baseline

recovered

▲ +1.92pp

erases the W14 −1.82pp drop

vs W13 baseline

85.94%

▲ +0.10pp

net 2-week change ≈ flat

Status

analysis pending

detailed shift-share not yet performed

▲

W15 saw the largest weekly OMA gain in the observed window: +1.92pp

Global OMA jumped 84.12% → 86.04%, fully reversing the −1.82pp W14 decline and ending two weeks 0.10pp above the W13 baseline. Detailed shift-share decomposition for W15 hasn't been performed yet — only the headline number is available.

📋

Detailed W15 vs W14 RCA not yet generated

The headline +1.92pp recovery is captured above, but per-market and per-policy decomposition for W15 hasn't been computed. Request the analysis if needed.

Global OMA W13

85.94%

baseline

starting point of the analyzed window

Status

analysis pending

no W12 data for shift-share comparison

📋

W13 report not yet generated

W13 (Mar 28 – Apr 3) is the baseline reference for W14 analysis. A standalone W13 vs W12 RCA would require Overall Moderation Accuracy data for W12, which is not currently available.