Moderation Quality · Weekly RCA Report
W16 −0.04pp
W15 +1.92pp
W14 −1.82pp
W13

OMA held flat at −0.04pp — but underneath, the mix is highly turbulent

Shift-share decomposition of W16 (Apr 18–24) vs W15 (Apr 11–17). Global moved 86.04%→86.00%, yet ID and LATAM each contributed ~800% of Δ to the drag, offset by Adult Sexualized Behaviors and Tobacco recovering. The headline calm masks high single-policy and single-market volatility.

Global OMA W16: 86.00%, ▼ 0.04pp from 86.04% (≈ flat)
ID · Wt 8.85%: 86.81%, ▼ 3.40pp (842% of global Δ)
LATAM · Wt 9.24%: 82.51%, ▼ 3.03pp (806% of global Δ)
A.S.B + Tobacco offsets: +0.77pp, ▲ combined (−2189% offset)
Overview −0.04pp
Methodology & headline
Markets 12 dragging
ID #1 at 842%
Top Policies 10+ severe
Violent Behaviors leads
Actions 7
Investigate the offsets
00

Methodology & headline summary

W16 (Apr 18–24) vs W15 (Apr 11–17). Each segment's contribution to OMA is decomposed into rate, weight, and interaction effects. Because the global Δ is tiny (−0.04pp), individual segment % of Δ figures can balloon — focus on absolute pp contributions to gauge true scale.

Rate effect = GWt_W15 × (Acc_W16 − Acc_W15): pure accuracy change at prior weight
Weight effect = (GWt_W16 − GWt_W15) × (Acc_W15 − GlobalAcc_W15): mix shift relative to global mean
Interaction = (GWt_W16 − GWt_W15) × (Acc_W16 − Acc_W15): joint change
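
As a minimal sketch of the three formulas above, the following Python function decomposes one segment's contribution. The function and field names are illustrative and not taken from the report's tooling. The inputs are the rounded ID figures from the W16 market table, with weights expressed as fractions and accuracies in percent.

```python
from typing import NamedTuple


class Effects(NamedTuple):
    rate: float
    weight: float
    interaction: float

    @property
    def total(self) -> float:
        return self.rate + self.weight + self.interaction


def shift_share(wt_prev, wt_curr, acc_prev, acc_curr, global_acc_prev):
    """Decompose one segment's contribution to the global change, in pp.

    Weights are fractions of global volume (0.0870 for 8.70%);
    accuracies are in percent.
    """
    d_wt = wt_curr - wt_prev
    d_acc = acc_curr - acc_prev
    return Effects(
        rate=wt_prev * d_acc,                        # accuracy change at prior weight
        weight=d_wt * (acc_prev - global_acc_prev),  # mix shift relative to global mean
        interaction=d_wt * d_acc,                    # joint change
    )


# ID, W15 -> W16, using the rounded figures published in the market table.
id_effects = shift_share(0.0870, 0.0885, 90.22, 86.81, 86.04)
print(f"rate={id_effects.rate:+.3f} weight={id_effects.weight:+.3f} "
      f"inter={id_effects.interaction:+.3f} total={id_effects.total:+.3f}")
```

Because the inputs are rounded to two decimals, the results agree with the published row only to within about 0.002pp.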
The headline is misleading — the underlying mix is highly turbulent
86.04% → 86.00% looks like a non-event. But: Violent Behaviors fell −19.32pp, Personal Information - High Risk fell −31.01pp, Disparaging Religion fell −75.96pp. They were offset by equally severe gains: Adult Sexualized Behaviors recovering +3.87pp on heavy weight, Tobacco +2.16pp continuing its rebound, Invasive Cosmetic +22.73pp. Net ≈ 0.
Geographic drag concentrated in ID + LATAM = 1648% of global Δ
ID (−3.40pp accuracy, 842% of Δ) and LATAM (−3.03pp, 806% of Δ) each contributed more than 8× the global decline. SSA, PH, MENA1, BD, JP, and TR each add 200%+. ID has now declined in two of the last three week-over-week comparisons: W13→W14 −3.16pp, W14→W15 +0.58pp, W15→W16 −3.40pp.
Adult Sexualized Behaviors + Tobacco delivered +0.77pp combined offset — what saved the headline
Adult Sexualized Behaviors (+3.87pp accuracy on 5.0% weight, contribution +0.43pp) and Tobacco & Nicotine (+2.16pp on 10.81% weight, contribution +0.33pp) together offset 2189% of the global decline. MENA2 recovered +6.84pp accuracy regionally (+0.30pp). Without these three, the headline would read closer to −1.1pp (−0.04 − 0.77 − 0.30), although the market and policy cuts overlap, so the offsets are not strictly additive.
Why "% of Δ" looks extreme this week

Small denominator, large numerator

Global Δ = −0.04pp. When a segment contributes −0.30pp (a normal magnitude), it's ~750% of the global change. This is mathematically correct but visually scary.

The right interpretation: treat absolute pp contributions as the signal. Anything > 0.10pp is materially large in absolute terms — and the W16 table has 10+ such items on each side, indicating high underlying volatility.

If next week one of the offsets fails to repeat (e.g., Tobacco continues recovering but Adult Sexualized Behaviors regresses), the headline could swing 1–2pp easily. The current calm is fragile.
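
One way to keep the small-denominator problem out of a report is to suppress the percentage whenever the global change is tiny. The sketch below is illustrative only: the function name is hypothetical and the 0.10pp cutoff is an assumption, not a rule taken from the report's tooling.

```python
def describe_contribution(contribution_pp, global_delta_pp, min_delta_pp=0.10):
    """Format a segment's contribution, omitting % of Δ when the global Δ is tiny."""
    label = f"{contribution_pp:+.3f}pp"
    if abs(global_delta_pp) >= min_delta_pp:
        share = contribution_pp / global_delta_pp * 100
        label += f" ({share:.0f}% of Δ)"
    return label


print(describe_contribution(-0.30, -0.04))  # W16-like week: tiny global Δ
print(describe_contribution(-0.30, -1.63))  # W14-like week: large global Δ
```

The same −0.30pp contribution is reported as a bare pp figure in the first case and with its share of Δ in the second, where the denominator is large enough for the ratio to be meaningful.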

Data integrity flag — several policies report 0% accuracy

0%-accuracy policies need verification

Four policies show 0% accuracy in both W15 and W16 yet still contribute meaningfully to the global delta via weight changes:

  • Animal Abuse & Graphic Content: 0% → 0%, weight 0.13% → 0.39% (contribution −0.225pp)
  • Youth Physical Abuse, Assault & Neglect: 0% → 0%, weight 0.22% → 0.38%
  • Graphic Content: 0% → 0%, weight grew
  • Adult Sexual Abuse: 0% → 0%, weight 0.41% → 0.50%

A persistent 0% on a non-trivial sample is implausible as a true accuracy figure. Likely causes: data filter excluding all "approve" cases for these policies, sampling artifact, or definitional change. Verify before treating these as real signal.
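
A check of this kind can be automated before the decomposition runs. The sketch below assumes each policy row is a plain dictionary with hypothetical keys (`policy`, `acc_prev`, `acc_curr`, `wt_prev`, `wt_curr`) and uses an assumed 0.1% weight floor; the sample rows reuse figures quoted in this report.

```python
def flag_suspect_zero_accuracy(rows, min_weight=0.001):
    """Return policies reporting 0% accuracy in both weeks on non-trivial weight."""
    return [
        r["policy"]
        for r in rows
        if r["acc_prev"] == 0.0
        and r["acc_curr"] == 0.0
        and max(r["wt_prev"], r["wt_curr"]) >= min_weight
    ]


rows = [
    {"policy": "Animal Abuse & Graphic Content", "acc_prev": 0.0, "acc_curr": 0.0,
     "wt_prev": 0.0013, "wt_curr": 0.0039},
    {"policy": "Adult Sexual Abuse", "acc_prev": 0.0, "acc_curr": 0.0,
     "wt_prev": 0.0041, "wt_curr": 0.0050},
    {"policy": "Tobacco and Nicotine", "acc_prev": 79.04, "acc_curr": 81.20,
     "wt_prev": 0.1223, "wt_curr": 0.1081},
]

suspects = flag_suspect_zero_accuracy(rows)
print(suspects)
```

Flagged policies could then be excluded from the decomposition, or reported separately, until the underlying data is verified.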

01

By market — top contributors

12 markets dragging at 100%+ each — but 8 markets offset more than the entire decline
Drag side: ID 842%, LATAM 806%, SSA 440%, PH 398%, MENA1 347%, BD 294%, JP 252%, TR 241%. Offset side: MENA2 +6.84pp recovery, VN grew on improvement, BR +3.43pp, IT +5.37pp.
Market | Acc W15 | Acc W16 | Δ Acc | Wt W15 | Wt W16 | Rate | Weight | Inter | Total | % of Δ
ID | 90.22% | 86.81% | −3.40 | 8.70% | 8.85% | −0.296 | +0.006 | −0.005 | −0.295 | 842.3%
LATAM | 85.54% | 82.51% | −3.03 | 8.79% | 9.24% | −0.266 | −0.002 | −0.014 | −0.282 | 805.8%
SSA | 79.43% | 75.32% | −4.11 | 3.37% | 3.52% | −0.139 | −0.010 | −0.006 | −0.154 | 439.8%
PH | 87.45% | 83.99% | −3.45 | 4.15% | 3.95% | −0.143 | −0.003 | +0.007 | −0.139 | 397.7%
MENA1 | 86.79% | 84.98% | −1.81 | 5.60% | 7.50% | −0.101 | +0.014 | −0.034 | −0.122 | 347.2%
BD | 86.85% | 84.98% | −1.87 | 5.28% | 5.68% | −0.099 | +0.003 | −0.007 | −0.103 | 294.1%
JP | 92.01% | 90.48% | −1.53 | 2.28% | 1.07% | −0.035 | −0.072 | +0.018 | −0.088 | 252.1%
TR | 88.70% | 84.36% | −4.34 | 1.97% | 1.92% | −0.085 | −0.001 | +0.002 | −0.084 | 241.0%
ES | 89.25% | 82.39% | −6.87 | 1.08% | 1.00% | −0.074 | −0.003 | +0.006 | −0.071 | 203.6%
MX | 82.78% | 81.87% | −0.91 | 4.65% | 5.26% | −0.042 | −0.020 | −0.006 | −0.068 | 193.8%
Top-10 negative subtotal: Total −1.408, % of Δ 4017.4%
MENA2 | 77.79% | 84.63% | +6.84 | 4.39% | 4.00% | +0.300 | +0.032 | −0.027 | +0.305 | −870.2%
VN | 90.16% | 91.69% | +1.53 | 6.84% | 7.73% | +0.105 | +0.037 | +0.014 | +0.155 | −443.5%
BR | 83.73% | 87.16% | +3.43 | 4.46% | 4.75% | +0.153 | −0.007 | +0.005 | +0.150 | −429.1%
IT | 86.41% | 91.78% | +5.37 | 2.47% | 2.39% | +0.133 | +0.000 | −0.005 | +0.128 | −364.9%
MY | 83.36% | 90.47% | +7.11 | 1.67% | 1.52% | +0.119 | −0.004 | −0.007 | +0.108 | −309.6%
Top-5 positive subtotal: Total +0.846, % of Δ −2417.3%
JP weight collapse (2.28% → 1.07%, −1.21pp) is the largest single mix-shift event among draggers. Despite the small accuracy decline (−1.53pp), the weight effect (−0.072pp) is unusually large because JP W15 accuracy (92%) was well above the global mean — shrinking it removes a high-quality contributor from the mix.
ID #1 dragger: second 3pp+ decline in three weeks

ID: a recurring pattern, not a one-off

ID OMA accuracy fell from 90.22% → 86.81% in W16 (−3.40pp). This is the second decline of more than 3pp in three week-over-week comparisons: W13 92.79% → W14 89.63% → W15 90.22% → W16 86.81%. Cumulative drop: −5.97pp from W13 baseline.
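
The four weekly figures quoted above can be checked with a few lines of Python. This is a sketch on the rounded accuracies as published; the variable names are illustrative.

```python
weekly_acc = {"W13": 92.79, "W14": 89.63, "W15": 90.22, "W16": 86.81}

weeks = list(weekly_acc)
# Week-over-week change for each consecutive pair of weeks, in pp.
wow = {
    f"{prev}->{curr}": round(weekly_acc[curr] - weekly_acc[prev], 2)
    for prev, curr in zip(weeks, weeks[1:])
}
declines = [step for step, delta in wow.items() if delta < 0]
cumulative = round(weekly_acc[weeks[-1]] - weekly_acc[weeks[0]], 2)

print(wow)
print(f"{len(declines)} declines out of {len(wow)} WoW changes; "
      f"cumulative {cumulative:+.2f}pp")
```

With rounded inputs, each step lands within 0.01pp of the figures quoted in the text. The count of negative steps is the quantity to watch when deciding whether a pattern is a trend.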

The market is also gaining global share (8.70% → 8.85%) while accuracy worsens — the interaction effect is small but negative. Suggests either Indonesia-specific moderation quality is degrading, or the additional volume is concentrated in harder-to-judge content.

Action: Request structured Indonesia retrospective. The trend is now clear enough to need a dedicated investigation.

LATAM #2 dragger — what's behind the −3.03pp accuracy drop

LATAM: pure rate-effect dominance

LATAM accuracy fell from 85.54% → 82.51%, a 3.03pp drop. Weight grew slightly (8.79% → 9.24%) which marginally amplified damage via interaction (−0.014pp).

The rate component (−0.266pp) is by far the largest driver. Investigate whether a regional policy change, language model update, or sampling shift hit the LATAM portfolio specifically in W16.

MENA2 #1 offset — +6.84pp recovery, is it durable?

MENA2: bounce-back from a chronic underperformer

MENA2 accuracy jumped 77.79% → 84.63% (+6.84pp). Weight contracted slightly (4.39% → 4.00%), so this is overwhelmingly a rate story.

Looking back, MENA2 has been a problem region, and this single-week recovery is the largest positive market contribution in the W16 table (+0.305pp). Whether it's durable depends on whether the W14–W15 issue was a one-off (sample anomaly, transient labeling problem) or whether deeper calibration work lifted the floor.

Confirm with the regional team whether structural changes were made.

02

By policy title — top contributors

Multiple policies dropped 10–35pp accuracy this week
Severe single-policy drops include Disparaging Religion (−75.96pp, 89.07%→13.11%), Light Body Exposure (−36.83pp), Personal Information - High Risk (−31.01pp), NSA Exceptions - Mature (−22.85pp), Suicide & NSSI (−21.99pp), Violent Behaviors (−19.32pp). Even with small weights, these aggregate fast.
Tobacco & Nicotine — 2nd-heaviest policy (10.81%) acted as a stabilizer for the second straight week
Tobacco accuracy continued recovering: 79.04% → 81.20% (+2.16pp) and its share contracted 12.23% → 10.81% (−1.42pp). Because Tobacco accuracy is well below the global mean (−7.0pp from 86.04%), shrinking its weight is a strong net positive. Combined contribution: +0.334pp (offsetting 952% of the global Δ).
Tobacco & Nicotine — deep dive into a sustained recovery

Tobacco's outsized impact

At 10.81% of W16 sample weight, Tobacco & Nicotine is the 2nd-largest single policy (after Youth Regulated Goods at 12.29%). Its accuracy moves the global needle directly.

Multi-week trajectory: clear recovery from W14 trough

  • W13: 84.66% acc, 13.23% wt — recent peak
  • W14: 76.07% acc, 11.06% wt — −8.59pp single-week collapse
  • W15: 79.04% acc, 12.23% wt — partial recovery (+2.97pp)
  • W16: 81.20% acc, 10.81% wt — continued recovery (+2.16pp)

Tobacco quality has rebounded ~5.13pp from the W14 trough, but is still 3.46pp below its W13 baseline. The trajectory is clearly positive.

Shift-share decomposition (W16 vs W15)

  • Rate effect: +0.264pp — accuracy gain at prior weight
  • Weight effect: +0.099pp — shrinking a below-mean segment helps
  • Interaction: −0.031pp — small, accuracy ↑ while weight ↓
  • Total: +0.334pp (≈ −952% of the global Δ)

What to watch

Sample volume: 1,878 → 1,537 cases (−18%). Some of the weight contraction may reflect a sampling change. Verify the methodology hasn't changed.

Below-mean accuracy persistence: at 81.20%, Tobacco is still 4.80pp below the global mean. If volume rebounds before quality recovers further, the helpful weight-effect direction will reverse — Tobacco could flip back to a major drag.
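
The reversal risk can be made concrete with the weight-effect formula from the methodology section. The second call below is a purely hypothetical W17 scenario, assumed here for illustration: Tobacco's weight returns to its W15 level (12.23%) while its accuracy and the global mean stay at their W16 values.

```python
def weight_effect(wt_prev, wt_curr, acc_prev, global_acc_prev):
    """Weight effect in pp: mix shift relative to the prior-week global mean."""
    return (wt_curr - wt_prev) * (acc_prev - global_acc_prev)


# Observed W15 -> W16: Tobacco shrank while sitting below the global mean.
observed = weight_effect(0.1223, 0.1081, 79.04, 86.04)

# Hypothetical W16 -> W17: weight rebounds to 12.23%, accuracy unchanged.
hypothetical = weight_effect(0.1081, 0.1223, 81.20, 86.00)

print(f"observed W16 weight effect:     {observed:+.3f}pp")
print(f"hypothetical W17 weight effect: {hypothetical:+.3f}pp")
```

Under these assumptions the sign flips: the same 1.42pp weight move that helped in W16 would subtract from the global figure, though by less than it added, because Tobacco now sits closer to the mean.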

Action: Lock in the recovery. Confirm whether the W14 trough was an isolated event and whether the two-week rebound (W15, W16) has structural support, not just regression-to-mean.

Policy | Acc W15 | Acc W16 | Δ Acc | Wt W15 | Wt W16 | Rate | Weight | Inter | Total | % of Δ
Violent Behaviors | 76.78% | 57.46% | −19.32 | 1.47% | 1.77% | −0.284 | −0.029 | −0.058 | −0.371 | 1059.4%
Gambling - Depiction and Promotion | 69.68% | 59.89% | −9.79 | 1.51% | 2.07% | −0.148 | −0.092 | −0.054 | −0.293 | 836.9%
Dangerous Trends - Serious Harm | 68.09% | 63.14% | −4.95 | 4.83% | 4.95% | −0.239 | −0.022 | −0.006 | −0.266 | 758.7%
Personal Information - High Risk | 84.12% | 53.12% | −31.01 | 0.67% | 0.71% | −0.208 | −0.001 | −0.011 | −0.220 | 627.4%
Youth Non-Sexualized Nudity | 76.77% | 74.61% | −2.16 | 4.86% | 5.60% | −0.105 | −0.069 | −0.016 | −0.189 | 540.2%
Youth Body Exposure - Light (4-17) | 40.38% | 37.08% | −3.30 | 0.67% | 0.98% | −0.022 | −0.146 | −0.010 | −0.178 | 507.6%
Youth Regulated Goods and Services | 73.69% | 72.65% | −1.04 | 12.10% | 12.29% | −0.126 | −0.023 | −0.002 | −0.151 | 430.1%
Light Body Exposure | 70.00% | 33.17% | −36.83 | 0.08% | 0.30% | −0.029 | −0.036 | −0.082 | −0.147 | 419.9%
High Risk Driving | 64.91% | 60.73% | −4.18 | 2.33% | 2.53% | −0.097 | −0.041 | −0.008 | −0.147 | 419.7%
Regulated Goods - Marketing/Trade | 47.96% | 48.53% | +0.57 | 1.44% | 1.80% | +0.008 | −0.135 | +0.002 | −0.129 | 367.7%
Top-10 negative subtotal: Total −2.090, % of Δ 5967.7%
Adult Sexualized Behaviors | 54.88% | 58.75% | +3.87 | 5.77% | 5.00% | +0.224 | +0.239 | −0.030 | +0.433 | −1236.1%
Tobacco and Nicotine (★ 2nd heaviest policy) | 79.04% | 81.20% | +2.16 | 12.23% | 10.81% | +0.264 | +0.099 | −0.031 | +0.334 | −952.5%
Invasive Cosmetic Procedures | 65.14% | 87.86% | +22.73 | 1.30% | 2.26% | +0.295 | −0.201 | +0.219 | +0.313 | −894.6%
Combat sports, Extreme Sports & Stunts | 75.02% | 82.54% | +7.51 | 4.04% | 4.28% | +0.304 | −0.026 | +0.018 | +0.296 | −844.0%
Moderate Bullying | 48.14% | 50.83% | +2.69 | 2.26% | 1.62% | +0.061 | +0.247 | −0.017 | +0.290 | −826.3%
Top-5 positive subtotal: Total +1.666, % of Δ −4753.5%
Severe single-policy regressions: Disparaging Religion 89.07%→13.11% (−75.96pp); Suicide & NSSI 57.37%→35.38% (−21.99pp); NSA Exceptions - Mature 53.62%→30.76% (−22.85pp); Adult Sexual Solicitation 57.81%→46.33% (−11.47pp). These weren't in the top-10 by total contribution because their weights are tiny (<1%), but the rate magnitudes warrant individual investigation. Several policies report 0% accuracy in both weeks (Animal Abuse, Youth Physical Abuse, Graphic Content, Adult Sexual Abuse) — likely a data integrity issue, not real signal.
Violent Behaviors #1 — −19.32pp drop on growing weight

Violent Behaviors: triple-negative, all three effects against

Accuracy collapsed 76.78% → 57.46% (−19.32pp). Weight grew (1.47% → 1.77%), so the additional volume entered a now-failing segment — interaction effect (−0.058pp) compounds the damage.

This is one of the largest reputational-risk policy categories. A 19pp accuracy drop combined with growing volume is a serious signal — escalate immediately.

Personal Information - High Risk — −31pp single week

Personal Info High Risk: catastrophic single-week drop

Accuracy fell 84.12% → 53.12% (−31.01pp) on stable weight (0.67% → 0.71%). The rate effect (−0.208pp) accounts for nearly all of this row's −0.220pp contribution.

A 31pp drop on a privacy-related, high-stakes policy is alarming. Possible drivers: policy interpretation change, new content vector (e.g., new types of doxxing patterns), or model/labeler retraining gone wrong. Investigate before W17.

Adult Sexualized Behaviors — +0.43pp top offset, what drove it

A.S.B: the largest single offset

Adult Sexualized Behaviors recovered 54.88% → 58.75% (+3.87pp). Weight contracted 5.77% → 5.00% (−0.77pp). Both effects are favorable: rate (+0.224pp) and weight (+0.239pp) — shrinking a below-mean segment helps.

This single policy contributed +0.433pp — by itself, more than 12× the global Δ in the offsetting direction. Worth understanding what drove the accuracy jump (calibration, content shift, sampling) since A.S.B is a chronic problem area.

Disparaging Religion — 89.07% → 13.11% (−75.96pp)

Disparaging Religion: most severe rate drop

This policy collapsed by 75.96pp on a tiny sample weight (~0.08–0.13%). Global impact is "only" −0.099pp (246%), but the rate magnitude is unprecedented.

Almost certainly a sample/policy/labeling artifact — a 76pp single-week swing is implausible as a true accuracy change. Verify the W16 sample is representative; if it is, escalate as a critical operational failure.

Recommended actions
1. Don't celebrate the −0.04pp headline. The mix is unstable: 10+ policies dropped 10–35pp this week, balanced by equally large gainers. If one of the gainers fails to repeat next week, the headline could swing 1–2pp.
2. Investigate Violent Behaviors (−19.32pp accuracy, weight growing): triple-negative on a high-reputational-risk category. Escalate to policy ops.
3. Personal Information - High Risk dropped 31pp: privacy-sensitive, suspicious magnitude. Audit sample composition and labeler agreement before W17.
4. ID + LATAM combined drag of 1648%: both regions saw 3+pp accuracy drops. Region-level RCA is needed to determine whether the cause is shared (model update, content shift) or independent.
5. ID has declined in two of the last three week-over-week comparisons (W13 92.79% → W16 86.81%, cumulative −5.97pp). This is no longer a single-week event; request a structured Indonesia retrospective.
6. Verify that the extreme swings are real, not artifacts. Disparaging Religion (−76pp), Invasive Cosmetic (+23pp), MENA2 region (+6.84pp), Adult Fetish & Kinks (+32pp): these magnitudes invite sampling/labeling scrutiny before being trusted as signal.
7. Data integrity: 4 policies report 0% accuracy in both weeks (Animal Abuse & Graphic Content, Youth Physical Abuse, Graphic Content, Adult Sexual Abuse) yet still drag the global figure via weight changes. Likely a data filter or definitional issue; fix before treating as RCA signal.
Priority matrix — what to triage first

Triage prioritization

P0 (immediate, integrity risk): Personal Information - High Risk (−31pp), Violent Behaviors (−19.32pp). Both are reputational categories with material accuracy regression on growing or stable weight.

P0 (data integrity): Verify Disparaging Religion (−76pp), 0%-accuracy policies, and other extreme single-policy swings are not sample/labeling artifacts. Swings of this size are more likely measurement issues than real changes.

P1 (regional): ID + LATAM joint investigation. If the cause is shared (e.g., a regional model rollout), one fix solves both. Otherwise treat as independent.

P1 (trend): ID multi-week decline pattern. Even if the W16 drop proves to be an isolated event, the trend itself warrants attention.

P2 (lock in gains): Tobacco & Nicotine recovery (two consecutive weekly gains) and Adult Sexualized Behaviors offset. Confirm structural drivers, not just regression-to-mean.

P3 (signal hygiene): Stop using single-week % of Δ as the primary metric when the global change is near zero. When |global Δ| < 0.1pp, report absolute pp contributions instead.

Global W14: 84.28%, ▼ 1.63pp from 85.91% (100% of decline)
EMEA · Wt 32.4%: 79.65%, ▼ 6.48pp (128% of global decline)
APAC · Wt 47.8%: 88.61%, ▲ 0.92pp (−31%, offset the decline)
AMS · Wt 19.7%: 81.37%, ▼ 0.12pp (1% of global decline)
Overview −1.63pp
Methodology, decomposition & fuzzy
Hub × Type 129%
EMEA Appeal alone = 50.9%
EMEA Markets 110%
MENA1 leads at 38.4%
Top Projects TOP 10
GB-MNL #1 at 25.1%
Actions 7
P0–P2 prioritized items
00

Methodology & summary

W14 (Apr 4–10) vs W13 (Mar 28–Apr 3). Each segment's total contribution is decomposed into three additive components. Positive % of Δ = contributed to the decline; negative = offset.

Rate effect = GWt_W13 × (Acc_W14 − Acc_W13): pure accuracy change at prior weight
Weight effect = (GWt_W14 − GWt_W13) × (Acc_W13 − GlobalAcc_W13): mix shift relative to global mean
Interaction = (GWt_W14 − GWt_W13) × (Acc_W14 − Acc_W13): joint change
Total rate effect: −2.64pp (161% of decline)
Total weight effect: +0.76pp (offset 47%)
Total interaction: +0.24pp (offset 15%)
Quality degraded across the board — here's why this matters
The rate effect alone would have caused a 2.64pp decline had the mix not shifted. The actual −1.63pp is about 1pp better than the accuracy changes alone imply, thanks to favorable weight rebalancing (+0.76pp) and interaction (+0.24pp).
APAC's growth was the safety net — here's how
APAC (88.6% accuracy, above global mean) grew from 47.2% → 47.8% of mix while improving +0.92pp. Its net contribution offset about 31% of the decline (roughly +0.5pp); without it, the headline would read about −2.1pp instead of −1.63pp.
How to read this decomposition

Interpreting the three effects

Rate effect (161%) tells us accuracy degradation within segments — holding mix constant — more than fully explains the decline. This is the "quality got worse" signal.

Weight effect (offset 47%) means the mix actually shifted favorably: segments with above-average accuracy gained share. Without this, the decline would have been ~2.4pp instead of 1.63pp.

Interaction (offset 15%) captures the joint effect — segments that lost accuracy also tended to shrink in weight, providing a small additional buffer.

The sum: −2.64 + 0.76 + 0.24 = −1.64pp, matching the observed −1.63pp global decline to within rounding.
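
The additivity of the three effects is easy to verify. The sketch below uses the rounded totals from this section; the variable names are illustrative.

```python
effects_pp = {"rate": -2.64, "weight": 0.76, "interaction": 0.24}
observed_delta_pp = -1.63

total = sum(effects_pp.values())
residual = total - observed_delta_pp  # rounding gap between components and headline
shares = {name: value / observed_delta_pp * 100 for name, value in effects_pp.items()}

print(f"sum of effects = {total:+.2f}pp, residual vs observed = {residual:+.2f}pp")
for name, share in shares.items():
    print(f"{name:<12}{share:+.0f}% of decline")
```

A positive share means the effect pushed in the direction of the decline; a negative share means it offset it. Small differences from the published percentages come from rounding in the inputs.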

01

Fuzzy rate impact

Fuzzy rate increase: −0.36pp (21.8% of total decline)
Non-fuzzy accuracy decline: −1.28pp (78.2% of total decline)
Fuzzy rate rose +0.35pp — but three hubs tell completely different stories
AMS: decline is 100% fuzzy — real quality held steady. APAC: powered through the biggest fuzzy headwind (+0.49pp) with +1.41pp genuine improvement. EMEA: 96% of the −6.48pp drop is real accuracy errors, not borderline ambiguity.
Hub | FR W14 | FR W13 | Δ FR | Acc Δ total | Fuzzy explains | Non-fuzzy Δ | Verdict
AMS | 1.76% | 1.57% | +0.19pp | −0.12pp | −0.19pp | +0.06pp | Entire decline is fuzzy-driven. Non-fuzzy accuracy actually improved.
APAC | 2.08% | 1.59% | +0.49pp | +0.92pp | −0.49pp | +1.41pp | Fuzzy headwind absorbed; non-fuzzy quality improved strongly (+1.41pp).
EMEA | 3.12% | 2.86% | +0.26pp | −6.48pp | −0.26pp | −6.21pp | 96% of EMEA's decline is non-fuzzy. Fuzzy is a minor factor here.
Global | 2.35% | 2.00% | +0.36pp | −1.63pp | −0.36pp | −1.28pp | Fuzzy = 22%, non-fuzzy = 78%
Key insight: The three hubs tell very different stories. AMS's small decline is 100% fuzzy — actual quality held steady. APAC powered through a large fuzzy increase with even larger genuine improvement. EMEA's massive drop is overwhelmingly real accuracy errors — fuzzy rate barely moved. This confirms EMEA's issue is fundamentally about moderation quality, not borderline-case ambiguity.
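
The split behind the fuzzy table can be sketched as follows. It assumes, as the table's "Fuzzy explains" column implies, that each pp of fuzzy-rate increase costs one pp of accuracy; the function name is hypothetical.

```python
hubs = {
    # hub: (fuzzy rate W13 %, fuzzy rate W14 %, total accuracy change in pp)
    "AMS": (1.57, 1.76, -0.12),
    "APAC": (1.59, 2.08, 0.92),
    "EMEA": (2.86, 3.12, -6.48),
}


def split_fuzzy(fr_prev, fr_curr, acc_delta):
    """Split an accuracy change into fuzzy and non-fuzzy parts, in pp."""
    fuzzy = -(fr_curr - fr_prev)   # a higher fuzzy rate lowers accuracy
    return fuzzy, acc_delta - fuzzy  # the remainder is the non-fuzzy change


results = {hub: split_fuzzy(*values) for hub, values in hubs.items()}
for hub, (fuzzy, non_fuzzy) in results.items():
    print(f"{hub:<5} fuzzy {fuzzy:+.2f}pp, non-fuzzy {non_fuzzy:+.2f}pp")
```

Computed from the rounded inputs, the non-fuzzy figures land within about 0.01pp of the table values.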
AMS — decline is 100% fuzzy-driven

AMS: a fuzzy story, not a quality story

AMS accuracy fell just −0.12pp, and the entire decline is explained by the fuzzy rate increase (+0.19pp). Once fuzzy is stripped out, AMS non-fuzzy accuracy actually improved by +0.06pp.

This means AMS's labeling quality is holding steady or improving — the headline number is being dragged by borderline cases being reclassified or new ambiguous content types entering the pipeline.

Action: Consider fuzzy calibration or policy clarification for the specific content types driving the 0.19pp fuzzy increase. This is a recoverable loss.

APAC — strong quality masked by fuzzy headwind

APAC: quality is better than the headline suggests

APAC's reported accuracy improved +0.92pp, but the underlying non-fuzzy improvement is actually +1.41pp — being partially masked by a +0.49pp fuzzy rate increase (the largest of any hub).

APAC absorbed the biggest fuzzy headwind and still delivered the best headline improvement. However, the fuzzy trend (+0.49pp WoW) needs monitoring — if it continues, it will eventually overwhelm the quality gains.

Action: Investigate whether policy updates or new content types in APAC are driving the fuzzy surge. The quality fundamentals are strong, but the fuzzy trajectory is concerning.

EMEA — fuzzy is a rounding error; the problem is real

EMEA: genuine moderation quality crisis

EMEA's fuzzy rate only increased +0.26pp, explaining just 4% of its massive −6.48pp accuracy decline. The remaining −6.21pp is pure non-fuzzy accuracy degradation.

This definitively rules out "borderline cases" as an explanation for EMEA's performance. The problem is fundamentally about labeler accuracy, policy interpretation, or operational execution — not content ambiguity.

EMEA also has the highest absolute fuzzy rate (3.12% vs 2.08% APAC, 1.76% AMS), suggesting a structural baseline of ambiguity in its content mix, but the week-over-week change is small.

01

Hub × project type

EMEA's three project types account for 129% of the decline
EMEA Appeal alone is 50.9%: accuracy collapsed 82.7% → 76.6% (−6.06pp) while still carrying 15.6% of global weight. APAC General Recall is the largest single offset (−30.6%), improving to 90.1% while gaining share.
Hub | Type | Acc W14 | Acc W13 | Δ Acc | GWt W14 | GWt W13 | Rate | Weight | Inter | Total | % of Δ
EMEA | Appeal | 76.6% | 82.7% | −6.06 | 15.6% | 19.1% | −1.156 | +0.113 | +0.212 | −0.831 | 50.9%
EMEA | General Recall | 84.2% | 91.6% | −7.37 | 12.1% | 10.0% | −0.734 | +0.119 | −0.156 | −0.771 | 47.1%
EMEA | Analytics Appeal | 77.7% | 89.7% | −12.01 | 4.7% | 3.2% | −0.387 | +0.055 | −0.174 | −0.505 | 30.9%
AMS | General Recall | 84.4% | 85.8% | −1.37 | 14.3% | 9.5% | −0.130 | −0.005 | −0.066 | −0.200 | 12.2%
APAC | Appeal | 85.4% | 85.7% | −0.27 | 15.9% | 18.8% | −0.052 | +0.006 | +0.008 | −0.037 | 2.3%
Negative subtotal: Total −2.345, % of Δ 143.4%
AMS | Appeal | 72.9% | 78.7% | −5.86 | 5.0% | 10.5% | −0.614 | +0.394 | +0.320 | +0.100 | −6.1%
AMS | Analytics Appeal | 79.0% | 62.7% | +16.35 | 0.5% | 0.6% | +0.102 | +0.038 | −0.027 | +0.113 | −6.9%
APAC | General Recall | 90.1% | 88.6% | +1.49 | 27.5% | 24.1% | +0.359 | +0.091 | +0.051 | +0.501 | −30.6%
Positive subtotal: Total +0.714, % of Δ −43.7%
AMS Appeal — accuracy did fall (rate = −0.61pp), but its accuracy is well below the global mean, so the weight halving from 10.5% → 5.0% was net positive for the global number (+0.39pp weight effect), flipping total contribution to +0.10pp.
EMEA Appeal deep dive — why −6.06pp accuracy drop?

EMEA Appeal: rate effect dominance

The −1.156pp rate effect is the single largest driver in this decomposition. EMEA Appeal dropped from 82.7% to 76.6%, a −6.06pp swing, while still carrying 15.6% global weight.

The weight did shrink (19.1% → 15.6%), which partially offset the damage (+0.113pp weight effect, +0.212pp interaction), but the sheer magnitude of the accuracy collapse overwhelms both offsets.

Key question: Is this driven by specific BPO sites, policy updates, or labeler calibration drift? See the "Top Projects" tab for project-level decomposition.

APAC General Recall — why it's the biggest offset

APAC GR: the stabilizer

APAC General Recall improved from 88.6% to 90.1% (+1.49pp) while also gaining weight (24.1% → 27.5%). This is the ideal scenario: an above-average segment both improves and grows.

All three effects are positive: rate (+0.359pp), weight (+0.091pp), interaction (+0.051pp), summing to +0.501pp — the single largest offset at −30.6% of the decline.

02

EMEA market breakdown

5 markets drive 110% of the global decline — almost entirely rate-driven
MENA1 + EN + SSA + DE + MENA2. The damage is concentrated: MENA1 alone is 38.4%. Only SSA compounds all three effects — weight grew into a below-mean, declining segment.
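Market | Acc W14 | Acc W13 | Δ Acc | GWt W14 | GWt W13 | Rate | Weight | Inter | Total | % of Δ
MENA1 | 80.5% | 90.4% | −9.89 | 6.26% | 6.46% | −0.639 | −0.009 | +0.020 | −0.628 | 38.4%
EN (GB) | 78.8% | 88.5% | −9.67 | 3.56% | 4.18% | −0.404 | −0.016 | +0.061 | −0.360 | 22.0%
SSA | 75.9% | 84.2% | −8.22 | 3.65% | 2.46% | −0.202 | −0.021 | −0.099 | −0.321 | 19.7%
DE | 77.4% | 86.8% | −9.42 | 2.93% | 2.90% | −0.273 | +0.000 | −0.003 | −0.276 | 16.9%
MENA2 | 75.8% | 80.9% | −5.08 | 4.28% | 4.37% | −0.222 | +0.005 | +0.005 | −0.213 | 13.0%
IT | 84.2% | 93.2% | −9.07 | 2.43% | 2.27% | −0.205 | +0.012 | −0.015 | −0.208 | 12.7%
IL | 67.2% | 83.8% | −16.55 | 0.36% | 0.43% | −0.071 | +0.001 | +0.011 | −0.059 | 3.6%
UA | 74.3% | 77.3% | −3.04 | 1.08% | 1.04% | −0.032 | −0.003 | −0.001 | −0.036 | 2.2%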
SSA is the only top-5 market where all three effects are negative: weight expanded (2.46%→3.65%), W13 accuracy sat below the global mean, and accuracy also fell. UA, a much smaller contributor, shows the same pattern. A triple headwind worth investigating.
MENA1 deep dive — largest market contributor at 38.4%

MENA1: pure rate problem

MENA1 dropped from 90.4% to 80.5% (−9.89pp) while maintaining roughly stable weight (6.46% → 6.26%). The rate effect (−0.639pp) almost entirely explains its contribution.

This is a nearly pure accuracy regression — no confounding mix shifts. The investigation should focus on what changed in MENA1 labeling quality, policy interpretation, or task distribution during W14.

SSA triple headwind — all three effects negative

SSA: compounding failure mode

SSA is the only one of the five largest market contributors where rate, weight, and interaction are all negative.

Rate (−0.202pp): accuracy fell from 84.2% to 75.9%, a −8.22pp drop.

Weight (−0.021pp): SSA's weight grew from 2.46% to 3.65%, but since SSA accuracy (84.2%) was below the W13 global mean (85.9%), this expansion hurts.

Interaction (−0.099pp): the weight grew AND accuracy fell simultaneously — the worst combination.

Key question: Was the SSA weight increase intentional (ramp-up)? If so, quality support did not scale with volume.
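
A triple headwind is simple to detect mechanically. The sketch below applies the test to the five markets that drive 110% of the decline, using the effect values from the W14 EMEA market table; the data structure is illustrative.

```python
markets = {
    # market: (rate, weight, interaction) in pp
    "MENA1": (-0.639, -0.009, +0.020),
    "EN (GB)": (-0.404, -0.016, +0.061),
    "SSA": (-0.202, -0.021, -0.099),
    "DE": (-0.273, +0.000, -0.003),
    "MENA2": (-0.222, +0.005, +0.005),
}

# A segment is a triple headwind when every component is strictly negative.
triple_negative = [
    name for name, effects in markets.items() if all(e < 0 for e in effects)
]
print(triple_negative)
```

Running the same filter over every row of a decomposition, at any level of granularity, gives a quick shortlist of segments where volume is growing into declining quality.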

IL — steepest single-market accuracy drop (−16.55pp)

IL: low weight limits global impact

IL has the most dramatic accuracy decline of any market (83.8% → 67.2%, −16.55pp), but its small weight (0.36%) limits global impact to just −0.059pp (3.6% of decline).

Still worth flagging: a 16.5pp drop likely indicates a systemic issue — new policy, labeler turnover, or task type change — that could worsen if IL weight increases.

03

EMEA — top 10 individual projects (shift-share)

Top 3 projects drive 67% of the global decline
GB-ALR-MNL (25.1%): weight surged 6x into crashing accuracy. MENA2-CAS (22.0%): weight quadrupled into a chronically below-mean segment. MENA1-ANK (19.7%): pure accuracy regression. The common thread: weight expansion without quality support.
Project | Type | Acc W14 | Acc W13 | GWt W14 | GWt W13 | Rate | Weight | Inter | Total | % of Δ
GCP-TT-Video appeal-GB-en-ALR-MNL | Appeal | 69.9% | 100.0% | 2.25% | 0.36% | −0.108 | +0.266 | −0.568 | −0.410 | 25.1%
TT-Video-Analytics Appeal-MENA2-ar-T&S-CAS | Analytics Appeal | 67.1% | 73.5% | 2.25% | 0.51% | −0.033 | −0.215 | −0.112 | −0.360 | 22.0%
TT-Video-General Recall General-MENA1-ku-CNX-ANK | General Recall | 84.4% | 96.8% | 2.59% | 2.61% | −0.322 | −0.002 | +0.002 | −0.321 | 19.7%
TT-Video appeal-KE/TZ/UG-sw-TP-NBO | Appeal | 69.4% | 81.4% | 1.12% | 0.77% | −0.092 | −0.016 | −0.043 | −0.151 | 9.2%
GCP-TT-Video-General Recall General-GB-en-TP-ALB | General Recall | 58.9% | 92.7% | 0.06% | 1.40% | −0.473 | −0.091 | +0.451 | −0.113 | 6.9%
GCP-TT-Video appeal-IT-it-TP-BRV | Appeal | 84.7% | 92.3% | 0.87% | 1.60% | −0.122 | −0.046 | +0.055 | −0.113 | 6.9%
TT-Video appeal-MENA1-other-TP-MAK | Appeal | 63.4% | 96.1% | 0.22% | 0.53% | −0.174 | −0.032 | +0.101 | −0.104 | 6.4%
GCP-TT-Video-General Recall General-DE-de-TLS-LEJ | General Recall | 75.4% | 85.4% | 1.00% | 0.46% | −0.046 | −0.003 | −0.054 | −0.102 | 6.3%
TT-Video appeal-MENA1-ar-CNX-IBD | Appeal | 74.2% | 78.9% | 1.48% | 1.17% | −0.056 | −0.021 | −0.014 | −0.091 | 5.6%
TT-Video-General Recall General-MENA1-ar-TP-MAK | General Recall | N/A | 100% | 0.00% | 0.53% | −0.534 | −0.075 | +0.534 | −0.075 | 4.6%
Weight expansion is the recurring theme: 5 of 10 projects saw weight increase. When that expansion targets below-mean or declining-accuracy segments, the interaction effect compounds the damage. Only GR-MENA1-ku-CNX-ANK is a pure rate story (stable weight, −12.4pp accuracy drop).
GCP-TT-Video appeal-GB-en-ALR-MNL — #1 contributor at 25.1%, here's the mechanism

GB MNL: the weight surge trap

This project's weight surged 6.25x (0.36% → 2.25%) while accuracy crashed from 100% → 69.9%. The interaction effect (−0.568pp) is the largest single component — weight grew dramatically while accuracy fell dramatically.

The weight effect is actually positive (+0.266pp) because the project was above the global mean in W13 (100% vs 85.9%). But the interaction overwhelms it: expanding into what became a low-accuracy segment is a compounding failure.

Key question: Was this a deliberate ramp-up of a previously small project? If so, quality controls didn't scale with volume.

AA-MENA2-ar-T&S-CAS — weight quadrupled into a below-mean segment

MENA2 CAS: weight-driven damage

Weight grew from 0.51% → 2.25% (4.4x) while accuracy was already below the global mean (73.5%) and fell further to 67.1%. The weight effect alone (−0.215pp) is the largest component — this is a mix-shift problem, not primarily a rate problem.

All three effects are negative: rate (−0.033), weight (−0.215), interaction (−0.112). A triple headwind totaling −0.360pp (22.0% of decline).

Action: Validate whether this weight increase was intentional. Expanding a chronically below-mean segment without quality uplift compounds the global decline.

TT-Video-General Recall General-MENA1-ku-CNX-ANK — pure rate collapse, no mix excuse

MENA1 ANK: classic accuracy regression

This is the cleanest rate-driven case in the top 10: weight barely moved (2.61% → 2.59%), so the rate effect (−0.322pp) almost entirely explains the −0.321pp total contribution.

Accuracy dropped from 96.8% → 84.4% (−12.4pp) — a steep fall from a high base. No mix-shift or weight excuses here; something changed in execution quality.

Action: Investigate what changed for Kurdish-language GR in MENA1 during W14 — policy update, new labeler cohort, or calibration drift.

Recommended actions
1. EMEA Appeal quality RCA: focus on EN, MENA1, DE BPO sites where rate effect is the dominant driver.
2. DE-LEJ site investigation: two projects with catastrophic accuracy (0.0% and N/A), possible vendor execution failure.
3. Four projects went to N/A (zero weight) in W14: confirm whether this is sampling shortfall or project suspension.
4. SSA weight expansion: the only top-5 market with a triple-negative (rate + weight + interaction all negative); validate whether the volume increase is intentional.
5. EMEA GCP weight surge (0.85% → 2.98%) into a 74%-accuracy segment: check whether this is a ramp-up or reallocation, and whether quality support is in place.
6. APAC fuzzy rate jumped +0.49pp (largest increase): investigate whether policy updates or new content types are driving borderline cases. Non-fuzzy quality is strong, but the fuzzy trend needs monitoring.
7. AMS decline is entirely fuzzy-driven; non-fuzzy accuracy actually improved. Consider whether fuzzy calibration or policy clarification could recover the 0.19pp loss.
Priority matrix — impact vs effort

Triage prioritization

P0 (immediate): DE-LEJ site — likely site-level outage/failure, accounts for 63.7% of decline from just two projects. Quick root cause identification could recover the most impact.

P1 (this week): Four N/A projects — verify if delivery gaps are fixable. If unplanned, restoring these could offset 167% of the decline (they overlap with rate-driven decline).

P1 (this week): MENA1 accuracy regression — 38.4% of decline, pure rate effect. Check if a policy update or labeler calibration issue occurred during W14.

P2 (track): SSA triple headwind and EMEA GCP weight surge — these are structural issues that need monitoring over W15–W16 to determine if they're transient or persistent.

P2 (track): APAC fuzzy rate surge (+0.49pp) — quality fundamentals are strong but the fuzzy trajectory needs monitoring. AMS fuzzy calibration is a quick-win candidate.

Global OMA W15: 86.04%, ▲ +1.92pp from W14 84.12% (biggest weekly gain in 4 weeks)
vs W14 baseline: recovered, ▲ +1.92pp (erases the W14 −1.82pp drop)
vs W13 baseline: 85.94%, ▲ +0.10pp (net 2-week change ≈ flat)
Status: analysis pending (detailed shift-share not yet performed)
W15 saw the largest weekly OMA gain in the observed window: +1.92pp
Global OMA jumped 84.12% → 86.04%, fully reversing the −1.82pp W14 decline and ending two weeks 0.10pp above the W13 baseline. Detailed shift-share decomposition for W15 hasn't been performed yet — only the headline number is available.

Detailed W15 vs W14 RCA not yet generated

The headline +1.92pp recovery is captured above, but per-market and per-policy decomposition for W15 hasn't been computed. Request the analysis if needed.

Global OMA W13: 85.94%, baseline (starting point of the analyzed window)
Status: analysis pending (no W12 data for shift-share comparison)

W13 report not yet generated

W13 (Mar 28 – Apr 3) is the baseline reference for W14 analysis. A standalone W13 vs W12 RCA would require Overall Moderation Accuracy data for W12, which is not currently available.