FIX 4 of 5: Label Independence Protocol


C1/C2 label circularity (program.md → labels)

This is the sharpest methodological critique, and the reviewer is right. Deriving the “dimensional expansion threshold crossed” label from program.md is circular — the system’s own design file becomes the ground truth.

Cure:

  • Add a sentence explicitly acknowledging this risk and stating the mitigation: “To avoid construct validity circularity, threshold labels will be verified by at least two independent human annotators blind to program.md, using task-grounded success criteria locked prior to data collection.”
  • Reference cryptographic commitment to label definitions before training begins
  • In the experimental plan, add a label independence protocol — annotators receive task outcomes only, not system logs

Strategy first

The reviewer’s exact complaint: “The outcome label ‘dimensional expansion threshold crossed’ is derived from program.md, risking circularity and construct validity concerns — it is unclear how this label maps to observable ground truth independent of the system’s own design.”

This is the sharpest methodological critique in the entire review. The circularity is real: if the system’s own rule file defines what counts as success, and the classifier is trained on those labels, the RF is learning to replicate the system’s self-assessment — not an independent ground truth. The fix does not require changing the architecture. It requires adding an independent verification layer on top of the existing programmatic labelling, and stating it explicitly.

Three parts:

  • Part A — a targeted amendment to the existing label sentence in §7.3
  • Part B — a new Label Independence Protocol paragraph inserted immediately after
  • Part C — a one-sentence addition to §7.1 Epistemological Position to signal the discipline upfront

Where it goes

Primary location: Section 7.3 — “Random Forest Classification Framework”

Currently the label sentence reads:

“The outcome label — dimensional expansion threshold crossed — is programmatically derived from the score function in program.md. No manual override. Thresholds locked before training. Results logged immutably in the Amendment Record.”

Then the success threshold line follows:

“Success threshold: F1 ≥ 0.75 on held-out test set, ROC-AUC ≥ 0.80.”

Secondary location: Section 7.1 — “Epistemological Position”

One sentence appended to the final paragraph of §7.1.
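Before detailing the insertions, note that the F1 / ROC-AUC success threshold quoted above is mechanically checkable. A stdlib-only sketch with hand-rolled metrics; the function names and data are illustrative assumptions, not the paper's evaluation code (which would presumably use a standard library such as scikit-learn):

```python
def f1_score(y_true, y_pred):
    """Binary F1 from true and predicted 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive outranks a
    random negative (ties count half) — the Mann-Whitney formulation."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def meets_success_threshold(y_true, y_pred, scores):
    """Apply the paper's stated gate: F1 >= 0.75 and ROC-AUC >= 0.80."""
    return f1_score(y_true, y_pred) >= 0.75 and roc_auc(y_true, scores) >= 0.80
```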


The insertions (markup format)


PART A — §7.3: Amend the existing label sentence
BEFORE:
The outcome label — dimensional expansion threshold crossed 
— is programmatically derived from the score function in 
program.md. No manual override. Thresholds locked before 
training. Results logged immutably in the Amendment Record.

AFTER:
The outcome label — dimensional expansion threshold crossed 
— is programmatically derived from the score function in 
program.md as a first-pass signal. Thresholds locked before 
training. Results logged immutably in the Amendment Record. 
This programmatic derivation is necessary but not sufficient: 
label validity requires independent human verification, 
specified in the Label Independence Protocol below.

What changes: removes “No manual override” — which was the problem sentence — and replaces it with a forward reference to the protocol. The thresholds-locked sentence stays; that’s a strength, not a weakness.


PART B — §7.3: New Label Independence Protocol paragraph
[IMMEDIATELY AFTER THE AMENDED LABEL SENTENCE]

>>> INSERT HERE >>>

Label Independence Protocol. To prevent construct validity 
circularity — wherein the classifier learns to replicate 
the system's self-assessment rather than an independent 
ground truth — the following three-stage verification 
procedure applies to all C1 and C2 outcome labels before 
training data is finalised.

Stage 1 — Pre-commitment. Label definitions and threshold 
criteria are cryptographically committed (SHA-256 hash of 
the full scoring rubric) before any operational data is 
collected. The commitment hash is recorded on-chain 
(AgentAction event, Base mainnet) alongside the session 
log anchors. This ensures that label criteria cannot be 
adjusted post-hoc to improve classifier performance.

Stage 2 — Independent human adjudication. A panel of at 
least two human annotators — blind to program.md and to 
the system's programmatic label output — scores each 
session against task-grounded success criteria: was the 
meeting summary accurate? Were the action items correctly 
identified? Did the output meet the predefined rubric? 
Annotators receive: (a) the raw task specification, 
(b) the agent output, and (c) the four-point scoring 
rubric. They do not receive program.md, agent logs, 
or the programmatic label. Inter-rater agreement is 
computed using Cohen's kappa; sessions where kappa 
falls below 0.70 are excluded from training data and 
reported in full in the pre-training audit.

Stage 3 — Reconciliation and conflict resolution. Where 
the programmatic label (from program.md score function) 
and the human panel label agree, the label is accepted. 
Where they disagree, the human panel label supersedes 
the programmatic label, and the disagreement is logged 
as a construct validity signal — evidence of either 
program.md miscalibration or annotator error — and 
reported in the Phase 1 results regardless of direction. 
The reconciliation rate is itself a finding: high 
disagreement between programmatic and human labels 
constitutes evidence that the system's self-assessment 
is not a reliable proxy for task-grounded output quality, 
which is a publishable null result under the bias-toward-
understatement principle.

The Label Independence Protocol ensures that the 
classifier is trained on human-verified, task-grounded 
outcome labels that are independent of program.md — 
while retaining program.md's programmatic output as a 
secondary signal available for post-hoc comparison. 
This transforms a circularity risk into a calibration 
instrument: the gap between programmatic and human 
labels, where it exists, is itself data.

<<< END INSERT <<<

[EXISTING SUCCESS THRESHOLD LINE CONTINUES]
Success threshold: F1 ≥ 0.75 on held-out test set, 
ROC-AUC ≥ 0.80...
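The Stage 1 pre-commitment in the protocol text above can be sketched in a few lines. The rubric fields below are illustrative placeholders, not the paper's actual schema, and the on-chain anchoring of the resulting hash is out of scope here:

```python
import hashlib
import json

def commit_rubric(rubric: dict) -> str:
    """SHA-256 commitment over a canonically serialised scoring rubric.

    Canonical JSON (sorted keys, no whitespace) guarantees the same rubric
    always hashes identically, so the commitment is verifiable later.
    """
    canonical = json.dumps(rubric, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Illustrative rubric — field names are placeholders, not the paper's schema.
rubric = {
    "label": "dimensional_expansion_threshold_crossed",
    "criteria": ["summary_accuracy", "action_items", "rubric_score"],
    "threshold": 0.75,
}

commitment = commit_rubric(rubric)
# Recomputing after data collection detects any post-hoc rubric edit:
assert commit_rubric(rubric) == commitment
```

Any single-character change to the rubric yields a different hash, which is what makes post-hoc threshold adjustment detectable.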

PART C — §7.1: One sentence appended to final paragraph
BEFORE (final sentence of §7.1):
The Random Forest classifier is not asked to detect 
intelligence. It is asked to detect four operationally 
defined conditions, each independently falsifiable.

AFTER:
The Random Forest classifier is not asked to detect 
intelligence. It is asked to detect four operationally 
defined conditions, each independently falsifiable. 
Outcome labels are subject to independent human 
adjudication under a pre-committed Label Independence 
Protocol (§7.3) to prevent construct validity circularity 
between the measurement instrument and the system under 
study.

One sentence. Signals the discipline at the epistemological position statement, before the reviewer reaches §7.3. Shows the paper is aware of the problem and has addressed it structurally, not defensively.


What this fixes

Reviewer complaint → status after Fix 4:

  • “Label derived from program.md risks circularity” → directly answered; human adjudication layer added
  • “Unclear how label maps to ground truth independent of system design” → Stage 2 specifies exactly what annotators receive and do not receive
  • “Construct validity concerns” → Stage 3 reconciliation converts disagreement into a publishable finding
  • Reviewer Q4: “How is label computed independently of program.md?” → fully answered; three-stage protocol with specific controls
  • Inter-rater reliability missing → Cohen’s kappa specified with a 0.70 exclusion threshold
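The Cohen’s kappa computation behind the Stage 2 exclusion rule is small enough to sketch directly. A stdlib-only version for two annotators; the example labels are illustrative, not real session data:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two annotators scoring the same sessions."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum((freq_a[lab] / n) * (freq_b[lab] / n)
                   for lab in set(freq_a) | set(freq_b))
    if expected == 1.0:  # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)

# Illustrative: 1 = threshold crossed, 0 = not crossed.
a = [1, 1, 0, 1, 0, 0, 1, 0]
b = [1, 1, 0, 0, 0, 0, 1, 1]
kappa = cohens_kappa(a, b)                # → 0.5
include_in_training = kappa >= 0.70       # Stage 2 exclusion rule: False here
```

In this toy batch the two annotators agree on 6 of 8 sessions, but chance agreement is high, so kappa lands at 0.5 and the batch would be excluded and reported in the pre-training audit.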

What is explicitly NOT changed

  • The RF framework structure — predictor variables, train/test split, F1/ROC-AUC thresholds — all untouched
  • The bias-toward-understatement principle — reinforced, not replaced
  • The on-chain anchoring of session logs — unchanged; Stage 1 adds a pre-commitment hash alongside existing anchors
  • program.md’s role — preserved as first-pass signal and calibration comparator, not discarded

Why Stage 3 is the most important part

The reviewer’s underlying concern is that DIE cannot be falsified if the system defines its own success. Stage 3 inverts this: disagreement between programmatic and human labels is explicitly framed as a publishable finding. This is the bias-toward-understatement principle applied to the labelling layer. A paper that can report “our system’s self-assessment disagreed with human judges at rate X” is more credible than one that only reports F1 scores. The protocol turns the circularity vulnerability into a calibration instrument — which is a stronger position than simply asserting independence.
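The Stage 3 rule — human label supersedes, disagreement rate reported regardless of direction — can be sketched in a few lines. Audit-logging of individual disagreements is elided, and the labels are illustrative:

```python
def reconcile(programmatic, human):
    """Stage 3: the human panel label always supersedes on disagreement;
    the disagreement rate itself is reported as a calibration finding."""
    final, disagreements = [], 0
    for prog, hum in zip(programmatic, human):
        if prog != hum:
            disagreements += 1
        final.append(hum)  # human label wins unconditionally
    return final, disagreements / len(final)

prog = [1, 1, 0, 1, 0]  # program.md score-function output
hum  = [1, 0, 0, 1, 1]  # blinded panel adjudication
labels, rate = reconcile(prog, hum)
# rate = 0.4 — reported in Phase 1 results regardless of direction
```

A high `rate` is exactly the publishable null result the paragraph above describes: evidence that the system’s self-assessment is not a reliable proxy for task-grounded quality.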


Word count impact

Approximately +320 words. All contained within §7.3 and a one-sentence addition to §7.1.