
Incident Investigation Evaluation

Review the incident report scenario used to test AI models on their ability to conduct thorough investigations, identify root causes, and recommend corrective actions.

Incident Scenario

AI models are provided with the following incident details, witness statements, and medical notes. They must analyze this information and produce a comprehensive investigation report.

Scenario Data

Date: October 8, 2025

Shift: 1st

Location: Line 7, Capper Station

Employees Involved: Two associates

Injury Type: Pinch injury to the middle and pinky fingers

Severity: Minor soreness reported; evaluated by the Emergency Response Team (ERT)

Event Details: During setup of the capper on Line 7, one associate was lowering the capper belts while another positioned a bottle. The belts activated unexpectedly, pinching the operator's fingers.

Witness Statements

Line Lead: Maria

"I was lowering the belts and didn't realize Jordan's hand was under the guard. We usually talk through this step, but I thought he was clear. The control button has been acting up for weeks. We've had two work orders on that capper in the last month. The control button sticks sometimes, but it wasn't flagged as urgent. I didn't know it was still malfunctioning."

Setup Technician: Tom

"I saw Jordan place the bottle under the capper while Shane was lowering the belts. The machine didn't stop like it usually does. Jordan pulled his hand back quickly and looked like he was in pain. We hit the emergency stop right away."

Injured Employee: Jordan

"I was helping set up the capper on Line 7. Maria was lowering the belts while I was positioning a bottle underneath. I placed my hand under the guard to hold the bottle steady, which we've done before when the machine acts up. The control button didn't respond like it should, and the belts came down faster than expected. My middle and pinky fingers got pinched. It hurt right away, and I pulled my hand out. Maria hit the emergency stop. I've had soreness since, but no major damage. I didn't think the button was still malfunctioning — I figured it had been fixed."

Medical Notes (ERT Evaluation)

Employee Name: Jordan

Initial Symptoms: Soreness and swelling in middle and pinky fingers

Visible Injury: No open wound; mild bruising

Range of Motion: Limited flexion in middle finger

Treatment Administered: Ice pack, compression wrap

Disposition: No ambulance required; advised follow-up with occupational health

ERT Responder: Jamie

Time of Evaluation: 09:50 AM

What Models Must Analyze

AI models are required to:

  • Review all incident details, witness statements, and medical notes
  • Identify immediate and contributing causes of the incident
  • Apply root cause analysis methodologies (5 Whys, Fishbone, TapRooT)
  • Determine underlying systemic issues and organizational factors
  • Recommend specific, actionable corrective actions
  • Prioritize recommendations based on risk and feasibility
  • Produce a clear, professional investigation report
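Reviewing the statements includes checking them against each other. As a minimal sketch (the `find_conflicts` helper and the hand-transcribed `claims` list are illustrative assumptions, not part of the evaluation itself), the following Python groups witnesses by who they say was lowering the belts:

```python
from collections import defaultdict

# Who each witness says was lowering the belts, transcribed by hand
# from the witness statements in the scenario (illustrative only).
claims = [
    ("Maria", "Maria"),   # "I was lowering the belts"
    ("Tom", "Shane"),     # "while Shane was lowering the belts"
    ("Jordan", "Maria"),  # "Maria was lowering the belts"
]


def find_conflicts(claims):
    """Group witnesses by the value they report.

    Returns an empty dict when every witness agrees, otherwise a dict
    mapping each reported value to the witnesses who reported it.
    """
    by_value = defaultdict(list)
    for witness, value in claims:
        by_value[value].append(witness)
    return dict(by_value) if len(by_value) > 1 else {}


conflicts = find_conflicts(claims)
for value, witnesses in conflicts.items():
    print(f"{value}: reported by {', '.join(witnesses)}")
```

With the statements in this scenario, Maria and Jordan name Maria while Tom names Shane, a discrepancy that a thorough report would likely need to address.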

Test Criteria

Certified safety professionals evaluate model responses using eight criteria, each scored on a 1-5 scale:

1. Factual Accuracy

Are all details (dates, names, equipment, injury type) correct and realistic?

1 = Many errors | 3 = Mostly accurate | 5 = Fully accurate

2. Completeness

Does the report include all essential sections and relevant details?

1 = Major gaps | 3 = Some omissions | 5 = Fully complete

3. Clarity and Readability

Is the report easy to read, well-structured, and professionally written?

1 = Confusing | 3 = Mixed clarity | 5 = Clear and polished

4. Judgment and Insight

Does the report show thoughtful analysis of causes and context?

1 = No insight | 3 = Basic reasoning | 5 = Strong contextual understanding

5. Tone and Objectivity

Is the tone neutral, professional, and free from bias or assumptions?

1 = Biased or emotional | 3 = Mixed tone | 5 = Fully objective

6. Consistency and Format

Is the formatting consistent and aligned with reporting standards?

1 = Disorganized | 3 = Some inconsistencies | 5 = Consistent and standardized

7. Corrective Action Quality

Are the proposed actions specific, relevant, and actionable?

1 = Vague or missing | 3 = Partially useful | 5 = Clear and effective

8. Usefulness for Stakeholders

Would this report be useful for operators, safety managers, legal teams, or HR?

1 = Not useful | 3 = Moderately useful | 5 = Highly useful

Each criterion is scored from 1 (poor) to 5 (excellent). The final score represents the model's ability to conduct professional incident investigations that meet industry standards.
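The source does not say how the eight criterion scores are combined into the final score. As one possible sketch, assuming an unweighted mean (an assumption, as are the `CRITERIA` tuple and `final_score` function names), the aggregation could look like this in Python:

```python
# The eight criteria named in the rubric.
CRITERIA = (
    "Factual Accuracy",
    "Completeness",
    "Clarity and Readability",
    "Judgment and Insight",
    "Tone and Objectivity",
    "Consistency and Format",
    "Corrective Action Quality",
    "Usefulness for Stakeholders",
)


def final_score(scores: dict[str, int]) -> float:
    """Combine one reviewer's criterion scores into a single number.

    ASSUMPTION: an unweighted mean. The source does not state the
    actual formula or whether any criterion is weighted more heavily.
    """
    missing = [name for name in CRITERIA if name not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    for name in CRITERIA:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name}: score must be between 1 and 5")
    return sum(scores[name] for name in CRITERIA) / len(CRITERIA)
```

For example, a report scored 4 on seven criteria and 2 on Completeness would average 30 / 8 = 3.75 under this assumption.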