The STEM Grading Problem: Why It Seems Harder for AI
At first glance, STEM exam grading seems uniquely unsuited to AI automation. Mathematical derivations span multiple lines, involve specialized notation, and require the evaluator to trace logical steps rather than read a continuous paragraph of prose. A physics derivation might involve seven steps, a diagram, and a final numerical answer, and the pedagogically sound approach is to award marks at each stage, not just for the final answer. Science exams also include chemical structural formulas, circuit diagrams, and biological illustrations. These formats appear far more complex than descriptive text answers. However, properly implemented AI grading for STEM is not only possible but particularly powerful, because the structured, step-by-step nature of STEM answers maps well onto rubric-based evaluation.
Recognizing Mathematical Notation in Handwriting
DASES uses a specialized mathematical handwriting recognition pipeline that goes beyond standard text ICR. It recognizes handwritten instances of standard mathematical notation: integral signs, sigma notation, fractions, subscripts and superscripts, Greek letters (α, β, δ, θ, λ), vector notation, matrix brackets, and standard calculus operators. The model is trained to recognize these symbols even when they are written quickly or imperfectly: a hastily written ∫ is still read as an integral sign, and a messy but contextually appropriate dx at the end of an expression is still read as the differential. Chemical formulas (H₂O, C₆H₁₂O₆, structural formulas with bond notation) are handled through a chemistry-specific recognition model.
Step-by-Step Derivation Evaluation: The Core STEM Capability
The critical capability for STEM grading is step-level evaluation. For a physics derivation worth 8 marks, a faculty member in DASES would set up a step rubric:

Step 1: Correct statement of starting equation (1 mark)
Step 2: Correct identification of relevant physical principle/law (2 marks)
Step 3: Algebraic manipulation carried out correctly (2 marks)
Step 4: Substitution of values with correct units (1 mark)
Step 5: Final numerical answer with correct unit (2 marks)

DASES evaluates each step independently. A student who correctly executes Steps 1-4 but makes an arithmetic error at Step 5 earns 1 + 2 + 2 + 1 = 6 of the 8 marks. This is accurate partial credit that reflects genuine partial understanding, unlike "all-or-nothing" grading based solely on the final answer.
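The partial-credit arithmetic in this example can be sketched in a few lines of Python. This is an illustration only, not DASES code: the names RubricStep, RUBRIC, and score_derivation are hypothetical, and the per-step marks are assumed to come from some upstream evaluation that is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricStep:
    """One rubric line (hypothetical structure, not a DASES type)."""
    description: str
    max_marks: int


# The five-step rubric from the physics example (8 marks in total).
RUBRIC = [
    RubricStep("Correct statement of starting equation", 1),
    RubricStep("Correct identification of relevant physical principle/law", 2),
    RubricStep("Algebraic manipulation carried out correctly", 2),
    RubricStep("Substitution of values with correct units", 1),
    RubricStep("Final numerical answer with correct unit", 2),
]


def score_derivation(rubric, awarded):
    """Return (earned, total) after checking each mark against its step maximum."""
    if len(awarded) != len(rubric):
        raise ValueError("one awarded mark is required per rubric step")
    for step, marks in zip(rubric, awarded):
        if not 0 <= marks <= step.max_marks:
            raise ValueError(f"marks out of range for step: {step.description}")
    return sum(awarded), sum(step.max_marks for step in rubric)


# Steps 1-4 correct, arithmetic error at Step 5 (no marks for the final answer).
earned, total = score_derivation(RUBRIC, [1, 2, 2, 1, 0])
print(f"{earned}/{total}")  # prints 6/8
```

Keeping the maximum marks on each step, rather than a single total, is what lets an error in the last step leave the earlier marks untouched.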
Chemistry: Structural Formulas and Reaction Equations
Chemistry exam evaluation presents some of the most complex recognition challenges in the STEM domain. DASES handles balanced chemical equations — recognizing element symbols, subscripts, state indicators (s), (l), (g), (aq), and reaction arrows — and evaluates them against the rubric for correct balancing, correct products, and correct conditions. Structural organic chemistry formulas are recognized through a dedicated structural formula recognition module that identifies carbon chains, functional groups, and bond types. For these components, DASES extracts the structural information and evaluates it against the model answer's structure, awarding marks for correctly identified functional groups, correct connectivity, and accurate bond representation.
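Once an equation has been recognized as text, the "correct balancing" criterion reduces to counting atoms on each side. The sketch below shows one way to do that check. It is not the DASES implementation: the function names are hypothetical, and it assumes the recognizer emits plain ASCII such as "2H2(g) + O2(g) -> 2H2O(l)", with no parentheses inside formulas and no ionic charges.

```python
import re
from collections import Counter


def count_atoms(formula):
    """Count atoms in a simple formula such as 'H2O' or 'C6H12O6'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_totals(side):
    """Total atoms on one side of an equation, e.g. '2H2(g) + O2(g)'."""
    totals = Counter()
    for term in side.split("+"):
        # Drop a trailing state indicator such as (s), (l), (g) or (aq).
        term = re.sub(r"\((s|l|g|aq)\)$", "", term.strip()).strip()
        match = re.fullmatch(r"(\d*)\s*([A-Za-z0-9]+)", term)
        if match is None:
            raise ValueError(f"unsupported term: {term!r}")
        coefficient = int(match.group(1)) if match.group(1) else 1
        for element, n in count_atoms(match.group(2)).items():
            totals[element] += coefficient * n
    return totals


def is_balanced(equation):
    """True if both sides of 'reactants -> products' have the same atom counts."""
    left, right = equation.split("->")
    return side_totals(left) == side_totals(right)


print(is_balanced("2H2(g) + O2(g) -> 2H2O(l)"))  # prints True
print(is_balanced("H2 + O2 -> H2O"))             # prints False
```

A check like this covers balancing only. Whether the products and reaction conditions are correct still has to be judged against the model answer.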
Physics: Diagrams, Free Body Diagrams, and Circuit Schematics
Pure diagram evaluation remains the most challenging aspect of STEM grading for AI systems. DASES takes a pragmatic approach: for diagrams where the key evaluation criteria can be expressed textually (e.g., "correctly labeled axes," "arrow direction consistent with described force," "circuit loop closed correctly"), the AI evaluates the labeled components and their relationships. For highly artistic or interpretative diagrams (like a detailed biological cell diagram or a complex 3D structure), the system flags these answers for mandatory faculty review rather than attempting a potentially unreliable autonomous evaluation. This hybrid approach — AI for what it does well, human for what requires visual interpretation — is more honest and ultimately more accurate than attempting fully autonomous diagram grading.
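The routing rule described here can be written down as a small decision function. This is a minimal sketch under stated assumptions: DiagramQuestion, its fields, and the two route names are hypothetical, and the choice to send a diagram with no textual criteria to faculty is an assumption made for the example rather than documented DASES behavior.

```python
from dataclasses import dataclass, field


@dataclass
class DiagramQuestion:
    """Hypothetical description of a diagram question, set up by faculty."""
    question_id: str
    # Criteria that can be stated in words, e.g. "correctly labeled axes".
    textual_criteria: list = field(default_factory=list)
    # Marked by faculty for artistic or interpretative diagrams.
    interpretative: bool = False


def route_diagram(question):
    """Return 'ai_evaluation' or 'faculty_review' for a diagram question."""
    if question.interpretative or not question.textual_criteria:
        return "faculty_review"
    return "ai_evaluation"


free_body = DiagramQuestion(
    "Q3", ["arrow direction consistent with described force"]
)
cell = DiagramQuestion("Q7", ["organelles labeled"], interpretative=True)

print(route_diagram(free_body))  # prints ai_evaluation
print(route_diagram(cell))       # prints faculty_review
```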
Biology: Short Answer and Application Questions
Biology exam questions range from short factual recall (definitions, classifications, naming) to complex application and analysis questions (explaining experimental observations, predicting outcomes, comparing processes). Short factual recall questions are among the easiest for AI to grade accurately — the answer is either correct or incorrect, the vocabulary is specific, and the rubric is simple. Application questions are handled through semantic evaluation — the AI identifies whether the student's explanation demonstrates the correct understanding of the mechanism, even if phrased differently from the model answer. For diagram-based questions (label the diagram, explain the numbered component), DASES evaluates the labels and associated explanations.
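For the factual recall case, where an answer is either correct or incorrect, grading can be as simple as comparing a normalized answer against a list of accepted answers. The sketch below illustrates only that case. The function names and the accepted-answer list are hypothetical, and it does not attempt the semantic evaluation used for application questions.

```python
import unicodedata


def normalize(answer):
    """Fold case, collapse whitespace, and trim a trailing full stop."""
    text = unicodedata.normalize("NFKC", answer).casefold()
    return " ".join(text.split()).strip(" .")


def grade_recall(student_answer, accepted_answers, marks=1):
    """Award full marks if the normalized answer matches an accepted answer."""
    accepted = {normalize(a) for a in accepted_answers}
    return marks if normalize(student_answer) in accepted else 0


accepted = ["mitochondrion", "mitochondria"]
print(grade_recall("  Mitochondrion. ", accepted))  # prints 1
print(grade_recall("ribosome", accepted))           # prints 0
```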
DASES Accuracy Benchmarks for STEM Exams
Faculty who pilot DASES for STEM subjects consistently observe high rubric adherence in the review dashboard: in the vast majority of cases, the AI's proposed scores align with the faculty member's own professional judgment. The clearest accuracy gains are in step-by-step mathematical and physics derivations, where the AI's structured, criterion-by-criterion evaluation proves more granular and consistent than holistic human marking at volume. Three kinds of answers require the most faculty review: complex hand-drawn diagrams (as noted above), highly non-standard solution approaches that arrive at the correct answer by an unexpected route, and answers containing significant irrelevant content alongside the correct response. DASES explicitly flags these edge cases for human attention.
