Translate learning intentions into observable statements using clear verbs, criteria, and contexts. For each chat module, specify what a learner should be able to demonstrate without hints, how well, and under which conditions. Link prompts to rubrics and exemplars so judgment is transparent, repeatable, and consistent across different facilitators or automated assistants.
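To make that linkage concrete, one option is a small schema that keeps the prompt, its conditions, the rubric criteria, and graded exemplars together in one record. The Python sketch below is a minimal illustration only; the module name, criterion, levels, and exemplar text are hypothetical, not drawn from any particular course.

```python
from dataclasses import dataclass, field

@dataclass
class RubricCriterion:
    name: str               # observable behaviour, stated with a clear verb
    levels: dict[int, str]  # score -> what performance at that level looks like

@dataclass
class AssessmentPrompt:
    module: str
    prompt: str
    conditions: str  # e.g. no hints, single attempt, time limit
    criteria: list[RubricCriterion] = field(default_factory=list)
    exemplars: dict[int, str] = field(default_factory=dict)  # score -> sample response

# Hypothetical example for one chat module.
negotiation_prompt = AssessmentPrompt(
    module="negotiation-basics",
    prompt="A supplier has rejected your first offer. Draft your next message.",
    conditions="No hints; single attempt; 10-minute limit",
    criteria=[
        RubricCriterion(
            name="Responds to the supplier's underlying interest, not just the stated position",
            levels={0: "Not present", 1: "Implied", 2: "Explicit and accurate"},
        )
    ],
    exemplars={2: "Full-credit sample response stored alongside the rubric."},
)
```

Because every facilitator or automated assistant scores against the same criteria and exemplars, disagreements become visible and correctable rather than hidden in individual judgment.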
Balance tangible outputs with process data and learner perceptions for a fuller picture. Artifacts show capability; interaction logs reveal strategies and misconceptions; reflections surface confidence and metacognition. Together, these sources reduce blind spots, support fairness, and help explain why scores move, not merely whether they appear to improve.
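As an illustration of triangulating the three sources, a per-learner record can keep them side by side rather than collapsing them into one score. The field names, scales, and cohort values in this sketch are assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class EvidenceRecord:
    """One learner, one module: artifact, interaction-log, and reflection evidence."""
    artifact_score: float   # rubric score on the tangible output, 0-1
    hint_requests: int      # from the chat log: how often the learner asked for help
    revisions: int          # from the chat log: how many times the answer was reworked
    self_confidence: float  # from a reflection prompt, 0-1

def triangulate(records: list[EvidenceRecord]) -> dict[str, float]:
    # Report each source separately so reviewers can see why scores move,
    # not merely whether they improve.
    return {
        "mean_artifact_score": mean(r.artifact_score for r in records),
        "mean_hint_requests": mean(r.hint_requests for r in records),
        "mean_self_confidence": mean(r.self_confidence for r in records),
        # A positive gap flags overconfidence worth probing in a follow-up.
        "confidence_minus_performance": mean(
            r.self_confidence - r.artifact_score for r in records
        ),
    }

cohort = [
    EvidenceRecord(artifact_score=0.8, hint_requests=1, revisions=2, self_confidence=0.6),
    EvidenceRecord(artifact_score=0.4, hint_requests=5, revisions=1, self_confidence=0.9),
]
print(triangulate(cohort))
```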
High completion rates and long chat threads can look impressive yet mask shallow learning. Prioritize measures linked to retention, transfer, and decision quality under pressure. Calibrate success using benchmark tasks, delayed follow-ups, and performance in realistic scenarios to ensure numbers reflect genuine capability, not superficial activity.
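To make the calibration concrete, the sketch below contrasts a raw completion rate with performance on a delayed benchmark task; the pass threshold and cohort data are hypothetical and exist only to show the comparison.

```python
from statistics import mean

def vanity_check(completions: list[float], delayed_scores: list[float],
                 pass_threshold: float = 0.7) -> dict[str, float]:
    """Contrast an activity metric with a delayed measure of capability.

    completions:    fraction of module steps each learner finished, 0-1
    delayed_scores: score on a benchmark task taken weeks later, 0-1
    """
    completion_rate = mean(completions)
    delayed_pass_rate = mean(s >= pass_threshold for s in delayed_scores)
    return {
        "completion_rate": completion_rate,
        "delayed_pass_rate": delayed_pass_rate,
        # A large gap suggests activity is outrunning retention and transfer.
        "activity_retention_gap": completion_rate - delayed_pass_rate,
    }

# Hypothetical cohort: nearly everyone finishes, far fewer retain the skill.
print(vanity_check(
    completions=[1.0, 0.95, 1.0, 0.9, 1.0],
    delayed_scores=[0.75, 0.5, 0.6, 0.8, 0.55],
))
```

A gap like the one this toy cohort produces is the signal to weight the delayed benchmark, not the completion figure, when judging whether the module works.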