RECAP is a benchmark for intent rewriting in conversational AI. It evaluates how well LLMs turn ambiguous, underspecified, or shifting dialogue into clear, planning-ready intent, which helps developers build more reliable and effective agentic systems.
FactLens is a benchmark for fine-grained fact verification with LLMs. It breaks complex claims into sub-claims, enabling more precise error detection, better transparency, and high-quality evaluation aligned with human judgments.
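
As a rough illustration of why sub-claim decomposition makes errors easier to localize, the sketch below aggregates per-sub-claim verdicts into a claim-level result. It is not FactLens code: the `SubClaim` type, the `verify_claim` helper, and the "all sub-claims must hold" rule are assumptions made for this example, and the sub-claims and verdicts are written by hand rather than produced by an LLM.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubClaim:
    """Hypothetical container for one atomic sub-claim and its verdict."""
    text: str
    supported: bool  # assumed to come from a verifier model or a human annotator


def verify_claim(sub_claims: List[SubClaim]) -> Dict[str, object]:
    """Aggregate sub-claim verdicts (assumed rule: every sub-claim must hold)."""
    failed = [s.text for s in sub_claims if not s.supported]
    return {
        # A claim with no sub-claims is treated as unsupported in this sketch.
        "supported": bool(sub_claims) and not failed,
        # Keeping the failing sub-claims shows where the claim goes wrong.
        "failed_sub_claims": failed,
    }


# Complex claim: "The Eiffel Tower is in Paris and was completed in 1899."
sub_claims = [
    SubClaim("The Eiffel Tower is in Paris.", True),
    SubClaim("The Eiffel Tower was completed in 1899.", False),
]
result = verify_claim(sub_claims)
print(result)
```

A single claim-level label would only say that the claim is wrong; the per-sub-claim view also identifies which part is wrong, which is the transparency benefit described above.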