In this FastTrack TechTalk session, Microsoft architects explain how to take AI agent evaluation in Dynamics 365 from theory into practice. The talk focuses on designing effective evaluation sets, choosing appropriate metrics, and using an Evaluation Design Document (EDD) to keep agent performance reliable. Through a real-world scenario, the speakers show how AI workflows introduce new risks and how structured evaluations can prevent costly failures. Key takeaways include why traditional testing models fall short for AI agents, how to construct evaluation scenarios, when to use synthetic versus real-world data, how to select metrics for both single-turn and multi-turn agents, and the critical role the EDD plays in governance and risk management. The session equips anyone working with Copilot, AI agents, or automation in Dynamics 365 to design evaluations that surface issues proactively.