Designing Evaluation Sets, Metrics, and the Evaluation Blueprint | FastTrack TechTalk | Dynamics 365


In this FastTrack TechTalk session, Microsoft architects move from evaluation theory to practice for AI agents in Dynamics 365. The talk covers how to design effective evaluation sets, choose appropriate metrics, and use an Evaluation Design Document (EDD) to keep agent performance reliable. A real-world scenario shows how AI workflows introduce risk and how structured evaluation can prevent costly failures. Key topics include why traditional testing models fall short for AI agents, how to construct evaluation scenarios, when to use synthetic versus real-world data, which metrics suit single-turn and multi-turn agents, and the role of the EDD in governance and risk management. The session is aimed at anyone working with Copilot, AI agents, or automation in Dynamics 365 who wants to design evaluations that catch issues early.


Video 4w
