
Microsoft has introduced ASSERT, an open-source framework designed to simplify the testing of AI systems. The tool addresses a growing need among developers and companies to confirm that their AI solutions behave as intended within specific product contexts.

ASSERT, which stands for Adaptive Spec-driven Scoring for Evaluation and Regression Testing, turns plain-language descriptions of desired behavior into structured tests. Given natural-language input describing goals, policies, or expected actions, it generates scored evaluations that are easy to analyze. The framework defines acceptable and unacceptable behaviors, creates relevant problem scenarios and test cases, runs those tests against the AI system, and scores its performance. ASSERT also tracks the system's actions, allowing developers to pinpoint failures and understand how their AI models reached their decisions.

Developers can customize evaluations by specifying system contexts, tools, and constraints. For instance, a developer might require a document research AI agent to follow strict rules, such as never communicating externally and allowing only top executives to access sensitive data. ASSERT then produces test cases that continuously check compliance with those rules.

According to Sarah Bird, Microsoft's chief product officer for Responsible AI, the tool fills a gap that general-purpose evaluation methods cannot address. "Evaluations are critical for making informed decisions about AI behavior," Bird said. She noted that understanding how an AI system behaves is essential for determining whether it meets an organization's standards.

ASSERT can be used during system development, after deployment, and for ongoing monitoring, in line with a broader industry trend toward more rigorous and repeatable testing methods.
As AI models grow more sophisticated, initiatives such as Stanford’s HELM and MLCommons’ AILuminate are establishing benchmarks for evaluating AI behavior across a range of conditions.