New Microsoft tool lets devs spin up AI behavior tests using text descriptions

In a significant advancement for AI evaluation, Microsoft has introduced ASSERT, an innovative open-source framework designed to simplify the testing of AI systems. This tool aims to address the growing need for developers and companies to ensure that their AI solutions function as intended within specific product contexts. ASSERT, which stands for Adaptive Spec-driven Scoring for Evaluation and Regression Testing, transforms plain language descriptions of desired behaviors into structured tests. By taking natural-language input regarding goals, policies, or expected actions, it generates comprehensive and scored evaluations that are easy to analyze. The framework not only defines acceptable and unacceptable behaviors but also creates relevant problem scenarios and test cases. It runs these tests against the AI system and provides scores based on performance. Additionally, ASSERT tracks the system's actions, allowing developers to pinpoint failures and understand the decision-making processes of their AI models. Developers can customize their evaluations by specifying system contexts, tools, and constraints. For instance, a developer might ensure that a document research AI agent adheres to strict guidelines, such as avoiding external communications and restricting sensitive data access to top executives. ASSERT will then produce test cases that continuously assess compliance with these rules. According to Sarah Bird, Microsoft's chief product officer for Responsible AI, this tool fills a crucial gap that general evaluation methods cannot address. "Evaluations are critical for making informed decisions about AI behavior," Bird emphasized. She noted that understanding how an AI system operates is essential for determining whether it meets organizational standards. ASSERT can be utilized during system development, post-deployment, and for ongoing monitoring, aligning with a broader trend in the AI industry toward more rigorous and repeatable testing methodologies. As AI models become increasingly sophisticated, initiatives like Stanford’s HELM and MLCommons’ AILuminate are emerging to establish benchmarks for evaluating AI behavior across various conditions.

Sources : TechCrunch

Published On : Jun 02, 2026, 19:15

Startups

Transforming India's Global Capability Centres: Opportunities Amid AI Disruption

India is home to over 2,100 Global Capability Centres (GCCs), which collectively employ approximately 2.3 million indivi...

Business Today | Jul 09, 2026, 08:45

Transforming India's Global Capability Centres: Opportunities Amid AI Disruption

Computing

Meta Breaks Ground on First AI-Focused Data Center in Canada

Meta has officially commenced construction on its inaugural AI data center in Canada, located in Sturgeon County, Albert...

Business Insider | Jul 09, 2026, 02:10

Meta Breaks Ground on First AI-Focused Data Center in Canada

Startups

Luxshare Precision Sees Initial Setback in Hong Kong Stock Market

Luxshare Precision Industry experienced a notable decline of over 5% during its trading debut in Hong Kong on Thursday. ...

CNBC | Jul 09, 2026, 01:45

Luxshare Precision Sees Initial Setback in Hong Kong Stock Market

Startups

Microsoft Announces Major Workforce Reduction with Generous Severance Packages

In a significant organizational shift, Microsoft has revealed plans to trim its workforce by 2.1%, affecting over 4,800 ...

Business Today | Jul 09, 2026, 06:15

Microsoft Announces Major Workforce Reduction with Generous Severance Packages

Meta Halts Controversial AI Program Amid Data Mishap Concerns

In a recent interview, Meta's Chief Technology Officer Andrew Bosworth revealed the reasons behind the company's decisio...

Business Insider | Jul 09, 2026, 05:05

Meta Halts Controversial AI Program Amid Data Mishap Concerns

View All News

High-quality, Cost-effective IT Outsourcing

let’s grow together!

portfolio

case study

follow us on

follow us on

New Microsoft tool lets devs spin up AI behavior tests using text descriptions

Transforming India's Global Capability Centres: Opportunities Amid AI Disruption

Meta Breaks Ground on First AI-Focused Data Center in Canada

Luxshare Precision Sees Initial Setback in Hong Kong Stock Market

Microsoft Announces Major Workforce Reduction with Generous Severance Packages

Meta Halts Controversial AI Program Amid Data Mishap Concerns

Collaborate with Benzatine Infotech

High-quality, Cost-effective IT Outsourcing

let’s grow together!

portfolios

case study

follow us on

follow us on

portfolio

case study

follow us on

follow us on

New Microsoft tool lets devs spin up AI behavior tests using text descriptions

Transforming India's Global Capability Centres: Opportunities Amid AI Disruption

Meta Breaks Ground on First AI-Focused Data Center in Canada

Luxshare Precision Sees Initial Setback in Hong Kong Stock Market

Microsoft Announces Major Workforce Reduction with Generous Severance Packages

Meta Halts Controversial AI Program Amid Data Mishap Concerns

Collaborate with Benzatine Infotech