The Infocomm Media Development Authority of Singapore (IMDA) along with the AI Verify Foundation has launched the Generative AI Evaluation Sandbox, a new initiative to enable the evaluation of trusted artificial intelligence (AI) products and reveal potential gaps.
IMDA said the Sandbox will make use of a new Evaluation Catalogue to set out common baseline methods and recommendations to assess generative AI products.
The catalogue compiles commonly used technical testing tools, organises them according to what they test for and how, and recommends a baseline set of tests.
The aim is to establish a common language and support "broader, safe and trustworthy adoption of Gen AI," it added.
IMDA said the Sandbox will also help build evaluation capabilities beyond what currently resides with model developers.
It will involve players in the third-party testing ecosystem to help model developers understand what external testers would look for in responsible AI models.
"Where possible, each Sandbox use case should involve an upstream Gen AI model developer, a downstream application deployer and a third-party tester to demonstrate how the different players in the ecosystem can work together," IMDA explained.
The initiative has garnered support from global market players like Amazon Web Services, Anthropic, Google, and Microsoft and several other participants including Deloitte, Nvidia, EY, IBM, OCBC Bank and Singtel.
By involving regulators like the Singapore Personal Data Protection Commission (PDPC), the Sandbox will offer space for experimentation and development and allow all parties along the supply chain to be transparent about their needs.
Further, IMDA expects the Sandbox to uncover gaps in current evaluations, including domain-specific applications, such as human resources, and culture-specific areas, which are currently underdeveloped.
"Sandbox will develop benchmarks for evaluating model performance in specific areas that are important for use cases, and for countries like Singapore because of cultural and language specificities," IMDA said.
The authority said it is collaborating with Anthropic on a Sandbox project that uses the catalogue to identify aspects for red teaming, which challenges plans, policies and assumptions used in AI by adopting an adversarial approach.
IMDA will deploy Anthropic's models and research tooling platform to develop red-teaming methodologies for Singapore's diverse linguistic and cultural landscape. For instance, AI models will be tested on their ability to perform within the country's multilingual context.
The agency is now inviting industry partners to collaboratively build evaluation tools and capabilities in the new sandbox.
Earlier in July, IMDA partnered with Google to launch privacy-enhancing technologies (PET) x Privacy Sandbox to support businesses that wish to pilot PET projects.
The Singapore government had also launched two sandboxes, one for exclusive use by government agencies and the other for enterprises to develop and test Gen AI applications.