The paper “Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems” introduces a family of approaches to AI safety, termed “Guaranteed Safe (GS) AI.” These approaches aim to provide high-assurance quantitative guarantees about AI systems’ safety by leveraging three core components: a world model, a safety specification, and a verifier.
Core Components:
- World Model:
  - A mathematical description of how the AI system interacts with the outside world, accounting for both Bayesian (probabilistic) and Knightian (unquantifiable) uncertainty.
  - Can range from no explicit model at all to fully verified models grounded in physical laws, with varying degrees of complexity and reliability.
- Safety Specification:
  - A mathematical description of which effects or behaviors of the AI system are acceptable.
  - Can be context-specific or general, potentially involving hard-to-formalize predicates such as “harm” or “truthfulness.”
- Verifier:
  - Provides an auditable proof certificate that the AI system satisfies the safety specification relative to the world model.
  - Can range from empirical testing to formal proofs, with the goal of providing rigorous, quantifiable safety guarantees. A minimal interface sketch of all three components follows this list.
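Concretely, the relationship between the three components can be expressed as interfaces. The following is a minimal sketch of our own, under assumed names (`WorldModel`, `SafetySpec`, `Verifier`, `Certificate`); it is an illustration of the framework's shape, not an API from the paper.

```python
# Hypothetical interfaces for the three GS AI components (our illustration,
# not code from the paper).
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass
class Certificate:
    """Auditable evidence, ranging from test logs to a machine-checkable proof."""
    claim: str
    evidence: Any


class WorldModel(Protocol):
    """Mathematical description of how the system's actions affect the world."""
    def outcomes(self, state: Any, action: Any) -> Any: ...


class SafetySpec(Protocol):
    """Predicate over outcomes: which effects or behaviors are acceptable."""
    def is_acceptable(self, outcome: Any) -> bool: ...


class Verifier(Protocol):
    """Checks the system against the spec relative to the world model and
    returns an auditable certificate (or fails with a counterexample)."""
    def verify(self, system: Any, model: WorldModel, spec: SafetySpec) -> Certificate: ...
```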
Challenges and Approaches:
- Creating Accurate World Models:
  - Manual creation for well-understood environments, or automatic generation using AI for more complex scenarios.
  - Incorporating uncertainty while maintaining interpretability and predictive accuracy.
- Formulating Safety Specifications:
  - Complex, open-ended specifications are difficult to define precisely.
  - Strategies include manual specification, learning specifications from data, and using conservative safety predicates.
- Verification Methods:
  - Various levels of verification strength, from empirical testing to formal proofs.
  - Integrating formal methods into the design process, and achieving scalability through compositional reasoning. A toy end-to-end example combining a world model, a safety specification, and a verifier follows this list.
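To make these three challenges concrete, the sketch below (entirely our own construction, with assumed states and transition probabilities) uses a small finite Markov chain as the world model, a quantitative bound on the probability of reaching an unsafe state within a horizon as the safety specification, and exact dynamic programming as the verifier; the resulting value table serves as an auditable certificate.

```python
# A toy GS AI pipeline (our illustration): finite Markov chain world model,
# quantitative safety specification, and an exact verifier.
from dataclasses import dataclass

# World model: states and transition probabilities (hypothetical toy values).
STATES = ["cruise", "degraded", "failsafe", "collision"]
TRANSITIONS = {
    "cruise":    {"cruise": 0.97, "degraded": 0.03},
    "degraded":  {"cruise": 0.50, "degraded": 0.40, "failsafe": 0.09, "collision": 0.01},
    "failsafe":  {"failsafe": 1.0},
    "collision": {"collision": 1.0},
}
UNSAFE = {"collision"}  # the safety specification forbids reaching these states


@dataclass
class Certificate:
    """Auditable evidence: the computed risk bound plus the per-state
    value table that justifies it."""
    horizon: int
    bound: float
    value_table: dict


def verify(initial: str, horizon: int, threshold: float):
    """Check the spec P(reach UNSAFE within `horizon` steps) <= threshold
    by exact dynamic programming over the world model."""
    # risk[s] = probability of entering UNSAFE within k steps starting from s.
    risk = {s: (1.0 if s in UNSAFE else 0.0) for s in STATES}
    for _ in range(horizon):
        risk = {
            s: 1.0 if s in UNSAFE
            else sum(p * risk[t] for t, p in TRANSITIONS[s].items())
            for s in STATES
        }
    cert = Certificate(horizon, risk[initial], dict(risk))
    return risk[initial] <= threshold, cert


ok, cert = verify("cruise", horizon=500, threshold=0.15)
print(ok, round(cert.bound, 4))
```

Replacing the exact computation with random rollouts would turn this verifier into empirical testing, the weakest rung of the ladder; the exhaustive computation is what earns the quantitative guarantee.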
Motivation and Necessity:
- High Autonomy and General Intelligence:
  - AI systems with high degrees of autonomy and general intelligence, or those used in safety-critical contexts, require robust safety guarantees.
  - Experimental tests alone are insufficient; rigorous, formal approaches are needed to ensure safety.
Examples:
- Code Translation:
  - Translating code between programming languages with guarantees of functional correctness and security; a differential-testing sketch follows this list.
- Autonomous Vehicles (AV):
  - Designing AV systems against formal safety and functionality specifications, including backup safety measures; a toy runtime-shield sketch also follows this list.
- Household Robots:
  - Providing assistance with diverse household tasks while ensuring safety and adaptability to individual users’ needs.
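For the code-translation example, the weakest rung of the verification ladder can be illustrated with differential testing. The sketch below is our own, with hypothetical stand-in functions for the original and translated programs; a full GS approach would instead prove equivalence formally rather than sample it.

```python
# Differential testing as a weak, empirical "verifier" for code translation
# (our illustration; the function pair is a hypothetical stand-in).
import random


def original_gcd(a: int, b: int) -> int:    # stand-in for the source-language code
    while b:
        a, b = b, a % b
    return a


def translated_gcd(a: int, b: int) -> int:  # stand-in for the AI-translated code
    return a if b == 0 else translated_gcd(b, a % b)


def differential_test(trials: int = 10_000) -> bool:
    """Empirically check that both versions agree on random inputs.
    This yields evidence, not a guarantee: a formal equivalence proof
    would be required for a genuine proof certificate."""
    for _ in range(trials):
        a, b = random.randrange(1, 10**6), random.randrange(0, 10**6)
        if original_gcd(a, b) != translated_gcd(a, b):
            return False
    return True


print(differential_test())
```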
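For the AV example, one form a backup safety measure could take is a runtime shield: a monitor that passes through the learned policy’s action only when a conservative stopping-distance invariant is preserved. All numbers and the braking model below are assumptions for illustration, not values from the paper.

```python
# A toy runtime shield for an AV (our illustration, with assumed parameters).
def safe_distance(speed: float, max_decel: float = 8.0) -> float:
    """Conservative stopping distance (m) under worst-case braking."""
    return speed * speed / (2.0 * max_decel)


def shield(proposed_accel: float, speed: float, gap: float) -> float:
    """Allow the policy's proposed acceleration only if the invariant
    gap >= safe_distance(speed) still holds one step ahead; otherwise brake."""
    dt = 0.1                          # control period (s), assumed
    next_speed = max(0.0, speed + proposed_accel * dt)
    next_gap = gap - next_speed * dt  # worst case: lead vehicle stationary
    if next_gap >= safe_distance(next_speed):
        return proposed_accel         # proposed action preserves the invariant
    return -8.0                       # fall back to maximum braking


print(shield(proposed_accel=2.0, speed=25.0, gap=45.0))
```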
Conclusion:
The paper argues for the necessity of GS AI approaches to ensure the safety of advanced AI systems. These approaches, while challenging, offer promising avenues for providing the rigorous guarantees needed for high-stakes applications. The paper highlights the inadequacy of current AI safety practices that rely mainly on empirical evaluations and advocates for a more formal, model-based approach to AI safety.