Guidance for AI proof of concept to scale: Appendix 2

Appendix 2: AI evaluation

Scaling AI from PoC to full deployment involves continuous evaluation across all key aspects:

technical feasibility and system performance – robust AI models, infrastructure and data management
business alignment – alignment of AI objectives with business goals, and change management
monitoring and maintenance – AI system monitoring, retraining and feedback loops
risk and compliance – addressing ethical issues, compliance and security
finance and planning – ensuring adequate staffing and resources are available at each stage of the AI initiative

Evaluation practices should align with the Commonwealth Evaluation Policy.

Below is a summary of the key evaluation options for scaling AI:

Key evaluation options for scaling AI
Category	Evaluation options
Technical feasibility and system performance	Conduct thorough testing on edge cases, noisy data and real-world scenarios. Use a range of AI model metrics to test quality and reliability. Conduct A/B testing. Test how the AI will scale in terms of computation and data. Test performance for bottlenecks and system limitations. Test latency, throughput and response times. Gather user and business feedback. Ensure updates can be deployed smoothly. Ensure disaster recovery mechanisms are in place. See the AI technical standard for further details.
Business alignment	Ensure the AI solution continues to meet business objectives. Perform cost-benefit analysis and ROI projections in line with the Benefits Management Policy. Gather feedback from the business. Assess change impact. Develop a roadmap for training end users.
Monitoring and maintenance	Establish a mechanism to track AI performance and changes over time. Ensure feedback loops allow AI drift and degradation to be addressed. Ensure AI explainability is embedded.
Risk and compliance	Assess the AI for ethical issues including bias. Ensure regular audits and review mechanisms are in place. Ensure the AI is compliant with legal frameworks.
Finance and planning	Analyse costs for staff and infrastructure, and for each AI stage. Embed evaluation in each AI stage.
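
As an illustration only, two of the options in the table (testing latency, throughput and response times, and tracking AI drift over time) could be sketched as below. This is a minimal sketch under stated assumptions, not a method prescribed by this guidance: the function names, the choice of the Population Stability Index (PSI) as a drift measure, and the bin count are all hypothetical choices made for the example. Any alert threshold applied to the PSI value is a matter of agency convention and is not defined here.

```python
import math
from statistics import quantiles


def latency_summary(latencies_ms, window_seconds):
    """Summarise response times and throughput for one test window.

    Hypothetical helper: latencies_ms is a list of per-request response
    times in milliseconds, all observed within window_seconds.
    """
    cuts = quantiles(latencies_ms, n=100, method="inclusive")  # 99 cut points
    return {
        "p50_ms": cuts[49],   # median response time
        "p95_ms": cuts[94],   # 95th percentile response time
        "throughput_rps": len(latencies_ms) / window_seconds,
    }


def population_stability_index(baseline, current, bins=10):
    """PSI between two samples of one numeric feature or model score.

    PSI is one common drift measure, chosen here as an assumption.
    Bins are equal-width over the baseline range; values in `current`
    that fall outside that range are counted in the first or last bin.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    b, c = shares(baseline), shares(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


if __name__ == "__main__":
    # Made-up numbers purely to exercise the functions.
    print(latency_summary([100] * 95 + [500] * 5, window_seconds=10))
    scores_at_poc = list(range(100))
    scores_now = [s + 50 for s in scores_at_poc]
    print(population_stability_index(scores_at_poc, scores_now))
```

A PSI of zero means the current distribution matches the baseline in every bin, and larger values indicate a larger shift. In practice the baseline would be captured at the PoC stage and compared against each later monitoring window.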

Appendix 3: Informing procurement