Criterion 108: Perform user acceptance testing (UAT) and scenario testing, validating the system with a diversity of end-users in their operating contexts and real-world scenarios.
Criterion 109: Perform robust regression testing to mitigate the heightened risk of escaped defects resulting from changes, such as a step change in parameters.
Traditional software regression testing alone is insufficient for AI systems.
This may include:
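Beyond the practices above, a model-level regression gate can be sketched as follows. This is a minimal illustration, assuming metrics come from an existing evaluation harness; the metric names and tolerance are hypothetical, not prescribed values.

```python
# Hypothetical sketch: gate a model change on evaluation metrics so that a
# step change in parameters cannot silently degrade quality.

def check_regression(baseline: dict, candidate: dict, tolerance: float = 0.01) -> list:
    """Return the metrics where the candidate regressed beyond tolerance."""
    regressions = []
    for metric, baseline_value in baseline.items():
        candidate_value = candidate.get(metric)
        if candidate_value is None or candidate_value < baseline_value - tolerance:
            regressions.append(metric)
    return regressions

baseline_metrics = {"accuracy": 0.91, "f1": 0.88}
candidate_metrics = {"accuracy": 0.92, "f1": 0.85}  # f1 has regressed

failed = check_regression(baseline_metrics, candidate_metrics)
# failed == ["f1"], so the change would be blocked for review
```

A gate like this would typically run automatically on every candidate model before promotion.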
Criterion 110: Ensure the AI system meets architecture and operational requirements with the Australian Government Security Authority to Operate (SATO).
This aspect of integration planning includes:
Criterion 111: Identify suitable tests for integration with the operational environment, systems, and data.
This includes:
Criterion 112: Apply secure and auditable continuous integration practices for AI systems.
Continuous integration (CI) pipelines enable agencies to build, test, and validate changes on every commit or merge, while accounting for the computational cost of re-running expensive model training processes. The CI pipeline should include any automated tests defined in the test stage, automated model training, and static and dynamic source code analysis.
These pipelines typically involve:
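The shape of such a pipeline can be sketched as a sequence of gates, each of which raises on failure so the run stops at the first failing stage. The stage names, placeholder bodies, and accuracy threshold below are assumptions for illustration, not a prescribed agency pipeline.

```python
# Illustrative CI pipeline skeleton for an AI system: static analysis,
# a (cheap or cached) training run, then an automated test gate.

def static_analysis():
    # Placeholder for static and dynamic source code analysis tooling.
    pass

def train_model():
    # Placeholder for training; in practice this may use sampled data or
    # cached artefacts, since full retraining on every commit is expensive.
    return {"accuracy": 0.93}

def run_tests(model_metrics):
    # Automated tests defined in the test stage.
    if model_metrics["accuracy"] < 0.90:
        raise RuntimeError("model failed accuracy gate")

def run_pipeline():
    static_analysis()
    metrics = train_model()
    run_tests(metrics)
    return "passed"
```

Keeping each stage as a separate callable makes it straightforward to log, audit, and rerun individual gates.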
Criterion 113: Develop plans to ensure critical systems remain operational during disruptions.
This includes:
Criterion 114: Ensure the staging environment mirrors the production environment in configurations, libraries, and dependencies for consistency and predictability suited to the use case.
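One small, automatable parity check is to diff the pinned dependency lists captured from each environment (for example, with `pip freeze`). The file contents below are illustrative, assuming a simple `name==version` format.

```python
# Minimal environment-parity check: report dependencies whose versions
# differ between staging and production.

def parse_requirements(text: str) -> dict:
    deps = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "==" in line:
            name, version = line.split("==", 1)
            deps[name.lower()] = version
    return deps

def environment_drift(staging: str, production: str) -> dict:
    """Map each drifted package to its (staging, production) versions."""
    s, p = parse_requirements(staging), parse_requirements(production)
    return {name: (s.get(name), p.get(name))
            for name in sorted(set(s) | set(p))
            if s.get(name) != p.get(name)}

staging_freeze = "numpy==1.26.4\nscikit-learn==1.4.2\n"
production_freeze = "numpy==1.26.4\nscikit-learn==1.3.0\n"
drift = environment_drift(staging_freeze, production_freeze)
# drift == {"scikit-learn": ("1.4.2", "1.3.0")}
```

A non-empty drift report would block promotion until the environments are reconciled.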
Criterion 115: Measure the performance of the AI system in the staging environment against predefined metrics.
Criterion 116: Ensure deployment strategies include monitoring for AI-specific metrics, such as inference latency and output accuracy.
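Inference-latency monitoring can be sketched as timing each request and alerting on a tail percentile. The model call and the 200 ms target below are stand-ins for illustration, not recommended values.

```python
import time

# Illustrative latency monitoring: record per-request inference time and
# flag when the 95th percentile exceeds a service target.

def p95(latencies_ms: list) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    index = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[index]

def timed_inference(model, features, latencies_ms: list):
    start = time.perf_counter()
    prediction = model(features)
    latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return prediction

latencies = []
model = lambda x: sum(x)          # stand-in for a real model
for batch in ([1, 2], [3, 4], [5, 6]):
    timed_inference(model, batch, latencies)

if p95(latencies) > 200.0:        # illustrative 200 ms service target
    print("ALERT: p95 inference latency above target")
```

Output-accuracy monitoring follows the same pattern, with a quality metric in place of the timer.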
Criterion 117: Apply strategies for phased roll-out.
Consider splitting traffic between the current version and the new version, or rolling out to a subset of users, to introduce changes gradually and detect issues before full deployment.
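A common way to implement this kind of split is a deterministic canary router: hash a stable user identifier so that a fixed percentage of users consistently see the new version across requests. The percentage and identifiers below are illustrative.

```python
import hashlib

# Hypothetical canary router for a phased roll-out: each user is assigned
# to "new" or "current" deterministically, based on a stable hash.

def route_version(user_id: str, canary_percent: int = 10) -> str:
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = digest[0] * 256 + digest[1]          # value in 0..65535
    return "new" if bucket % 100 < canary_percent else "current"

assignments = {uid: route_version(uid) for uid in ("alice", "bob", "carol")}
# Re-running yields identical assignments: routing is deterministic,
# so the canary percentage can be raised gradually as confidence grows.
```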
Criterion 118: Apply readiness verification, assurance checks, and change management practices for the AI system.
This typically involves:
Criterion 119: Apply strategies for limiting service interruptions.
This typically involves:
Criterion 120: Define a comprehensive rollout and rollback strategy.
This should safeguard data integrity and limit the risk of data corruption.
Criterion 121: Implement load balancing and traffic shifting methods for system rollout.
This includes:
Criterion 122: Conduct regular testing, health checks, and readiness and startup probes for all deployed AI services, to verify stability before routing traffic.
Consider using probes to monitor continuously during deployment, detecting issues early and rolling back upon failure.
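A readiness gate of this kind can be sketched as polling a health check with limited retries before any traffic is routed. Here `check` stands in for a real probe (for example, an HTTP request to the service's health endpoint); the retry count and delay are illustrative.

```python
import time

# Sketch of a readiness gate: poll a health check until it passes or the
# retry budget is exhausted; probe errors count as "not ready yet".

def wait_until_ready(check, attempts: int = 5, delay_s: float = 1.0) -> bool:
    for _ in range(attempts):
        try:
            if check():
                return True
        except Exception:
            pass
        time.sleep(delay_s)
    return False

# Example: a service that only becomes healthy on its third probe.
state = {"probes": 0}
def fake_health_check():
    state["probes"] += 1
    return state["probes"] >= 3

ready = wait_until_ready(fake_health_check, attempts=5, delay_s=0.0)
# ready is True after three probes; traffic can now be routed
```

If the gate returns False, the deployment should be treated as failed and rolled back rather than receiving traffic.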
Criterion 123: Implement rollback mechanisms to revert to the last stable version in case of failure.
This includes:
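The core of such a mechanism is tracking the last known-good version and reverting to it when a post-deployment check fails. The class and health-check callables below are stand-ins for real release tooling, shown only to illustrate the control flow.

```python
# Illustrative rollback mechanism: promote a new version only if it passes
# its health check; otherwise revert to the last stable version.

class Releaser:
    def __init__(self, stable_version: str):
        self.stable = stable_version   # last known-good version
        self.live = stable_version     # version currently serving traffic

    def deploy(self, version: str, healthy) -> str:
        previous = self.live
        self.live = version
        if healthy(version):
            self.stable = version      # promote the new version
        else:
            self.live = previous       # roll back to the last stable version
        return self.live

releaser = Releaser("v1")
releaser.deploy("v2", healthy=lambda v: False)   # fails checks -> rolls back
releaser.deploy("v3", healthy=lambda v: True)    # passes -> promoted
# releaser.live == "v3" and releaser.stable == "v3"
```

In practice the health check would be the readiness and startup probes described above, and the version switch would be an atomic traffic cutover.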