  • Agencies must:

    • Criterion 59: Define quality assessment criteria for the data used in the AI system.

      Data quality can be measured across a variety of dimensions, in line with the ABS Data Quality Framework: institutional environment, relevance, timeliness, accuracy, coherence, interpretability, and accessibility.

      A report on data quality can include:

      • data quality statement (see ABS Data Quality Statement Checklist)
      • metrics for measuring data quality, including its correctness and credibility
      • frequency of reporting on data quality
      • delegation of ownership to a business area responsible for managing data quality
      • monitoring arrangements for changes in quality across the supply chain
      • processes for intervening and addressing data quality issues as they arise.

      Consider:

      • any existing data standard frameworks that are used by the agency. 

    Agencies should:

    • Criterion 60: Implement data profiling activities and remediate any data quality issues.

      This involves analysing the structure, content, and quality of the data to determine its fitness for purpose for an AI system. 

      Data profiling can investigate the following characteristics:

      • frequency
      • volume, range, and distribution
      • invalid entry identification
      • error detection
      • duplicates identification
      • noise identification
      • specific pattern identification.
      Methods that can be used to explore and analyse the data include the following (a minimal profiling sketch in Python appears after this list):
      • descriptive statistics, such as mean, median, mode, or frequencies
      • business rules – apply business knowledge
      • clustering or dendrogram – group similar observations together
      • visualisation – to get a visual representation of the data from various types of graphs and charts, such as histograms, bar plots, box plots, density plots, or heatmaps
      • correlation analysis – measure relationships between variables, usually between numerical variables
      • scatter plots – visualise relationships between two numerical variables
      • cross-tabulations – analyse relationships between multiple categorical variables
      • principal component analysis – analyse variables with the most variance
      • factor analysis – helps reveal hidden patterns.
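
      The following is a minimal profiling sketch in Python, assuming pandas is available; the dataframe and column names are hypothetical examples. It illustrates frequency, distribution, missing and invalid entries, and duplicate identification rather than prescribing a specific tool.

      ```python
      import pandas as pd

      # Hypothetical sample of records being profiled for an AI system
      df = pd.DataFrame({
          "age": [34, 41, 41, None, 29, 150],                # a missing value and a likely invalid entry
          "state": ["NSW", "VIC", "VIC", "QLD", "nsw", "SA"],
      })

      # Volume, range, and distribution
      print(df.describe(include="all"))

      # Frequency of categorical values (also surfaces inconsistent entries such as "nsw")
      print(df["state"].value_counts(dropna=False))

      # Missing / invalid entry identification
      print(df.isna().sum())

      # Duplicate identification
      print(df.duplicated().sum())

      # A simple range rule flags a likely invalid age
      print(df[(df["age"] < 0) | (df["age"] > 120)])
      ```
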
    • Criterion 61: Define processes for labelling data and managing the quality of data labels.

      Data labelling can be done for the purposes of managing and storing data, auditing, and AI model training. Humans with appropriate skills and knowledge can perform the data labelling, or it can be supported by automated labelling tools.

      Setting data labelling practices can help optimise performance across the AI system by describing the context, categories, and relationships between data types; creating lineage of data through the AI system via versioning; distinguishing between pre-deployment and live data; and identifying what data will be reused, archived, or destroyed.

      These practices include:

      • establishing naming schemes, taxonomy, tagging, and data labelling practices
      • considering different techniques such as manual or automated labelling, crowdsourcing, and quality checks
      • defining quality control methods to improve consistency of labelling and assist in reducing bias
      • considering changes to the raw data and data imputations, and associated impact
      • providing data labels for AI training approaches or testing the AI models. Labels can provide the ground truth data for AI models and can influence AI validation. Different types of data labelling include:
        • classification
        • regression
        • visual object labels
        • audio labels
        • entity tagging.
      • applying quality assurance measures to data labels, labelling personnel, and automated data-labelling support tools
      • implementing bias mitigation practices in labelling:
        • establishing a review process. Diverse people could independently label the same data so that correlation between labellers can be analysed, and final labels could go through spot-check review by subject matter experts (a minimal agreement sketch appears after this list)
        • establishing feedback loops. Labellers should be able to report issues and suggest improvements, and automated systems should be updated to be consistent with corrections made by human labellers
        • establishing performance management for staff. Data labellers should undergo periodic training, performance reviews, and random audits for quality control
        • implementing metadata labelling techniques that capture the type of data categories within the system and the relationship between these categories. Metadata labels can be prepared for model bias evaluation by annotating metadata with suitable dimensions. Ensure the metadata labelling aligns to the Guide on Metadata Attributes | Office of the National Data Commissioner and Australian Government Recordkeeping Metadata Standard | naa.gov.au.
      • assessing and monitoring quality of all automated data labelling support tools. Determine the regularity and criteria for these quality checks and report on findings
      • updating and maintaining the labelling tools and processes to adapt to new data types, and labelling requirements
      • considering potential harm to data labellers who may need to access sensitive or distressing content. This can occur when training an AI model to prevent responses including violence, hate speech, or sexual abuse.
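
      As an illustration of the review process above, the sketch below measures agreement between two independent labellers using Cohen's kappa. It assumes scikit-learn is available; the labels are hypothetical.

      ```python
      from sklearn.metrics import cohen_kappa_score

      # Hypothetical labels assigned independently by two labellers to the same records
      labeller_a = ["approve", "refer", "approve", "refer", "approve", "refer"]
      labeller_b = ["approve", "refer", "refer", "refer", "approve", "refer"]

      kappa = cohen_kappa_score(labeller_a, labeller_b)
      print(f"Cohen's kappa: {kappa:.2f}")  # low agreement may indicate unclear labelling guidelines or bias
      ```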
         
  • Statement 17: Validate and select data

  • Agencies must:

    • Criterion 62: Perform data validation activities to ensure data meets the requirements for the AI system’s purpose.

      This involves including AI-specific validations in schema migrations to ensure data pipelines and feature stores remain functional. Suitable data validation techniques include:

      • type validation – ensuring data is in the correct data type
      • format validation – ensuring data aligns to a predefined pattern
      • range validation – checking whether data falls within a specific range
      • outlier detection – checking for data points that significantly deviate from the general data pattern
      • completeness – verifying that all required fields are filled
      • diversity – ensuring the data represents a variety of data points.

      Considerations include:

      • a quality framework
      • online near real-time and offline batch data validation mechanisms to support the purpose and operations of the AI system.
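
      A minimal sketch of these validation checks is shown below. It assumes pandas is available; the schema, column names, and range limits are hypothetical examples only.

      ```python
      import pandas as pd

      def validate_batch(df: pd.DataFrame) -> list[str]:
          """Return a list of validation issues found in an incoming batch."""
          issues = []

          # Completeness - required fields are present and filled
          for col in ("claim_id", "lodged_date", "amount"):
              if col not in df.columns:
                  issues.append(f"missing column: {col}")
              elif df[col].isna().any():
                  issues.append(f"null values in column: {col}")

          # Type and format validation - dates must parse as dates
          if "lodged_date" in df.columns:
              if pd.to_datetime(df["lodged_date"], errors="coerce").isna().any():
                  issues.append("unparseable dates in lodged_date")

          # Range validation - amounts must fall within an agreed range
          if "amount" in df.columns:
              amounts = pd.to_numeric(df["amount"], errors="coerce")
              if ((amounts < 0) | (amounts > 1_000_000)).any():
                  issues.append("amount outside expected range 0 to 1,000,000")

          return issues

      batch = pd.DataFrame({
          "claim_id": [1, 2, 3],
          "lodged_date": ["2024-01-05", "2024-02-30", "2024-03-01"],  # 2024-02-30 is not a valid date
          "amount": [120.50, -40.00, 980.00],                         # -40.00 is out of range
      })
      print(validate_batch(batch))
      ```
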
    • Criterion 63: Select data for use that is aligned with the purpose of the AI system.

      This includes:

      • alignment with the agency’s business intent and the goals of the AI system, as well as ensuring data meets the data quality criteria previously established
      • maintaining a live test dataset to test the AI system in production, to help monitor and maintain the operational integrity of the AI system.
         
  • Statement 18: Enable data fusion, integration and sharing

  • Agencies should:

    • Criterion 64: Analyse data fusion and integration requirements.

      Data fusion is a method of integrating or combining data from multiple sources, which can help an AI system produce more comprehensive, reliable, and accurate outputs. Meaningful data sharing practices across the agency can build interoperability between systems and datasets. Data sharing also promotes reuse, reducing the resources needed for collection and analysis.

    • Criterion 65: Establish an approach to data fusion and integration.

      This approach should involve one or more of the following processes: 

      • ETL (Extract, Transform and Load) – batch movement of data, where data is transformed before it is loaded into the target system
      • ELT (Extract, Load and Transform) – batch movement of data, where data is loaded into the target system and transformed there
      • Application programming interface (API) – allowing the movement and syncing of data across multiple applications
      • data streaming – moving data in or near real-time from source to target
      • data virtualisation – combining data from different sources virtually and on demand, without physically moving it
      • chaining of AI models – linking multiple AI models in a sequence where the output from one model becomes the input for another.

      Consider:

      • data migration guidelines and any agency data management agreements, if relevant.

      Agencies can optimise data fusion and integration processes by automating scheduling and data integration tasks and by deploying intuitive interfaces to diagnose and resolve errors.
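
      A minimal ETL-style sketch is shown below, assuming pandas is available; the source data, table name, and in-memory SQLite target are hypothetical stand-ins for an agency's actual sources and targets.

      ```python
      import sqlite3
      import pandas as pd

      # Extract - a small in-line dataframe stands in for pd.read_csv("source.csv")
      source = pd.DataFrame({"Service": ["A", "B"], "Requests": [120, 87]})

      # Transform - standardise column names and derive a field
      source.columns = [c.lower() for c in source.columns]
      source["requests_per_day"] = source["requests"] / 7

      # Load - write the transformed batch to the target store
      with sqlite3.connect(":memory:") as conn:
          source.to_sql("weekly_requests", conn, if_exists="replace", index=False)
          print(pd.read_sql("SELECT * FROM weekly_requests", conn))
      ```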

    • Criterion 66: Identify data sharing arrangements and processes to maintain consistency.

      Data sharing considerations include:

      • whether other systems could leverage the data analysed by the AI system
      • which areas within the agency would benefit from analysed data being shared with them
      • what data containers could improve with access to the system’s data sources
      • whether data on how the system was trained could be used to train other systems
      • documentation such as a memorandum of understanding, or similar, for data sharing arrangements intra-agency, inter-agency, or with external parties
      • addressing risks of creating personally identifiable information
      • what can be published for public, government, or internal benefit
      • any legislative implications.
         
  • Statement 19: Establish the model and context dataset

  • Agencies must:

    • Criterion 67: Measure how representative the model dataset is.

      Key considerations for measuring and selecting a model dataset include:

      • whether it is representative of the true population relevant to the purpose of the AI system – this will improve model generalisation and minimise overfitting
      • ensuring the dataset has the required features, volumes, distribution, representation and demographics, including people with lived experience and intersectional dimensions. For example, someone with cultural or linguistic diversity may also be a person with disability; the dataset must consider how multiple dimensions of a person intersect to create unique experiences or challenges
      • for GenAI, assessing data quality thresholds and mechanisms in the data setup for modelling to help avoid unwanted bias and hallucinations.
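
      One way to quantify how representative the dataset is involves comparing its demographic mix against known population proportions. The sketch below uses a chi-square goodness-of-fit test; it assumes scipy is available, and the groups and figures are hypothetical.

      ```python
      from scipy.stats import chisquare

      observed_counts = [480, 410, 110]        # counts per demographic group in the model dataset
      population_share = [0.50, 0.40, 0.10]    # known shares of each group in the target population

      total = sum(observed_counts)
      expected_counts = [share * total for share in population_share]

      stat, p_value = chisquare(f_obs=observed_counts, f_exp=expected_counts)
      print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")  # a small p-value suggests under- or over-representation
      ```
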
    • Criterion 68: Separate the model training dataset from the validation and testing datasets.

      Agencies must maintain the separation between these datasets to avoid misleading evaluations of trained models.

      Agencies can refresh these datasets to account for timeframes, degradation in AI performance during operation, and compute resource constraints.
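
      A minimal sketch of maintaining this separation is shown below. It assumes scikit-learn is available and uses synthetic data for illustration.

      ```python
      from sklearn.datasets import make_classification
      from sklearn.model_selection import train_test_split

      X, y = make_classification(n_samples=1_000, random_state=42)

      # Hold out the test set first, then carve a validation set from what remains
      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
      X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

      print(len(X_train), len(X_val), len(X_test))  # 600 / 200 / 200 observations
      ```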

    • Criterion 69: Manage bias in the data.

      Techniques for agencies to manage and mitigate problematic bias in their model dataset include:

      • data collection analysis – examining how data was generated and verified, and checking the methodologies used to ensure the data is diverse and represents the real population
      • data source analysis – investigating limitations and assumptions around the origin of the data
      • data diversity – determining various demographics, sources and types of data, inclusion and exclusion considerations
      • statistical testing – determining the likelihood of the population being accurately represented in the data
      • class imbalance – analysing data for class imbalance before using it to train classification models, and applying relevant data and algorithm techniques and metrics, such as precision or F1-score, to address this
      • outlier detection – identifying outliers or unusual data points in the data and ensuring they are handled appropriately
      • exploratory data analysis – using descriptive statistics and data visualisation tools to identify patterns and discrepancies
      • removing any irrelevant data from the training data that does not improve the performance of the model
      • ensuring that any sensitive and protected data are retained in the test datasets for the purpose of evaluating for bias
      • data augmentation – deploying measures to address the completeness of the model dataset, through supplementary data collection or synthetic data generation
      • transparency – identifying bias and where it originated from through transparency on data sourcing and processing
      • domain knowledge – ensuring practitioners have relevant domain knowledge on the datasets the AI system uses to serve the scope of the AI, including an understanding of the data characteristics and what it represents for the organisation
      • documentation of data use – documenting the use of data by the AI system and any potential change of use, providing an audit trail of any incidence and causation of bias.
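
      The sketch below illustrates two of the simpler checks above: class imbalance and group representation. It assumes pandas is available; the labels and groups are hypothetical.

      ```python
      import pandas as pd

      train = pd.DataFrame({
          "label":  [1, 0, 0, 0, 0, 0, 0, 0, 1, 0],
          "region": ["metro", "metro", "metro", "regional", "metro",
                     "metro", "remote", "metro", "metro", "regional"],
      })

      # Class imbalance - a heavily skewed label distribution may warrant resampling
      # or metrics such as precision, recall, or F1-score instead of accuracy
      print(train["label"].value_counts(normalize=True))

      # Group representation - compare against the expected population mix
      print(train["region"].value_counts(normalize=True))
      ```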

    Agencies should:

    • Criterion 70: For Generative AI, build reference or contextual datasets to improve the quality of AI outputs.

      A reference or contextual dataset for GenAI can take the form of (but is not limited to) a retrieval-augmented generation (RAG) dataset or a prompt dataset.

      Key considerations include:

      • building high-quality reference or contextual datasets to support more accurate and context aware AI outputs, and reduce hallucinations
      • implementing pre-defined prompts tailored to ensure consistent and reliable responses from GenAI models
      • establishing workflows for prompt engineering and data preparation to streamline the development and deployment of GenAI systems.
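
      The sketch below illustrates the idea of a small reference dataset combined with a pre-defined prompt template. It assumes scikit-learn is available and uses simple TF-IDF retrieval as a stand-in for an agency's actual retrieval component; the documents, query, and template are hypothetical.

      ```python
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      # Hypothetical reference (contextual) dataset
      reference_docs = [
          "Applications close on 30 June each year.",
          "Payments are made within 14 days of approval.",
          "Appeals must be lodged in writing within 28 days.",
      ]

      vectoriser = TfidfVectorizer()
      doc_matrix = vectoriser.fit_transform(reference_docs)

      query = "When do applications close?"
      scores = cosine_similarity(vectoriser.transform([query]), doc_matrix)[0]
      context = reference_docs[scores.argmax()]  # retrieve the most relevant reference document

      # Pre-defined prompt template grounding the model in the retrieved context
      prompt = (
          "Answer the question using only the context provided.\n"
          f"Context: {context}\n"
          f"Question: {query}"
      )
      print(prompt)  # this prompt would then be passed to the GenAI model
      ```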

         
  • Statement 20: Plan the model architecture

  • Agencies must:

    • Criterion 71: Establish success criteria that cover any AI training and operational limitations for infrastructure and costs.

      Ensure alignment with AI system metrics selected at the design stage.

      Consider:

      • AI system purpose and requirements including explainability
      • pre-defined AI system metrics including AI performance metrics
      • impact and treatment for false positives and false negatives
      • AI operational environment including scalability intentions
      • frequency of change in context
      • limitations on compute infrastructure
      • cost constraints
      • operational models such as ModelOps, MLOps, LLMOps, DataOps, and DevOps (see Statement 1).

      AI training can occur in offline mode, or in online or real-time mode. This is dependent on the business case and the maturity of the data and infrastructure architecture. The risk of the model becoming stale is higher in offline mode, while the risk of the model exhibiting unverified behaviour is higher in online mode.

      The training process is interdependent with the infrastructure in the training environment. Complex model architectures with highly specialised learning strategies and large model datasets generally require tailored infrastructure to manage costs.

    • Criterion 72: Define a model architecture for the use case suitable to the data and AI system operation.

      The following will influence the choice of the model architecture and algorithms:

      • business requirements – risk thresholds or performance criteria
      • purpose of the system – identified stakeholders and the intended outcomes, safety, reproducibility level of AI model outputs, or explainability level for AI outputs
      • data – bias, quality, and managing the supply of data to the system
      • supporting infrastructure – computational demands, costs, and speed with respect to business needs
      • resourcing – the capabilities involved with documentation, oversight, and intervention in training the AI model or reusable assets
      • design – the training process will include necessary human oversight and intervention, to ensure responsible AI practices are in place. Consider embedding flexible architecture practices to avoid vendor lock-in.

      The model architecture will highlight the variables that will impact the intended outcomes for the system. These variables will include the model dataset, use case application and scalability intentions. These variables will influence which algorithms and learning strategies are chosen to train the AI model. 

      An AI scientist can test and analyse the model architecture and dataset to identify what is needed to effectively train the system. Additionally, they can outline requirements for the model architecture to comply with data, privacy, and ethical expectations.

      Consider starting with simple and small architectures and adding complexity progressively, depending on the purpose of the system, to simplify debugging and reduce errors. Note that an AI system can contain a combination of multiple models, which can add to the complexity.

      Generally, a single type of algorithm and training process may not be sufficient to determine optimal models for the AI system. It is usually good practice to train multiple models with various algorithms and training methodologies.

      There are options to develop a chain of AI models, or add more complexity, if that better meets the intent of the AI system. Each model could use a different type of algorithm and training process.

      Analysis of support and maintenance of the AI system in operation can influence the model architecture. For some use cases, a complete model refresh may be required, noting cost considerations. Alternatives such as updates to pre- or post-processing could be considered, including updates to the configuration or the knowledge repository used for RAG in GenAI systems.

      It may not be necessary to retrain models every time new information becomes available, and this should be considered when defining the model architecture. For example, for GenAI, adding new information in RAG can help the AI system remain up to date without the need to retrain the AI model, saving on costs without impacting AI accuracy.

    • Criterion 73: Select algorithms aligned with the purpose of the AI system and the available data.

      There are various forms of algorithms to train an AI model, and it is important to select them based on the AI system requirements, model success criteria, and the available model dataset. A learning strategy is a method to train an AI model and dictates the mathematical computations that will be required during the training process.

      Depending on use case, some examples of the types of training processes may include:

      • supervised learning – training an AI model with a dataset, made up of observations, that has desired outputs or labels, such as support vector machines or tree-based models
      • unsupervised learning – training a model to learn patterns in the dataset itself, where the training dataset does not have desired outputs or labels, such as anomaly detection or transformer LLMs
      • reinforcement learning – training a model to maximise pre-defined goals, such as Monte Carlo tree search or fine-tuning models
      • transfer learning – a model trained on one task, such as a pre-trained model, is reused as a starting point to enhance model performance on a related, yet different, task
      • parameter tuning – optimising a model’s performance by adjusting its parameters or hyperparameters, often automatically
      • model retraining – updating a model with new data
      • online or real-time mode – continuously train the model using live data (note that this can significantly increase vulnerability of the AI system, such as data poisoning attacks).

      Like traditional software, there are options to reuse, reconfigure, buy, or build models. An agency could reuse off-the-shelf models as-is, fine-tune pre-trained models, use pre-built algorithms, or create new models. The approach taken to training will vary across model types.
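
      A minimal sketch of training more than one candidate model on the same data is shown below. It assumes scikit-learn is available and uses synthetic data; the choice of a support vector machine and a tree-based model is illustrative only.

      ```python
      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.metrics import f1_score
      from sklearn.model_selection import train_test_split
      from sklearn.svm import SVC

      X, y = make_classification(n_samples=500, random_state=0)
      X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

      candidates = {
          "support_vector_machine": SVC(),
          "tree_based_model": RandomForestClassifier(random_state=0),
      }
      for name, model in candidates.items():
          model.fit(X_train, y_train)
          score = f1_score(y_val, model.predict(X_val))
          print(f"{name}: F1 = {score:.3f}")  # track results to support the selection decision
      ```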

    • Criterion 74: Set training boundaries in relation to any infrastructure, performance, and cost limitations.

    Agencies should:

    • Criterion 75: Start small, scale gradually.

      Consider:

      • starting with simple and small architectures and adding complexity progressively, depending on the purpose of the system, to simplify debugging and reduce errors
      • that an AI system can contain a combination of multiple models which can add to the complexity.
         
  • Statement 21: Establish the training environment

  • Agencies must:

    Agencies should:

    • Criterion 78: Reuse approved AI modelling frameworks, libraries, and tools.

  • Statement 22: Implement model creation, tuning, and grounding

  • Agencies must:

    • Criterion 79: Set assessment criteria for the AI model, with respect to pre-defined metrics for the AI system.

      These criteria should address:

      • success factors specific to user stories
      • model quality thresholds and performance of the AI system
      • explainability and interpretability requirements
      • security and privacy requirements
      • ethics requirements
      • tolerance for error for model outputs
      • tolerance for negative impacts
      • error rates at scale compared with similar processing performed by humans.

      Considerations for modelling include:

      • model training, maintenance, and support costs
      • data and compute infrastructure constraints
      • likelihood of the AI models becoming outdated
      • whether the model can be legally used for the intended use case
      • whether methods can be implemented to mitigate risk of new harms being introduced into the AI system
      • bias, security, and ethical concerns
      • whether the model meets the explainability and interpretability requirements
      • use of model interpretability tools to analyse important features and decision logic.
    • Criterion 80: Identify and address situations when AI outputs should not be provided.

      These situations include:

      • low confidence scores
      • when user input and context are ambiguous or lack reliable sources
      • complex questions as input
      • limited knowledge base
      • privacy concerns and potential breach of safety
      • harmful content
      • unlawful content
      • misleading content.

      For GenAI, implementing techniques such as threshold settings or content filtering could address these situations.
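
      The sketch below illustrates a simple threshold setting for withholding low-confidence outputs. The threshold value and confidence score are hypothetical; in practice the confidence would come from the model or a separate scoring component.

      ```python
      CONFIDENCE_THRESHOLD = 0.75  # hypothetical value, tuned to the agency's risk tolerance

      def respond(answer: str, confidence: float) -> str:
          """Return the model's answer only when its confidence is acceptable."""
          if confidence < CONFIDENCE_THRESHOLD:
              return "I am not confident enough to answer this. Please contact a staff member."
          return answer

      print(respond("Your application was received on 3 May.", confidence=0.42))
      ```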

    • Criterion 81: Apply considerations for reusing existing agency models, off-the-shelf, and pre-trained models.

      These include:

      • whether the model can be adapted to meet the KPIs for the AI system
      • suitability of pre-defined AI architecture
      • availability of AI specialist skills or skills required for configuration and integration
      • whether the model is relevant to the target operating domain or can be adapted to it, such as fine-tuning, retrieval-augmented generation (RAG), and pre-processing and post-processing techniques
      • cybersecurity assessment in line with Australian Government policies and guidance (see Whole of AI Lifecycle for more details).
    • Criterion 82: Create or fine-tune models optimised for target domain environment.

      This includes:

      • model testing on target operating environment and infrastructure
      • using pre-processing and post-processing techniques
      • addressing input and output filtering requirements for safety and reliability
      • grounding such as RAG, which can augment a large language model (LLM) with trusted data from a database or knowledge base internal to an agency
      • for GenAI, prompt engineering or establishing a prompt library, which can streamline and improve interactions with an AI model
      • considering cost and performance implications associated with the adaptation techniques
      • performing unit testing for the training algorithm and the pre-processing and post-processing algorithms
      • tracking model training implementations systematically to speed up the discovery and development of models.

    Agencies should:

    • Criterion 83: Create and train using multiple model architectures and learning strategies.

      Systematically track model training implementations to speed up the discovery and development of models. This will help select a more optimal trained model.
       

  • Statement 23: Validate, assess, and update model

  • Agencies must:

    • Criterion 84: Set techniques to validate AI trained models.

      There are multiple qualitative and quantitative techniques and tools for model validation, informed by the AI system success criteria (see the Design section), including:

      • correct classifications, predictions or forecasts, and factual correctness and relevance
      • ability to identify positive and negative instances, and to distinguish between classes
      • benchmarking
      • consistency in responses, clarity and coherence
      • source attribution
      • data-centric validation approaches for GenAI models.
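
      A minimal sketch of validating a trained classifier against held-out data is shown below. It assumes scikit-learn is available; the labels and predictions are hypothetical.

      ```python
      from sklearn.metrics import classification_report, confusion_matrix

      y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # held-out ground-truth labels
      y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions

      # Correct classifications and the ability to distinguish between classes
      print(confusion_matrix(y_true, y_pred))
      print(classification_report(y_true, y_pred))
      ```
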
    • Criterion 85: Evaluate the model against training boundaries.

      Evaluation considerations include: 

      • poor or degraded performance of the model
      • change of AI context or operational setting
      • data retention policies
      • model retention policies.
    • Criterion 86: Evaluate the model for bias, implement and test bias mitigations.

      This includes:

      • using suitable tools that test and discover unwarranted associations between an algorithm’s protected input features and its output
      • evaluating performance across suitable and intersectional dimensions
      • checking if bias could be managed through updating the training data (see Statement 18)
      • implementing bias mitigation thresholds that can be configured post-deployment
      • implementing pre-processing or post-processing techniques such as disparate impact remover, equalised odds post-processing, content filtering, and RAG.
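
      The sketch below illustrates one common bias evaluation: comparing favourable-outcome rates across groups (a disparate impact ratio). It assumes pandas is available; the data are hypothetical, and the four-fifths figure in the comment is a commonly used rule of thumb rather than a prescribed benchmark.

      ```python
      import pandas as pd

      results = pd.DataFrame({
          "group":    ["A", "A", "A", "A", "B", "B", "B", "B"],
          "approved": [1,   1,   1,   0,   1,   0,   0,   0],
      })

      rates = results.groupby("group")["approved"].mean()
      disparate_impact = rates["B"] / rates["A"]  # ratios well below 0.8 (the "four-fifths" rule of thumb) warrant investigation
      print(rates)
      print(f"disparate impact ratio: {disparate_impact:.2f}")
      ```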

    Agencies should:

    • Criterion 87: Identify relevant model refinement methods.

      Triggers from the evaluations above may lead to model refinement or retirement. Relevant refinement methods can include:

      • model parameter or weight adjustments – further training or re-training the model on a new set of observations, or additional training data
      • adjusting data pre-processing or post-processing components
      • model pruning – to reduce redundant mathematical calculations and speed up operations.
         
  • Statement 24: Select trained models

  • Agencies should:

    • Criterion 88: Assess a pool of trained models against acceptance metrics to select a model for the AI system.

      This involves the following (a minimal selection sketch appears after this list):

      • defining clear needs and expectations
      • comparing multiple trained models, usually generated based on different configurations
      • prioritising based on metrics such as ‘simplest’ or ‘most effective’
      • documenting the rationale for selection based on results from training models with various model architectures, learning strategies, and configurations
      • documenting any risk and mitigation plans
      • maintaining a model refresh and re-training plan and register
      • implementing mechanisms for explainability of model outputs to system users
      • establishing feedback channels and mechanisms for monitoring and managing model performance
      • preparing an audit plan
      • documenting a method for retiring the model.
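
      A minimal selection sketch is shown below; the candidate models, metrics, and acceptance threshold are hypothetical.

      ```python
      # Hypothetical results from a pool of trained candidate models
      candidates = [
          {"name": "logistic_regression", "f1": 0.86, "parameters": 1_200},
          {"name": "gradient_boosting",   "f1": 0.88, "parameters": 45_000},
          {"name": "neural_network",      "f1": 0.88, "parameters": 2_000_000},
      ]

      ACCEPTANCE_F1 = 0.85  # acceptance metric agreed during design

      # Keep only models meeting the acceptance metric, then prioritise the simplest
      acceptable = [c for c in candidates if c["f1"] >= ACCEPTANCE_F1]
      selected = min(acceptable, key=lambda c: c["parameters"])
      print(f"selected: {selected['name']} (F1 = {selected['f1']}, parameters = {selected['parameters']})")
      ```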
       
