Statement 31: Undertake integration planning

Agencies should

  • Criterion 110: Ensure the AI system meets architecture and operational requirements for the Australian Government Security Authority to Operate (SATO).

    This aspect of integration planning includes:

    • assessing the AI system and its third-party dependencies against the agency’s requirements to identify risks
    • assessing the AI system against the agency’s architecture principles
    • identifying any gaps between the agency’s current and target infrastructure to support the AI system
    • ensuring the AI system meets security and privacy requirements for handling classified data.
  • Criterion 111: Identify suitable tests for integration with the operational environment, systems, and data.

    This includes:

    • ensuring robust test methods are selected
    • incorporating automated testing processes
    • ensuring that environment controls satisfy security and privacy requirements for the data in the AI system.

Statement 32: Manage integration as a continuous practice

Agencies should

  • Criterion 112: Apply secure and auditable continuous integration practices for AI systems.

    Continuous integration (CI) pipelines enable agencies to build, test, and validate changes on every commit or merge, while accounting for the computational cost of re-running expensive model training processes. The CI pipeline should include any automated tests defined in the test stage, automated model training, and static and dynamic source code analysis.

    These pipelines typically involve:

    • ensuring end-to-end integration, including data pipeline and data encryption practices
    • verifying and managing dependency checks for outdated or vulnerable libraries
    • validating infrastructure-as-code (IaC) scripts to ensure environments are deployed consistently
    • steps to build and validate container images for AI applications
    • continuous training and delivery of AI models and systems
    • employing fail-fast mechanisms to halt builds upon detection of silent failures and critical errors, such as test failures or vulnerabilities
    • avoiding the propagation of unverified changes from failed workflows to production environments
    • establishing a centralised artifact and model registry, and including steps to package and store artifacts, such as models, APIs, and datasets.
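The fail-fast behaviour described above can be sketched as a simple gating script. The stage names and stubbed check results below are illustrative assumptions, not a prescribed agency pipeline:

```python
def run_stages(stages):
    """Run named check functions in order, halting on the first failure so
    unverified changes cannot propagate towards production."""
    completed = []
    for name, check in stages:
        if not check():
            return {"passed": False, "failed_stage": name, "completed": completed}
        completed.append(name)
    return {"passed": True, "failed_stage": None, "completed": completed}

# Stage names and stubbed results are illustrative, mirroring the list above.
stages = [
    ("static_analysis", lambda: True),
    ("dependency_check", lambda: True),       # outdated or vulnerable libraries
    ("unit_and_model_tests", lambda: False),  # e.g. a model quality test fails
    ("build_container_image", lambda: True),  # never reached: fail-fast halts first
]
```

In a real pipeline each stage would invoke the agency's chosen tools; the value of the pattern is that later, more expensive stages (such as container builds or model retraining) never run once an earlier check fails.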

Statement 33: Create business continuity plans

Agencies must

  • Criterion 113: Develop plans to ensure critical systems remain operational during disruptions.

    This includes:

    • identifying and managing potential risks to AI operations
    • defining disaster recovery, backup and restore, and monitoring plans
    • testing business continuity plans for relevance
    • regularly reviewing and updating objectives, success criteria, failure indicators, plans, processes and procedures to ensure they remain appropriate to the use case and its operating environment.
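One small element of testing such plans is verifying that a restored backup is identical to the original artifact. A minimal sketch using checksums follows; the choice of SHA-256 here is an illustrative assumption:

```python
import hashlib

def checksum(data: bytes) -> str:
    """Content fingerprint used to compare a backup with its restored copy."""
    return hashlib.sha256(data).hexdigest()

def verify_restore(original: bytes, restored: bytes) -> bool:
    """A restore test passes only when the restored artifact is byte-identical
    to the original, as evidenced by matching checksums."""
    return checksum(original) == checksum(restored)
```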

Statement 34: Configure a staging environment

Agencies should

  • Criterion 114: Ensure the staging environment mirrors the production environment in configurations, libraries, and dependencies for consistency and predictability suited to the use case.

  • Criterion 115: Measure the performance of the AI system in the staging environment against predefined metrics.

  • Criterion 116: Ensure deployment strategies include monitoring for AI-specific metrics, such as inference latency and output accuracy.
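Measuring the staged system against predefined metrics (Criteria 115 and 116) can be sketched as follows; the p95 latency budget and accuracy floor are hypothetical values, not thresholds mandated by this standard:

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a list of measurements (e.g. latency in ms)."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[rank]

def meets_targets(latencies_ms, accuracy, p95_budget_ms=200, min_accuracy=0.9):
    """Compare staged AI system measurements against predefined metric targets.
    The default budget and accuracy floor are illustrative assumptions."""
    return {
        "p95_ok": percentile(latencies_ms, 95) <= p95_budget_ms,
        "accuracy_ok": accuracy >= min_accuracy,
    }
```

A staging run whose p95 inference latency exceeds the budget would fail the gate even if accuracy is acceptable, which is exactly the kind of AI-specific signal Criterion 116 asks deployment monitoring to surface.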

Statement 35: Deploy to a production environment

Agencies must

  • Criterion 117: Apply strategies for phased roll-out.

    Consider splitting traffic between the current and new version being rolled out, or rolling out to a subset of users to gradually introduce changes and detect issues before full deployment.
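One common way to roll out to a subset of users is to hash user identifiers deterministically into buckets, so each user consistently sees the same version as the rollout percentage grows. This is a sketch of that idea; the hashing scheme is an assumption, not a prescribed mechanism:

```python
import hashlib

def in_canary(user_id: str, rollout_percent: int) -> bool:
    """Deterministically assign a user to the canary cohort by hashing
    their identifier into one of 100 buckets."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_percent

def route(user_id: str, rollout_percent: int) -> str:
    """Route a request to the new version only for the canary cohort."""
    return "new_version" if in_canary(user_id, rollout_percent) else "current_version"
```

Raising `rollout_percent` in stages (for example 5, 25, 50, 100) gradually introduces the change while keeping each user's experience stable between requests.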

  • Criterion 118: Apply readiness verification, assurance checks, and change management practices for the AI system.

    This typically involves:

    • readiness verification that includes all tests and covers the entire system: code, model, data, and related components
    • verifying consent for data governance, data use, and auditing frameworks
    • ensuring all production deployments follow change management protocols, including impact assessment, stakeholder notification, training updates, assurance, approvals, testing, and documentation
    • including the rationale for deploying or updating AI systems in the change records to ensure accountability and transparency
    • understanding the implications of AI model auto-updates in production, including options to disable
    • understanding the implications of AI system online and dynamic learning in production, including options to disable.

Agencies should

  • Criterion 119: Apply strategies for limiting service interruptions.

    This typically involves:

    • implementing strategies to avoid service interruptions and reduce risk during updates where zero downtime is required
    • configuring instance draining to ensure active requests are not interrupted while allowing completion of long-running AI inference tasks
    • including cost tracking in deployment workflows for additional resources used during deployment
    • including real-time monitoring and alerting to detect and respond to issues during deployment processes and transitions.
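Instance draining, as described above, can be sketched as follows: once draining begins the instance refuses new requests (so the load balancer routes them elsewhere) while in-flight inference tasks run to completion. Class and method names are illustrative:

```python
import threading
import time

class DrainingInstance:
    """Sketch of instance draining for zero-downtime updates."""

    def __init__(self):
        self._lock = threading.Lock()
        self._draining = False
        self._in_flight = 0

    def try_accept(self) -> bool:
        """Accept a new request unless the instance is draining."""
        with self._lock:
            if self._draining:
                return False  # load balancer should route this request elsewhere
            self._in_flight += 1
            return True

    def finish(self) -> None:
        """Mark one in-flight request as complete."""
        with self._lock:
            self._in_flight -= 1

    def drain(self, timeout_s: float = 30.0, poll_s: float = 0.01) -> bool:
        """Stop accepting work and wait (up to timeout_s) for active requests,
        including long-running inference tasks, to complete."""
        with self._lock:
            self._draining = True
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            with self._lock:
                if self._in_flight == 0:
                    return True
            time.sleep(poll_s)
        return False
```

The drain timeout would be sized to the longest expected inference task, so slow AI workloads are not cut off mid-request during an update.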

Statement 36: Implement rollout and safe rollback mechanisms

Agencies should

  • Criterion 120: Define a comprehensive rollout and rollback strategy.

    This should safeguard data and limit data corruption.

  • Criterion 121: Implement load balancing and traffic shifting methods for system rollout.

    This includes:

    • using load balancers to distribute traffic dynamically between old and new deployments during updates
    • creating traffic shifting policies to safeguard against overwhelming newly deployed AI systems with high inference demands.
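A minimal sketch of weighted traffic shifting between the old and new deployments, with a configurable cap so the newly deployed AI system is not overwhelmed by high inference demand (the step and cap values are hypothetical):

```python
import random

def choose_backend(weights, rng=random.random):
    """Pick the old or new deployment according to the current shift weight."""
    return "new" if rng() < weights["new"] else "old"

def next_weights(weights, step=0.1, cap=0.5, healthy=True):
    """Ramp traffic to the new deployment in small steps, capped until it has
    shown it can absorb the inference load; shed all traffic if unhealthy."""
    if not healthy:
        return {"new": 0.0, "old": 1.0}
    new_share = min(cap, weights["new"] + step)
    return {"new": new_share, "old": 1.0 - new_share}
```

Once the new deployment proves stable at the cap, the cap itself can be raised until it serves all traffic; an unhealthy signal immediately returns all traffic to the old deployment.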
  • Criterion 122: Conduct regular testing, health checks, readiness, and startup probes to verify stability before routing traffic for all deployed AI services.

    Consider using probes to monitor continuously during deployment, to detect issues early and roll back upon failure.
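A readiness probe that aggregates health checks before traffic is routed might look like the following sketch; the individual check names are illustrative assumptions:

```python
def probe(checks):
    """Aggregate readiness checks; the service should receive traffic only
    when every check passes (ready is True)."""
    failures = [name for name, ok in checks.items() if not ok()]
    return {"ready": not failures, "failures": failures}

# Example checks (names are illustrative, not a prescribed set).
checks = {
    "model_loaded": lambda: True,            # weights present in memory
    "dependencies_reachable": lambda: True,  # e.g. feature store, vector DB
    "warmup_inference": lambda: False,       # a test inference failed or timed out
}
```

Reporting the specific failing checks, rather than a bare pass/fail, makes it easier to decide whether to hold the rollout or trigger a rollback.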

  • Criterion 123: Implement rollback mechanisms to revert to the last stable version in case of failure.

    This includes:

    • implementing automated rollback mechanisms to revert to the last stable version in case of pre-defined critical failure for AI deployments
    • requiring human intervention to analyse failures that do not satisfy the trigger for automated rollback and to decide the next steps.
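The split between automated rollback and human review can be sketched as a simple decision function; the set of critical failure triggers shown is hypothetical and would be pre-defined per system:

```python
def rollback_decision(failure):
    """Choose between automated rollback and human review, based on
    pre-defined critical triggers (the trigger names here are hypothetical)."""
    critical_triggers = {"healthcheck_failed", "error_rate_breach", "accuracy_collapse"}
    if failure in critical_triggers:
        return "auto_rollback_to_last_stable"
    return "hold_for_human_review"
```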

Statement 37: Establish monitoring framework

Agencies should

  • Criterion 124: Define reporting requirements.

    This includes:

    • establishing a plan for providing different stakeholders with reports
    • defining, for each group of stakeholders (persona), what needs to be reported, why, when, and how.
  • Criterion 125: Define alerting requirements.

    This includes:

    • defining what information needs alerting
    • defining what information is critical to be alerted in real-time
    • defining severity levels, such as major, minor, warning
    • defining thresholds, out-of-pattern behaviour, and other triggers for each alert level
    • defining who needs to be alerted and the method of alert, such as SMS or email.
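Severity levels and thresholds as described above can be sketched as a classification function; the metric name and threshold values below are illustrative assumptions:

```python
def classify_alert(metric, value, thresholds):
    """Map a metric reading to a severity level using per-metric thresholds.
    Returns None when the reading does not warrant an alert."""
    t = thresholds[metric]
    if value >= t["major"]:
        return "major"    # critical: alert in real time, e.g. by SMS
    if value >= t["minor"]:
        return "minor"    # e.g. email to the responsible team
    if value >= t["warning"]:
        return "warning"  # logged and reported, no immediate page
    return None

# Hypothetical thresholds for a single metric.
thresholds = {"error_rate": {"warning": 0.01, "minor": 0.05, "major": 0.10}}
```

Each severity level would then map to an alert channel and recipient group defined in the agency's alerting requirements.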
  • Criterion 126: Implement monitoring tools.

    This includes:

    • monitoring the information needed to satisfy alerting and reporting requirements
    • automating monitoring, alerting, and reporting
    • implementing management information and dashboards
    • implementing role-based access to protect sensitive information and meet security requirements
    • implementing real-time alerting requirements.
  • Criterion 127: Implement feedback loop to ensure that insights from monitoring are fed back into the development and improvement of the AI system.

    This includes:

    • a decision matrix outlining which components of the AI system would need an update or refresh, such as pre- or post-processing components, the AI model, or a RAG knowledge base in a GenAI system
    • a framework to provide and track recommended actions from the insights
    • a guideline for identifying actions to address insights, with considerations to costs, delays, AI trust, and effectiveness.
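A decision matrix mapping monitoring insights to the components needing refresh can be sketched as a simple lookup; the insight and component names here are illustrative assumptions:

```python
def components_to_refresh(insight, matrix):
    """Look up which AI system components an insight suggests refreshing;
    unknown insights are escalated for human review."""
    return matrix.get(insight, ["escalate_for_review"])

# Illustrative decision matrix (insight and component names are assumptions).
matrix = {
    "data_drift": ["pre_processing", "ai_model"],
    "stale_answers": ["rag_knowledge_base"],
    "formatting_errors": ["post_processing"],
}
```

Pairing each recommended action with cost, delay, and AI trust considerations, as the guideline above suggests, turns this lookup into a tracked improvement workflow rather than an ad-hoc response.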

Statement 38: Undertake ongoing testing and monitoring
