Statement 38: Undertake ongoing testing and monitoring

Agencies must

  • Criterion 128: Test periodically after deployment and have a clear framework to manage any issues.

    This provides assurance that the system still operates as intended. See the Test section for applicable tests.
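
    As an illustration, a minimal scheduled check might replay a held-back evaluation set against the deployed system and escalate when accuracy falls below an agreed threshold. This is only a sketch: the evaluation set, threshold, and predict stub are all assumptions to be replaced with the agency's own tests.

      import datetime

      # Hypothetical held-back evaluation set; use the agency's real test cases.
      EVAL_SET = [
          {"input": "What is the service's contact number?", "expected": "1300 000 000"},
      ]
      ACCURACY_THRESHOLD = 0.95  # assumed acceptance threshold

      def predict(text):
          # Stand-in for the deployed AI system's inference endpoint.
          return "1300 000 000"

      def run_periodic_test():
          correct = sum(predict(c["input"]) == c["expected"] for c in EVAL_SET)
          accuracy = correct / len(EVAL_SET)
          print(f"{datetime.date.today()}: accuracy={accuracy:.0%}")
          if accuracy < ACCURACY_THRESHOLD:
              # Escalate through the agreed issue-management framework.
              print("ALERT: post-deployment test failed; raise an issue")

      run_periodic_test()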

  • Criterion 129: Monitor the system as agreed and specified in its operating procedures.

    Ensure operators understand when, why, and how to intervene.

  • Criterion 130: Monitor performance and AI drift as per pre-defined metrics.
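
    One way to quantify drift is the Population Stability Index (PSI) over a model input or output score. The sketch below uses illustrative data and the common 0.2 rule of thumb; both should be replaced with the metrics and thresholds pre-defined for the system.

      import math

      def psi(baseline, current, bins=10, eps=1e-6):
          # Population Stability Index between two samples of one numeric feature.
          lo, hi = min(baseline), max(baseline)
          width = (hi - lo) / bins or 1.0
          def share(xs):
              counts = [0] * bins
              for x in xs:
                  i = min(int((x - lo) / width), bins - 1)
                  counts[max(i, 0)] += 1
              return [c / len(xs) + eps for c in counts]
          p, q = share(baseline), share(current)
          return sum((a - b) * math.log(a / b) for a, b in zip(p, q))

      baseline = [0.1 * i for i in range(100)]       # e.g. scores captured at deployment
      current = [0.1 * i + 3.0 for i in range(100)]  # e.g. scores observed this week

      # Common rule of thumb: PSI above 0.2 suggests significant drift.
      if psi(baseline, current) > 0.2:
          print("ALERT: drift detected; trigger model review")
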
  • Criterion 131: Monitor health of the system and infrastructure.

    This includes monitoring:

    • logs for errors
    • services or processes
    • resources such as compute, memory, storage, and network.
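
    A lightweight example of resource monitoring, using the third-party psutil library; the alert thresholds are assumptions to be tuned against the system's capacity plan.

      import psutil

      # Assumed alert thresholds; tune to the system's capacity plan.
      THRESHOLDS = {"cpu": 85.0, "memory": 90.0, "disk": 90.0}

      def check_infrastructure_health():
          readings = {
              "cpu": psutil.cpu_percent(interval=1),
              "memory": psutil.virtual_memory().percent,
              "disk": psutil.disk_usage("/").percent,
          }
          for resource, value in readings.items():
              if value > THRESHOLDS[resource]:
                  print(f"ALERT: {resource} at {value:.0f}% exceeds threshold")
          return readings

      check_infrastructure_health()
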
  • Criterion 132: Monitor safety.

    This includes monitoring inputs and outputs for abuse, misuse, sensitive information disclosure, and other forms of harm.
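
    For example, prompts and completions can be screened with pattern matching before deeper review. The patterns and blocklist below are illustrative only; production systems need much richer detection.

      import re

      # Illustrative patterns only; production systems need richer detection.
      SENSITIVE_PATTERNS = {
          "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
          "id_number": re.compile(r"\b\d{3}\s?\d{3}\s?\d{3}\b"),  # assumed 9-digit format
      }
      ABUSE_TERMS = {"build a weapon", "bypass the filter"}  # placeholder blocklist

      def screen(text, direction):
          # Flag sensitive disclosure or potential misuse in a prompt or completion.
          findings = [name for name, rx in SENSITIVE_PATTERNS.items() if rx.search(text)]
          findings += [t for t in ABUSE_TERMS if t in text.lower()]
          if findings:
              print(f"FLAG ({direction}): {findings}")  # route to the safety review queue
          return findings

      screen("My email is jan.citizen@example.gov.au", "input")
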
  • Criterion 133: Monitor reliability metrics and mechanisms.

    This includes:

    • error rates
    • fault detection
    • recovery
    • redundancy
    • failover mechanisms.
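
    A simple error-rate monitor over a sliding window might look like the sketch below; the window size and 5% reliability target are assumptions.

      from collections import deque

      WINDOW = 200             # most recent requests to consider
      ERROR_RATE_LIMIT = 0.05  # assumed reliability target (5% errors)
      outcomes = deque(maxlen=WINDOW)  # True = success, False = error

      def record(success):
          outcomes.append(success)
          rate = outcomes.count(False) / len(outcomes)
          if len(outcomes) == WINDOW and rate > ERROR_RATE_LIMIT:
              # Fault detected: trigger recovery, e.g. fail over to a standby service.
              print(f"ALERT: error rate {rate:.1%} over last {WINDOW} requests")
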
  • Criterion 134: Monitor human-machine collaboration.

    This includes:

    • reviewing the human experience
    • assessing the effectiveness of the human oversight and control measures
    • analysing usage metrics for friction points and opportunities to improve the overall outcome of human-machine collaboration
    • considering different monitoring methods: while surveys can be cost-effective, face-to-face interviews and observing users live as they interact with the AI system can provide deeper insights.
  • Criterion 135: Monitor for unintended consequences.

    This typically includes:

    • implementing various channels for people to provide feedback, raise issues, or contest outcomes
    • considering whether anonymous channels are needed
    • tracing how the outputs of the AI system are used
    • analysing quantitative and qualitative data for recurring harms
    • looking for missing data, such as checking whether certain demographics are not using the system.
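
    For instance, comparing each group's share of system usage against its share of the eligible service population can surface demographics that are not using the system; all figures and the tolerance below are hypothetical.

      # Hypothetical figures: each group's share of system users versus
      # its share of the eligible service population.
      usage_share = {"18-34": 0.55, "35-54": 0.35, "55+": 0.10}
      population_share = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}

      GAP_THRESHOLD = 0.15  # assumed tolerance before investigating

      for group, expected in population_share.items():
          gap = expected - usage_share.get(group, 0.0)
          if gap > GAP_THRESHOLD:
              print(f"REVIEW: '{group}' appears under-represented by {gap:.0%}")
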
  • Criterion 136: Monitor transparency and explainability.

    Periodically check that transparency and explainability requirements are still met post-deployment.

  • Criterion 137: Monitor costs.

    The cost model for AI systems may differ from, and be much more expensive than, that of traditional software and systems.
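
    For consumption-priced services, a minimal spend tracker might accumulate per-token costs against a budget. The rates and budget below are placeholders, not any supplier's actual pricing.

      # Placeholder per-1,000-token rates; check the supplier's actual pricing.
      RATES = {"input": 0.0015, "output": 0.0020}
      MONTHLY_BUDGET = 10_000.00  # assumed budget in dollars

      month_spend = 0.0

      def record_usage(input_tokens, output_tokens):
          global month_spend
          cost = (input_tokens / 1000) * RATES["input"] \
               + (output_tokens / 1000) * RATES["output"]
          month_spend += cost
          if month_spend > 0.8 * MONTHLY_BUDGET:
              print(f"WARNING: ${month_spend:,.2f} spent, over 80% of monthly budget")
          return cost

      record_usage(input_tokens=1200, output_tokens=350)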

  • Criterion 138: Monitor security.

    This may include logging AI services in use to satisfy security requirements and ensuring appropriate data loss prevention (DLP).

    Identify the scope of deployment data for the AI system. This includes:

    • data submitted by the user (prompts)
    • agency data augmented into the prompts
    • content generated by the service via completions, images, and embedding operations
    • training and validation data from the agency that will be used to fine-tune a model

    DLP includes:

    • ensuring that the supplier does NOT use agency data for improving the supplier's AI systems or other products
    • ensuring the system only accesses data that the end-user is authorised to access
    • ensuring that human review is performed by authorised users only
    • monitoring for sensitive data disclosure
    • monitoring data access and usage
    • automating data classification
    • monitoring for anomalies and suspicious activities
    • ensuring data encryption is enabled for data-at-rest and data-in-transit
    • ensuring that any data provided to the model, or generated by the model, can be deleted completely by the authorised user.
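
    As one example of monitoring for anomalies, a simple baseline-deviation check on a user's daily data-access counts can flag unusual activity; the figures and the 3-standard-deviation rule below are assumptions.

      import statistics

      # Hypothetical daily record-access counts for one user over 30 days.
      history = [40, 35, 52, 48, 41, 39, 45, 50, 44, 38] * 3
      today = 400

      mean = statistics.mean(history)
      stdev = statistics.stdev(history)

      # Assumed rule: flag access more than 3 standard deviations above normal.
      if stdev and (today - mean) / stdev > 3:
          print(f"ALERT: {today} accesses vs typical {mean:.0f}; review for exfiltration")
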
  • Criterion 139: Monitor compliance of the AI system.

Statement 39: Establish incident resolution processes
