Statement 22: Implement model creation, tuning, and grounding
Agencies must
Criterion 79: Set assessment criteria for the AI model with respect to the pre-defined metrics for the AI system.
These criteria should address:
- success factors specific to user stories
- model quality thresholds and performance of the AI system
- explainability and interpretability requirements
- security and privacy requirements
- ethics requirements
- tolerance for error for model outputs
- tolerance for negative impacts
- error rates at scale, compared with error rates for similar processing performed by humans.
Considerations for modelling include:
- model training, maintenance, and support costs
- data and compute infrastructure constraints
- likelihood of the AI models becoming outdated
- whether the model can be legally used for the intended use case
- whether methods can be implemented to mitigate risk of new harms being introduced into the AI system
- bias, security, and ethical concerns
- whether the model meets the explainability and interpretability requirements
- use of model interpretability tools to analyse important features and decision logic.
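One common interpretability tool for analysing important features is permutation importance: shuffle one feature at a time and measure how much model performance drops. The sketch below is illustrative only; the feature names, toy data, and stand-in model are assumptions, not agency data, and a real analysis would run against the trained production model.

```python
import random

# Toy dataset: each row is (feature_vector, label). Feature names and
# the labelling rule are illustrative assumptions for this sketch.
random.seed(0)
FEATURES = ["income", "tenure", "age"]

def make_row():
    x = [random.random() for _ in FEATURES]
    # Label depends mostly on "income", weakly on "tenure", not on "age".
    y = 1 if (0.8 * x[0] + 0.2 * x[1]) > 0.5 else 0
    return x, y

data = [make_row() for _ in range(500)]

def model(x):
    # Stand-in for a trained model: mirrors the true rule imperfectly.
    return 1 if (0.7 * x[0] + 0.3 * x[1]) > 0.5 else 0

def accuracy(rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

baseline = accuracy(data)

# Permutation importance: shuffle one feature column at a time and
# record the drop in accuracy. A large drop marks an important feature.
importances = {}
for i, name in enumerate(FEATURES):
    column = [x[i] for x, _ in data]
    random.shuffle(column)
    permuted = [(x[:i] + [v] + x[i + 1:], y)
                for (x, y), v in zip(data, column)]
    importances[name] = baseline - accuracy(permuted)
    print(f"{name}: accuracy drop = {importances[name]:.3f}")
```

Because the stand-in model ignores "age", shuffling that feature leaves accuracy unchanged, while shuffling "income" degrades it sharply; the drop ranks the features by decision-logic importance.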
Criterion 80: Identify and address situations when AI outputs should not be provided.
These situations include:
- low confidence scores
- when user input and context are ambiguous or lack reliable sources
- complex questions as input
- limited knowledge base
- privacy concerns and potential safety breaches
- harmful content
- unlawful content
- misleading content.
For GenAI, implementing techniques such as threshold settings or content filtering could address these situations.
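Threshold settings and content filtering can be combined in a single release gate. The sketch below is a minimal illustration of the idea; the threshold value, blocklist terms, and fallback messages are assumptions for this example, not prescribed values.

```python
# Illustrative values only: agencies would set the threshold and the
# content filter to match their own assessment criteria (Criterion 79).
CONFIDENCE_THRESHOLD = 0.75
BLOCKLIST = {"harmful", "unlawful"}  # stand-in for a real content filter

def release_output(answer: str, confidence: float) -> str:
    """Return the model's answer only when it is safe to do so."""
    # Threshold setting: withhold low-confidence outputs.
    if confidence < CONFIDENCE_THRESHOLD:
        return "I'm not confident enough to answer; please consult a human officer."
    # Content filtering: withhold outputs that trip the filter.
    if any(term in answer.lower() for term in BLOCKLIST):
        return "This response was withheld by the content filter."
    return answer

print(release_output("The form is due Friday.", 0.91))  # released
print(release_output("The form is due Friday.", 0.40))  # withheld: low confidence
```

The same gate can be extended with checks for ambiguous input, missing knowledge-base coverage, or privacy triggers, each routing to an appropriate fallback.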
Criterion 81: Apply considerations for reusing existing agency models, off-the-shelf models, and pre-trained models.
These include:
- whether the model can be adapted to meet the KPIs for the AI system
- suitability of pre-defined AI architecture
- availability of AI specialist skills or skills required for configuration and integration
- whether the model is relevant to the target operating domain or can be adapted to it, such as fine-tuning, retrieval-augmented generation (RAG), and pre-processing and post-processing techniques
- cybersecurity assessment in line with Australian Government policies and guidance (see Whole of AI Lifecycle for more details).
Criterion 82: Create or fine-tune models optimised for the target domain environment.
This includes:
- testing the model on the target operating environment and infrastructure
- using pre-processing and post-processing techniques
- addressing input and output filtering requirements for safety and reliability
- grounding such as RAG, which can augment a large language model (LLM) with trusted data from a database or knowledge base internal to an agency
- for GenAI, prompt engineering or establishing a prompt library, which can streamline and improve interactions with an AI model
- considering the cost and performance implications associated with the adaptation techniques
- performing unit testing on the training, pre-processing, and post-processing algorithms
- systematically tracking model training implementations to speed up the discovery and development of models.
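Grounding via RAG can be sketched in a few lines: retrieve the most relevant passages from an agency's internal knowledge base and prepend them to the prompt so the LLM answers from trusted data. The knowledge-base entries, keyword-overlap scoring, and prompt template below are illustrative assumptions; production systems typically retrieve with vector embeddings rather than word overlap.

```python
# Stand-in for an agency-internal knowledge base (illustrative content).
KNOWLEDGE_BASE = [
    "Form 12 applications must be lodged within 28 days.",
    "Agency records are retained for seven years.",
    "Payments are processed each Thursday.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank passages by shared words with the query (toy retriever)."""
    q_words = set(query.lower().split())
    return sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )[:k]

def build_prompt(query: str) -> str:
    """Ground the LLM: prepend trusted context, then ask the question."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this trusted context:\n{context}\n\nQuestion: {query}"

print(build_prompt("When are payments processed?"))
```

The grounded prompt constrains the LLM to agency-approved sources, which supports the safety and reliability requirements above; post-processing filters would then be applied to the model's response before release.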
Agencies should
Criterion 83: Create and train using multiple model architectures and learning strategies.
Systematically track model training implementations to speed up the discovery and development of models. This helps identify the best-performing trained model.
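Systematic tracking can be as simple as recording each training run's architecture, hyperparameters, and evaluation metric, then selecting the best run. The sketch below shows the idea with an in-memory log; the architecture names, hyperparameters, and metric values are illustrative assumptions, and agencies would normally persist such records with a dedicated experiment-tracking tool.

```python
import json

# In-memory experiment log (a real system would persist this).
runs = []

def log_run(architecture: str, params: dict, metric: float) -> None:
    """Record one training run: what was trained, with what, and how it scored."""
    runs.append({"architecture": architecture, "params": params, "metric": metric})

# One run per model architecture / learning strategy trialled (illustrative values).
log_run("gradient_boosting", {"depth": 3, "lr": 0.1}, metric=0.87)
log_run("logistic_regression", {"C": 1.0}, metric=0.81)
log_run("small_transformer", {"layers": 4}, metric=0.84)

# Select the best-performing trained model across all runs.
best = max(runs, key=lambda r: r["metric"])
print("Best run:", json.dumps(best))
```

Keeping every run comparable in one log is what makes the final selection defensible: the chosen model can be traced back to the exact configuration and score that justified it.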