Statement 22: Implement model creation, tuning, and grounding

Agencies must

  • Criterion 79: Set assessment criteria for the AI model with respect to pre-defined metrics for the AI system.

    These criteria should address:

    • success factors specific to user stories
    • model quality thresholds and performance of the AI system (a configuration sketch follows this list)
    • explainability and interpretability requirements
    • security and privacy requirements
    • ethics requirements
    • tolerance for error for model outputs
    • tolerance for negative impacts
    • error rates at scale compared with those of humans performing similar processing.
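
    The criteria above lend themselves to being captured as machine-checkable configuration. A minimal sketch follows; the metric names and threshold values are illustrative assumptions, not values prescribed by this standard.

    ```python
    # Illustrative only: metric names and thresholds are hypothetical,
    # not values prescribed by this standard.
    ASSESSMENT_CRITERIA = {
        "accuracy_min": 0.90,          # model quality threshold
        "f1_min": 0.85,                # AI system performance threshold
        "max_error_rate": 0.05,        # tolerance for error for model outputs
        "human_baseline_error": 0.08,  # error rate of humans doing similar processing
    }

    def meets_criteria(metrics: dict) -> bool:
        """Return True only if every measured metric satisfies its threshold."""
        return (
            metrics["accuracy"] >= ASSESSMENT_CRITERIA["accuracy_min"]
            and metrics["f1"] >= ASSESSMENT_CRITERIA["f1_min"]
            and metrics["error_rate"] <= ASSESSMENT_CRITERIA["max_error_rate"]
            and metrics["error_rate"] <= ASSESSMENT_CRITERIA["human_baseline_error"]
        )

    print(meets_criteria({"accuracy": 0.93, "f1": 0.88, "error_rate": 0.04}))  # True
    ```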

    Considerations for modelling include:

    • model training, maintenance, and support costs
    • data and compute infrastructure constraints
    • likelihood of the AI models becoming outdated
    • whether the model can be legally used for the intended use case
    • whether methods can be implemented to mitigate the risk of new harms being introduced into the AI system
    • bias, security, and ethical concerns
    • whether the model meets the explainability and interpretability requirements
    • use of model interpretability tools to analyse important features and decision logic (a sketch follows this list).
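
    One widely used, model-agnostic interpretability technique is permutation importance. A minimal sketch, assuming a scikit-learn style tabular classifier; the dataset and model are stand-ins for an agency's own.

    ```python
    # A minimal interpretability sketch using permutation importance:
    # shuffle each feature in turn and measure the drop in score. Large
    # drops indicate features the model relies on heavily.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

    # Report the five most influential features.
    ranked = sorted(zip(X.columns, result.importances_mean),
                    key=lambda pair: pair[1], reverse=True)
    for name, mean in ranked[:5]:
        print(f"{name}: {mean:.3f}")
    ```
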
  • Criterion 80: Identify and address situations when AI outputs should not be provided.

    These situations include:

    • low confidence scores
    • when user input and context are ambiguous or lack reliable sources
    • complex questions as input
    • limited knowledge base
    • privacy concerns and potential safety breaches
    • harmful content
    • unlawful content
    • misleading content.

    For GenAI, implementing techniques such as threshold settings or content filtering could address these situations.
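
    A minimal sketch of the threshold-setting and content-filtering approach; the confidence threshold, fallback messages, and blocklist are illustrative assumptions an agency would need to set and validate for its own context.

    ```python
    # Illustrative guard: withhold an AI output when confidence is low or a
    # simple filter is tripped. Values are hypothetical; production systems
    # would use calibrated thresholds and purpose-built safety classifiers.
    CONFIDENCE_THRESHOLD = 0.75
    BLOCKED_TERMS = {"example_harmful_term"}  # placeholder blocklist

    def guard_output(text: str, confidence: float) -> str:
        if confidence < CONFIDENCE_THRESHOLD:
            return "Unable to answer this reliably; please rephrase or seek assistance."
        if any(term in text.lower() for term in BLOCKED_TERMS):
            return "Response withheld by content filter."
        return text

    print(guard_output("The form is due on 30 June.", confidence=0.92))
    print(guard_output("A low-certainty answer.", confidence=0.40))
    ```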

  • Criterion 81: Apply considerations for reusing existing agency models, off-the-shelf, and pre-trained models.

    These include:

    • whether the model can be adapted to meet the KPIs for the AI system (a benchmarking sketch follows this list)
    • suitability of pre-defined AI architecture
    • availability of AI specialist skills or skills required for configuration and integration
    • whether the model is relevant to the target operating domain or can be adapted to it through techniques such as fine-tuning, retrieval-augmented generation (RAG), or pre-processing and post-processing
    • cybersecurity assessment in line with Australian Government policies and guidance (see Whole of AI Lifecycle for more details).
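
    One way to test the first consideration is to benchmark a candidate model against the AI system's KPIs on an agency-held evaluation set before committing to integration. A minimal sketch, using a freshly trained scikit-learn classifier as a stand-in for a reused or off-the-shelf model, and a hypothetical 90% accuracy KPI.

    ```python
    # Hypothetical benchmark: does a candidate model meet the AI system's
    # KPIs on the agency's own evaluation data? The classifier below stands
    # in for a reused, off-the-shelf, or pre-trained model.
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    KPI_ACCURACY = 0.90  # illustrative KPI, set per Criterion 79

    X, y = load_digits(return_X_y=True)
    X_train, X_eval, y_train, y_eval = train_test_split(X, y, random_state=0)

    candidate = LogisticRegression(max_iter=2000).fit(X_train, y_train)
    accuracy = accuracy_score(y_eval, candidate.predict(X_eval))

    print(f"accuracy={accuracy:.3f}; meets KPI: {accuracy >= KPI_ACCURACY}")
    ```
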
  • Criterion 82: Create or fine-tune models optimised for the target domain environment.

    This includes:

    • testing the model in the target operating environment and on its infrastructure
    • using pre-processing and post-processing techniques
    • addressing input and output filtering requirements for safety and reliability
    • grounding, such as RAG, which can augment a large language model (LLM) with trusted data from a database or knowledge base internal to an agency (a sketch follows this list)
    • for GenAI, prompt engineering or establishing a prompt library, which can streamline and improve interactions with an AI model
    • considering the cost and performance implications of the adaptation techniques
    • unit testing the training, pre-processing, and post-processing algorithms (an illustrative test also follows this list)
    • systematically tracking model training implementations to speed up the discovery and development of models.
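
    A minimal sketch of the grounding and prompt-library points above. Retrieval here is naive keyword overlap and the assembled prompt is simply printed; a production system would use vector search over an approved knowledge base and send the prompt to the agency's accredited LLM endpoint.

    ```python
    # Minimal RAG sketch: retrieve trusted agency content, then ground the
    # prompt in it using a reusable template from a prompt library.
    KNOWLEDGE_BASE = [
        "Applications close on 30 June each year.",
        "Processing times are typically 10 business days.",
    ]

    # A reusable template, as might be kept in a prompt library.
    PROMPT_TEMPLATE = (
        "Answer using ONLY the context below. If the context is "
        "insufficient, say so.\n\nContext:\n{context}\n\n"
        "Question: {question}\nAnswer:"
    )

    def retrieve(question: str, k: int = 1) -> list:
        """Rank knowledge-base entries by word overlap with the question."""
        q_words = set(question.lower().split())
        ranked = sorted(KNOWLEDGE_BASE,
                        key=lambda doc: len(q_words & set(doc.lower().split())),
                        reverse=True)
        return ranked[:k]

    def build_grounded_prompt(question: str) -> str:
        context = "\n".join(retrieve(question))
        return PROMPT_TEMPLATE.format(context=context, question=question)

    print(build_grounded_prompt("When do applications close?"))
    ```

    The list also calls for unit testing of pre-processing and post-processing algorithms. An illustrative pytest-style test, assuming a hypothetical `normalise_text` pre-processing helper:

    ```python
    # Hypothetical pre-processing helper and a unit test for it (pytest style).
    def normalise_text(text: str) -> str:
        """Lowercase and collapse internal whitespace before tokenisation."""
        return " ".join(text.lower().split())

    def test_normalise_text():
        assert normalise_text("  Hello   WORLD ") == "hello world"
    ```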

Agencies should

  • Criterion 83: Create and train models using multiple model architectures and learning strategies.

    Systematically track model training implementations to speed up the discovery and development of models; this helps identify the best-performing trained model (a sketch follows).
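
    A minimal sketch of this practice: train several candidate architectures, record every run systematically, and select the best performer. The models and the in-memory run log are illustrative; agencies would typically use a dedicated experiment-tracking tool.

    ```python
    # Train multiple candidate architectures and track each run, so the
    # best-performing model can be identified. The in-memory `runs` list is
    # a stand-in for a real experiment-tracking tool.
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_digits(return_X_y=True)

    candidates = {
        "logistic_regression": LogisticRegression(max_iter=2000),
        "random_forest": RandomForestClassifier(random_state=0),
        "k_nearest_neighbours": KNeighborsClassifier(),
    }

    runs = []  # systematic record of every training run
    for name, model in candidates.items():
        score = cross_val_score(model, X, y, cv=5).mean()
        runs.append({"model": name, "cv_accuracy": round(score, 3)})

    best = max(runs, key=lambda run: run["cv_accuracy"])
    print("selected:", best["model"], best["cv_accuracy"])
    ```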
     

Statement 23: Validate, assess, and update model
