  • Agencies must:

    • Criterion 59: Define quality assessment criteria for the data used in the AI system.

      Data quality can be measured across a variety of dimensions, in line with the ABS Data Quality Framework: institutional environment, relevance, timeliness, accuracy, coherence, interpretability, and accessibility.

      A report on data quality can include:

      • data quality statement (see ABS Data Quality Statement Checklist)
      • metrics for measuring data quality, including its correctness and credibility
      • frequency of reporting on data quality
      • delegation of ownership to a business area responsible for managing data quality
      • monitoring arrangements for changes in quality across the supply chain
      • processes for intervening and addressing data quality issues as they arise.

      Consider:

      • any existing data standard frameworks that are used by the agency. 

    Agencies should:

    • Criterion 60: Implement data profiling activities and remediate any data quality issues.

      This involves analysing the structure, content, and quality of the data to determine its fitness for purpose for an AI system. 

      Data profiling can investigate the following characteristics:

      • frequency
      • volume, range, and distribution
      • invalid entry identification
      • error detection
      • duplicates identification
      • noise identification
      • specific pattern identification.
      Methods that can be used to explore and analyse the data include the following (a minimal profiling sketch in Python appears after this list):
      • descriptive statistics, such as mean, median, mode, or frequencies
      • business rules – apply business knowledge
      • clustering or dendrogram – group similar observations together
      • visualisation – to get a visual representation of the data from various types of graphs and charts, such as histograms, bar plots, box plots, density plots, or heatmaps
      • correlation analysis – measure relationships between variables, usually between numerical variables
      • scatter plots – visualise relationships between two numerical variables
      • cross-tabulations – analyse relationships between multiple categorical variables
      • principal component analysis – analyse variables with the most variance
      • factor analysis – helps reveal hidden patterns.
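
      The following is a minimal profiling sketch in Python, assuming pandas is available; the dataframe and column names are hypothetical examples. It illustrates frequency, distribution, missing and invalid entries, and duplicate identification rather than prescribing a specific tool.

      ```python
      import pandas as pd

      # Hypothetical sample of records being profiled for an AI system
      df = pd.DataFrame({
          "age": [34, 41, 41, None, 29, 150],                # a missing value and a likely invalid entry
          "state": ["NSW", "VIC", "VIC", "QLD", "nsw", "SA"],
      })

      # Volume, range, and distribution
      print(df.describe(include="all"))

      # Frequency of categorical values (also surfaces inconsistent entries such as "nsw")
      print(df["state"].value_counts(dropna=False))

      # Missing / invalid entry identification
      print(df.isna().sum())

      # Duplicate identification
      print(df.duplicated().sum())

      # A simple range rule flags a likely invalid age
      print(df[(df["age"] < 0) | (df["age"] > 120)])
      ```
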
    • Criterion 61: Define processes for labelling data and managing the quality of data labels.

      Data labelling can be done for the purposes of managing and storing data, auditing, and AI model training. Humans with appropriate skills and knowledge can perform the data labelling, or it can be supported by automated labelling tools.

      Setting data labelling practices can help optimise performance across the AI system by describing the context, categories, and relationships between data types; creating lineage of data through the AI system via versioning; distinguishing between pre-deployment and live data; and identifying what data will be reused, archived, or destroyed.

      These practices include:

      • establishing naming schemes, taxonomy, tagging, and data labelling practices
      • considering different techniques such as manual or automated labelling, crowdsourcing, and quality checks
      • defining quality control methods to improve consistency of labelling and assist in reducing bias
      • considering changes to the raw data and data imputations, and associated impact
      • providing data labels for AI training approaches or testing the AI models. Labels can provide the ground truth data for AI models and can influence AI validation. Different types of data labelling include:
        • classification
        • regression
        • visual object labels
        • audio labels
        • entity tagging.
      • applying quality assurance measures to data labels, labelling personnel, and automated data-labelling support tools
      • implementing bias mitigation practices in labelling:
        • establishing a review process. Diverse people could independently label the same data so that correlation between labellers can be analysed, and final labels could go through spot-check review by subject matter experts (a minimal agreement sketch appears after this list)
        • establishing feedback loops. Labellers should be able to report issues and suggest improvements, and automated systems should be updated to be consistent with corrections made by human labellers
        • establishing performance management for staff. Data labellers should undergo periodic training, performance reviews, and random audits for quality control
        • implementing metadata labelling techniques that capture the type of data categories within the system and the relationship between these categories. Metadata labels can be prepared for model bias evaluation by annotating metadata with suitable dimensions. Ensure the metadata labelling aligns to the Guide on Metadata Attributes | Office of the National Data Commissioner and Australian Government Recordkeeping Metadata Standard | naa.gov.au.
      • assessing and monitoring quality of all automated data labelling support tools. Determine the regularity and criteria for these quality checks and report on findings
      • updating and maintaining the labelling tools and processes to adapt to new data types, and labelling requirements
      • considering potential harm to data labellers who may need to access sensitive or distressing content. This can occur when training an AI model to prevent responses including violence, hate speech, or sexual abuse.
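
      As an illustration of the review process above, the sketch below measures agreement between two independent labellers using Cohen's kappa. It assumes scikit-learn is available; the labels are hypothetical.

      ```python
      from sklearn.metrics import cohen_kappa_score

      # Hypothetical labels assigned independently by two labellers to the same records
      labeller_a = ["approve", "refer", "approve", "refer", "approve", "refer"]
      labeller_b = ["approve", "refer", "refer", "refer", "approve", "refer"]

      kappa = cohen_kappa_score(labeller_a, labeller_b)
      print(f"Cohen's kappa: {kappa:.2f}")  # low agreement may indicate unclear labelling guidelines or bias
      ```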
         
  • Statement 17: Validate and select data

  • Agencies must:

    • Criterion 62: Perform data validation activities to ensure data meets the requirements for the AI system’s purpose.

      This involves including AI-specific validations in schema migrations to ensure data pipelines and feature stores remain functional. Suitable data validation techniques include:

      • type validation – ensuring data is in the correct data type
      • format validation – ensuring data aligns to a predefined pattern
      • range validation – checking whether data falls within a specific range
      • outlier detection – checking for data points that significantly deviate from the general data pattern
      • completeness – verifying that all required fields are filled
      • diversity – ensuring the data represents a variety of data points.

      Considerations include:

      • a quality framework
      • online near real-time and offline batch data validation mechanisms to support the purpose and operations of the AI system.
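
      A minimal sketch of these validation checks is shown below. It assumes pandas is available; the schema, column names, and range limits are hypothetical examples only.

      ```python
      import pandas as pd

      def validate_batch(df: pd.DataFrame) -> list[str]:
          """Return a list of validation issues found in an incoming batch."""
          issues = []

          # Completeness - required fields are present and filled
          for col in ("claim_id", "lodged_date", "amount"):
              if col not in df.columns:
                  issues.append(f"missing column: {col}")
              elif df[col].isna().any():
                  issues.append(f"null values in column: {col}")

          # Type and format validation - dates must parse as dates
          if "lodged_date" in df.columns:
              if pd.to_datetime(df["lodged_date"], errors="coerce").isna().any():
                  issues.append("unparseable dates in lodged_date")

          # Range validation - amounts must fall within an agreed range
          if "amount" in df.columns:
              amounts = pd.to_numeric(df["amount"], errors="coerce")
              if ((amounts < 0) | (amounts > 1_000_000)).any():
                  issues.append("amount outside expected range 0 to 1,000,000")

          return issues

      batch = pd.DataFrame({
          "claim_id": [1, 2, 3],
          "lodged_date": ["2024-01-05", "2024-02-30", "2024-03-01"],  # 2024-02-30 is not a valid date
          "amount": [120.50, -40.00, 980.00],                         # -40.00 is out of range
      })
      print(validate_batch(batch))
      ```
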
    • Criterion 63: Select data for use that is aligned with the purpose of the AI system.

      This includes:

      • alignment with the agency’s business intent and the goals of the AI system, as well as ensuring data meets the data quality criteria previously established
      • maintaining a live test dataset to test the AI system in production, to help monitor and maintain the operational integrity of the AI system.
         
  • Statement 18: Enable data fusion, integration and sharing

  • Agencies should:

    • Criterion 64: Analyse data fusion and integration requirements.

      Data fusion is a method of integrating or combining data from multiple sources, which can help an AI system produce more comprehensive, reliable, and accurate outputs. Meaningful data sharing practices across the agency can build interoperability between systems and datasets. Data sharing also promotes reuse, reducing the resources needed for collection and analysis.

    • Criterion 65: Establish an approach to data fusion and integration.

      This approach should involve one or more of the following processes: 

      • ETL (Extract, Transform and Load) – batch movement of data, where data is transformed before it is loaded into the target system
      • ELT (Extract, Load and Transform) – batch movement of data, where data is loaded into the target system and transformed there
      • Application programming interface (API) – allowing the movement and syncing of data across multiple applications
      • data streaming – moving data in or near real-time from source to target
      • data virtualisation – combining data from different sources virtually and on demand, without physically moving it
      • chaining of AI models – linking multiple AI models in a sequence where the output from one model becomes the input for another.

      Consider:

      • data migration guidelines and any agency data management agreements, if relevant.

      Agencies can optimise data fusion and integration processes by automating scheduling and data integration tasks and by deploying intuitive interfaces to diagnose and resolve errors.
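
      A minimal ETL-style sketch is shown below, assuming pandas is available; the source data, table name, and in-memory SQLite target are hypothetical stand-ins for an agency's actual sources and targets.

      ```python
      import sqlite3
      import pandas as pd

      # Extract - a small in-line dataframe stands in for pd.read_csv("source.csv")
      source = pd.DataFrame({"Service": ["A", "B"], "Requests": [120, 87]})

      # Transform - standardise column names and derive a field
      source.columns = [c.lower() for c in source.columns]
      source["requests_per_day"] = source["requests"] / 7

      # Load - write the transformed batch to the target store
      with sqlite3.connect(":memory:") as conn:
          source.to_sql("weekly_requests", conn, if_exists="replace", index=False)
          print(pd.read_sql("SELECT * FROM weekly_requests", conn))
      ```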

    • Criterion 66: Identify data sharing arrangements and processes to maintain consistency.

      Data sharing considerations include:

      • whether other systems could leverage the data analysed by the AI system
      • which areas within the agency would benefit from analysed data being shared with them
      • what data containers could improve with access to the system’s data sources
      • whether data on how the system was trained could be used to train other systems
      • documentation such as a memorandum of understanding, or similar, for data sharing arrangements intra-agency, inter-agency, or with external parties
      • addressing risks of creating personally identifiable information
      • what can be published for public, government, or internal benefit
      • any legislative implications.
         
  • Statement 19: Establish the model and context dataset

  • Agencies must:

    • Criterion 67: Measure how representative the model dataset is.

      Key considerations for measuring and selecting a model dataset include:

      • whether it is representative of the true population relevant to the purpose of the AI system – this will improve model generalisation and minimise overfitting
      • ensuring the dataset has the required features, volumes, distribution, representation and demographics, including people with lived experience and intersectional dimensions. For example, someone with cultural or linguistic diversity may also be a person with disability; the dataset must consider how multiple dimensions of a person intersect to create unique experiences or challenges
      • for GenAI, assessing data quality thresholds and mechanisms in the data setup for modelling to help avoid unwanted bias and hallucinations.
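
      One way to quantify how representative the dataset is involves comparing its demographic mix against known population proportions. The sketch below uses a chi-square goodness-of-fit test; it assumes scipy is available, and the groups and figures are hypothetical.

      ```python
      from scipy.stats import chisquare

      observed_counts = [480, 410, 110]        # counts per demographic group in the model dataset
      population_share = [0.50, 0.40, 0.10]    # known shares of each group in the target population

      total = sum(observed_counts)
      expected_counts = [share * total for share in population_share]

      stat, p_value = chisquare(f_obs=observed_counts, f_exp=expected_counts)
      print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")  # a small p-value suggests under- or over-representation
      ```
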
    • Criterion 68: Separate the model training dataset from the validation and testing datasets.

      Agencies must maintain the separation between these datasets to avoid misleading evaluations of trained models.

      Agencies can refresh these datasets to account for timeframes, degradation in AI performance during operation, and compute resource constraints.
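
      A minimal sketch of maintaining this separation is shown below. It assumes scikit-learn is available and uses synthetic data for illustration.

      ```python
      from sklearn.datasets import make_classification
      from sklearn.model_selection import train_test_split

      X, y = make_classification(n_samples=1_000, random_state=42)

      # Hold out the test set first, then carve a validation set from what remains
      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
      X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

      print(len(X_train), len(X_val), len(X_test))  # 600 / 200 / 200 observations
      ```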

    • Criterion 69: Manage bias in the data.

      Techniques for agencies to manage and mitigate problematic bias in their model dataset include:

      • data collection analysis – examining how data was generated and verified, and checking the methodologies used to ensure the data is diverse and represents the real population
      • data source analysis – investigating limitations and assumptions around the origin of the data
      • data diversity – determining various demographics, sources and types of data, inclusion and exclusion considerations
      • statistical testing – determining the likelihood of the population being accurately represented in the data
      • class imbalance – analysing data for class imbalance before using it to train classification models, and applying relevant data and algorithm techniques and metrics, such as precision or F1-score, to address this
      • outlier detection – identifying outliers or unusual data points in the data and ensuring they are handled appropriately
      • exploratory data analysis – using descriptive statistics and data visualisation tools to identify patterns and discrepancies
      • removing any irrelevant data from the training data that does not improve the performance of the model
      • ensuring that any sensitive and protected data are retained in the test datasets for the purpose of evaluating for bias
      • data augmentation – deploying measures to address the completeness of the model dataset, through supplementary data collection or synthetic data generation
      • transparency – identifying bias and where it originated from through transparency on data sourcing and processing
      • domain knowledge – ensuring practitioners have relevant domain knowledge on the datasets the AI system uses to serve the scope of the AI, including an understanding of the data characteristics and what it represents for the organisation
      • documentation of data use – documenting the use of data by the AI system and any potential change of use, providing an audit trail of any incidence and causation of bias.
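
      The sketch below illustrates two of the simpler checks above: class imbalance and group representation. It assumes pandas is available; the labels and groups are hypothetical.

      ```python
      import pandas as pd

      train = pd.DataFrame({
          "label":  [1, 0, 0, 0, 0, 0, 0, 0, 1, 0],
          "region": ["metro", "metro", "metro", "regional", "metro",
                     "metro", "remote", "metro", "metro", "regional"],
      })

      # Class imbalance - a heavily skewed label distribution may warrant resampling
      # or metrics such as precision, recall, or F1-score instead of accuracy
      print(train["label"].value_counts(normalize=True))

      # Group representation - compare against the expected population mix
      print(train["region"].value_counts(normalize=True))
      ```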

    Agencies should:

    • Criterion 70: For Generative AI, build reference or contextual datasets to improve the quality of AI outputs.

      A reference or contextual dataset for GenAI can take the form of (but is not limited to) a retrieval-augmented generation (RAG) dataset or a prompt dataset.

      Key considerations include:

      • building high-quality reference or contextual datasets to support more accurate and context aware AI outputs, and reduce hallucinations
      • implementing pre-defined prompts tailored to ensure consistent and reliable responses from GenAI models
      • establishing workflows for prompt engineering and data preparation to streamline the development and deployment of GenAI systems.
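
      The sketch below illustrates the idea of a small reference dataset combined with a pre-defined prompt template. It assumes scikit-learn is available and uses simple TF-IDF retrieval as a stand-in for an agency's actual retrieval component; the documents, query, and template are hypothetical.

      ```python
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      # Hypothetical reference (contextual) dataset
      reference_docs = [
          "Applications close on 30 June each year.",
          "Payments are made within 14 days of approval.",
          "Appeals must be lodged in writing within 28 days.",
      ]

      vectoriser = TfidfVectorizer()
      doc_matrix = vectoriser.fit_transform(reference_docs)

      query = "When do applications close?"
      scores = cosine_similarity(vectoriser.transform([query]), doc_matrix)[0]
      context = reference_docs[scores.argmax()]  # retrieve the most relevant reference document

      # Pre-defined prompt template grounding the model in the retrieved context
      prompt = (
          "Answer the question using only the context provided.\n"
          f"Context: {context}\n"
          f"Question: {query}"
      )
      print(prompt)  # this prompt would then be passed to the GenAI model
      ```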

         
  • Statement 20: Plan the model architecture

  • Agencies must:

    • Criterion 71: Establish success criteria that cover any AI training and operational limitations for infrastructure and costs.

      Ensure alignment with AI system metrics selected at the design stage.

      Consider:

      • AI system purpose and requirements including explainability
      • pre-defined AI system metrics including AI performance metrics
      • impact and treatment for false positives and false negatives
      • AI operational environment including scalability intentions
      • frequency of change in context
      • limitations on compute infrastructure
      • cost constraints
      • operational models such as ModelOps, MLOps, LLMOps, DataOps, and DevOps (see Statement 1).

      AI training can occur in offline mode, or in online or real-time mode. This is dependent on the business case and the maturity of the data and infrastructure architecture. The risk of the model becoming stale is higher in offline mode, while the risk of the model exhibiting unverified behaviour is higher in online mode.

      The training process is interdependent with the infrastructure in the training environment. Complex model architectures with highly specialised learning strategies and large model datasets generally require tailored infrastructure to manage costs.

    • Criterion 72: Define a model architecture for the use case suitable to the data and AI system operation.

      The following will influence the choice of the model architecture and algorithms:

      • business requirements – risk thresholds or performance criteria
      • purpose of the system – identified stakeholders and the intended outcomes, safety, reproducibility level of AI model outputs, or explainability level for AI outputs
      • data – bias, quality, and managing the supply of data to the system
      • supporting infrastructure – computational demands, costs, and speed with respect to business needs
      • resourcing – the capabilities involved with documentation, oversight, and intervention in training the AI model or reusable assets
      • design – the training process will include necessary human oversight and intervention, to ensure responsible AI practices are in place. Consider embedding flexible architecture practices to avoid vendor lock-in.

      The model architecture will highlight the variables that will impact the intended outcomes for the system. These variables will include the model dataset, use case application and scalability intentions. These variables will influence which algorithms and learning strategies are chosen to train the AI model. 

      An AI scientist can test and analyse the model architecture and dataset to identify what is needed to effectively train the system. Additionally, they can outline requirements for the model architecture to comply with data, privacy, and ethical expectations.

      Consider starting with simple and small architectures and adding complexity progressively, depending on the purpose of the system, to simplify debugging and reduce errors. Note that an AI system can contain a combination of multiple models, which can add to the complexity.

      Generally, a single type of algorithm and training process may not be sufficient to determine optimal models for the AI system. It is usually good practice to train multiple models with various algorithms and training methodologies.

      There are options to develop a chain of AI models, or add more complexity, if that better meets the intent of the AI system. Each model could use a different type of algorithm and training process.

      Analysis of support and maintenance of the AI system in operation can influence the model architecture. For some use cases, a complete model refresh may be required, noting cost considerations. Alternatives such as updates to pre- or post-processing could be considered, including updates to the configuration or the knowledge repository used for RAG in GenAI systems.

      It may not be necessary to retrain models every time new information becomes available, and this should be considered when defining the model architecture. For example, for GenAI, adding new information in RAG can help the AI system remain up to date without the need to retrain the AI model, saving on costs without impacting AI accuracy.

    • Criterion 73: Select algorithms aligned with the purpose of the AI system and the available data.

      There are various forms of algorithms to train an AI model, and it is important to select them based on the AI system requirements, model success criteria, and the available model dataset. A learning strategy is a method to train an AI model and dictates the mathematical computations that will be required during the training process.

      Depending on use case, some examples of the types of training processes may include:

      • supervised learning – training an AI model with a dataset, made up of observations, that has desired outputs or labels, such as support vector machines or tree-based models
      • unsupervised learning – training a model to learn patterns in the dataset itself, where the training dataset does not have desired outputs or labels, such as anomaly detection or transformer LLMs
      • reinforcement learning – training a model to maximise pre-defined goals, such as Monte Carlo tree search or fine-tuning models
      • transfer learning – a model trained on one task, such as a pre-trained model, is reused as a starting point to enhance model performance on a related, yet different, task
      • parameter tuning – optimising a model’s performance by adjusting its parameters or hyperparameters, often automatically
      • model retraining – updating a model with new data
      • online or real-time mode – continuously train the model using live data (note that this can significantly increase vulnerability of the AI system, such as data poisoning attacks).

      Like traditional software, there are options to reuse, reconfigure, buy, or build models. An agency could reuse off-the-shelf models as-is, fine-tune pre-trained models, use pre-built algorithms, or create new models. The approach taken to training will vary across model types.
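
      A minimal sketch of training more than one candidate model on the same data is shown below. It assumes scikit-learn is available and uses synthetic data; the choice of a support vector machine and a tree-based model is illustrative only.

      ```python
      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.metrics import f1_score
      from sklearn.model_selection import train_test_split
      from sklearn.svm import SVC

      X, y = make_classification(n_samples=500, random_state=0)
      X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

      candidates = {
          "support_vector_machine": SVC(),
          "tree_based_model": RandomForestClassifier(random_state=0),
      }
      for name, model in candidates.items():
          model.fit(X_train, y_train)
          score = f1_score(y_val, model.predict(X_val))
          print(f"{name}: F1 = {score:.3f}")  # track results to support the selection decision
      ```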

    • Criterion 74: Set training boundaries in relation to any infrastructure, performance, and cost limitations.

    Agencies should:

    • Criterion 75: Start small, scale gradually.

      Consider:

      • starting with simple and small architectures and adding complexity progressively, depending on the purpose of the system, to simplify debugging and reduce errors
      • that an AI system can contain a combination of multiple models which can add to the complexity.
         
  • Statement 21: Establish the training environment

  • Agencies must:

    Agencies should:

    • Criterion 78: Reuse approved AI modelling frameworks, libraries, and tools.

  • Statement 22: Implement model creation, tuning, and grounding

  • Agencies must:

    • Criterion 79: Set assessment criteria for the AI model, with respect to pre-defined metrics for the AI system.

      These criteria should address:

      • success factors specific to user stories
      • model quality thresholds and performance of the AI system
      • explainability and interpretability requirements
      • security and privacy requirements
      • ethics requirements
      • tolerance for error for model outputs
      • tolerance for negative impacts
      • error rates at scale compared with similar processing performed by humans.

      Considerations for modelling include:

      • model training, maintenance, and support costs
      • data and compute infrastructure constraints
      • likelihood of the AI models becoming outdated
      • whether the model can be legally used for the intended use case
      • whether methods can be implemented to mitigate risk of new harms being introduced into the AI system
      • bias, security, and ethical concerns
      • whether the model meets the explainability and interpretability requirements
      • use of model interpretability tools to analyse important features and decision logic.
    • Criterion 80: Identify and address situations when AI outputs should not be provided.

      These situations include:

      • low confidence scores
      • when user input and context are ambiguous or lack reliable sources
      • complex questions as input
      • limited knowledge base
      • privacy concerns and potential breach of safety
      • harmful content
      • unlawful content
      • misleading content.

      For GenAI, implementing techniques such as threshold settings or content filtering could address these situations.
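
      The sketch below illustrates a simple threshold setting for withholding low-confidence outputs. The threshold value and confidence score are hypothetical; in practice the confidence would come from the model or a separate scoring component.

      ```python
      CONFIDENCE_THRESHOLD = 0.75  # hypothetical value, tuned to the agency's risk tolerance

      def respond(answer: str, confidence: float) -> str:
          """Return the model's answer only when its confidence is acceptable."""
          if confidence < CONFIDENCE_THRESHOLD:
              return "I am not confident enough to answer this. Please contact a staff member."
          return answer

      print(respond("Your application was received on 3 May.", confidence=0.42))
      ```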

    • Criterion 81: Apply considerations for reusing existing agency models, off-the-shelf, and pre-trained models.

      These include:

      • whether the model can be adapted to meet the KPIs for the AI system
      • suitability of pre-defined AI architecture
      • availability of AI specialist skills or skills required for configuration and integration
      • whether the model is relevant to the target operating domain or can be adapted to it, such as fine-tuning, retrieval-augmented generation (RAG), and pre-processing and post-processing techniques
      • cybersecurity assessment in line with Australian Government policies and guidance (see Whole of AI Lifecycle for more details).
    • Criterion 82: Create or fine-tune models optimised for target domain environment.

      This includes:

      • model testing on target operating environment and infrastructure
      • using pre-processing and post-processing techniques
      • addressing input and output filtering requirements for safety and reliability
      • grounding such as RAG, which can augment a large language model (LLM) with trusted data from a database or knowledge base internal to an agency
      • for GenAI, prompt engineering or establishing a prompt library, which can streamline and improve interactions with an AI model
      • considering cost and performance implications associated with the adaptation techniques
      • performing unit testing for the training algorithm and the pre-processing and post-processing algorithms
      • tracking model training implementations systematically to speed up the discovery and development of models.

    Agencies should:

    • Criterion 83: Create and train using multiple model architectures and learning strategies.

      Systematically track model training implementations to speed up the discovery and development of models. This will help select a more optimal trained model.
       

  • Statement 23: Validate, assess, and update model

  • Agencies must:

    • Criterion 84: Set techniques to validate AI trained models.

      There are multiple qualitative and quantitative techniques and tools for model validation, informed by the AI system success criteria (see the Design section), including:

      • correct classifications, predictions or forecasts, and factual correctness and relevance
      • ability to identify positive and negative instances, and to distinguish between classes
      • benchmarking
      • consistency in responses, clarity and coherence
      • source attribution
      • data-centric validation approaches for GenAI models.
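
      A minimal sketch of validating a trained classifier against held-out data is shown below. It assumes scikit-learn is available; the labels and predictions are hypothetical.

      ```python
      from sklearn.metrics import classification_report, confusion_matrix

      y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # held-out ground-truth labels
      y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions

      # Correct classifications and the ability to distinguish between classes
      print(confusion_matrix(y_true, y_pred))
      print(classification_report(y_true, y_pred))
      ```
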
    • Criterion 85: Evaluate the model against training boundaries.

      Evaluation considerations include: 

      • poor or degraded performance of the model
      • change of AI context or operational setting
      • data retention policies
      • model retention policies.
    • Criterion 86: Evaluate the model for bias, implement and test bias mitigations.

      This includes:

      • using suitable tools that test and discover unwarranted associations between an algorithm’s protected input features and its output
      • evaluating performance across suitable and intersectional dimensions
      • checking if bias could be managed through updating the training data (see Statement 18)
      • implementing bias mitigation thresholds that can be configured post-deployment
      • implementing pre-processing or post-processing techniques such as disparate impact remover, equalised odds post-processing, content filtering, and RAG.
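
      The sketch below illustrates one common bias evaluation: comparing favourable-outcome rates across groups (a disparate impact ratio). It assumes pandas is available; the data are hypothetical, and the four-fifths figure in the comment is a commonly used rule of thumb rather than a prescribed benchmark.

      ```python
      import pandas as pd

      results = pd.DataFrame({
          "group":    ["A", "A", "A", "A", "B", "B", "B", "B"],
          "approved": [1,   1,   1,   0,   1,   0,   0,   0],
      })

      rates = results.groupby("group")["approved"].mean()
      disparate_impact = rates["B"] / rates["A"]  # ratios well below 0.8 (the "four-fifths" rule of thumb) warrant investigation
      print(rates)
      print(f"disparate impact ratio: {disparate_impact:.2f}")
      ```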

    Agencies should:

    • Criterion 87: Identify relevant model refinement methods.

      Triggers from the evaluations above may lead to model refinement or retirement. Relevant refinement methods can include:

      • model parameter or weight adjustments – further training or re-training the model on a new set of observations, or additional training data
      • adjusting data pre-processing or post-processing components
      • model pruning – to reduce redundant mathematical calculations and speed up operations.
         
  • Statement 24: Select trained models

  • Agencies should:

    • Criterion 88: Assess a pool of trained models against acceptance metrics to select a model for the AI system.

      This involves the following (a minimal selection sketch appears after this list):

      • defining clear needs and expectations
      • comparing multiple trained models, usually generated based on different configurations
      • prioritising based on metrics such as ‘simplest’ or ‘most effective’
      • documenting the rationale for selection based on results from training models with various model architectures, learning strategies, and configurations
      • documenting any risk and mitigation plans
      • maintaining a model refresh and re-training plan and register
      • implementing mechanisms for explainability of model outputs to system users
      • establishing feedback channels and mechanisms for monitoring and managing model performance
      • preparing an audit plan
      • documenting a method for retiring the model.
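
      A minimal selection sketch is shown below; the candidate models, metrics, and acceptance threshold are hypothetical.

      ```python
      # Hypothetical results from a pool of trained candidate models
      candidates = [
          {"name": "logistic_regression", "f1": 0.86, "parameters": 1_200},
          {"name": "gradient_boosting",   "f1": 0.88, "parameters": 45_000},
          {"name": "neural_network",      "f1": 0.88, "parameters": 2_000_000},
      ]

      ACCEPTANCE_F1 = 0.85  # acceptance metric agreed during design

      # Keep only models meeting the acceptance metric, then prioritise the simplest
      acceptable = [c for c in candidates if c["f1"] >= ACCEPTANCE_F1]
      selected = min(acceptable, key=lambda c: c["parameters"])
      print(f"selected: {selected['name']} (F1 = {selected['f1']}, parameters = {selected['parameters']})")
      ```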
       
