• Statement 12: Define success criteria

  • Agencies must

    • Criterion 41: Identify, assess, and select metrics appropriate to the AI system.

      Relying on a single metric could lead to false confidence, while tracking irrelevant metrics could trigger false alarms. To mitigate these risks, analyse the capabilities and limitations of each metric, select multiple complementary metrics, and implement methods to test assumptions and uncover missing information.

      Considerations for metrics include (a brief sketch of complementary metrics follows this list):

      • value-proposition metrics – benefits realisation, social outcomes, financial measures, or productivity measures
      • performance metrics – precision and recall for classification models, mean absolute error for regression models, bilingual evaluation understudy (BLEU) for text generation tasks (including summarisation), inception score for image generation models, or mean opinion score for audio generation
      • training data metrics – data diversity and data quality related measures
      • bias-related metrics – demographic parity to measure group fairness, fairness through awareness to measure individual fairness, counterfactual fairness to measure causality-based fairness
      • safety metrics – likelihood of harmful outputs, adversarial robustness, or potential data leakage measures
      • reliability metrics – availability, latency, mean time between failures (MTBF), mean time to failure (MTTF), or response time
      • citation metrics – measures related to proper acknowledgement and references to direct content and specialised ideas
      • adoption-related metrics – adoption rate, frequency of use, daily active users, session length, abandonment rate, or sentiment analysis
      • human-machine teaming metrics – total time or effort taken to complete a task, reaction time when human control is needed, or number of times human intervention is needed
      • qualitative measures – checking the well-being of the humans operating or using the AI system, or interviewing participants and observing them while using the AI system to identify usability issues
      • drift in AI system inputs and outputs – changes in input distribution, outputs, and performance over time.
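
      A brief illustrative sketch of pairing complementary metrics, as referenced above: a minimal example only, assuming a classification use case with synthetic labels, and using scikit-learn as one possible library choice.

      ```python
      # Minimal sketch: reporting several complementary metrics side by side
      # rather than relying on a single number. Labels and predictions are
      # synthetic; metric choice depends on the AI system's purpose.
      from sklearn.metrics import precision_score, recall_score, f1_score

      y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # hypothetical ground truth
      y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # hypothetical model output

      # Each metric surfaces a different failure mode: precision is sensitive
      # to false positives, recall to false negatives, and F1 balances the two.
      print(f"precision: {precision_score(y_true, y_pred):.2f}")
      print(f"recall:    {recall_score(y_true, y_pred):.2f}")
      print(f"f1:        {f1_score(y_true, y_pred):.2f}")
      ```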
         
    • After metrics have been identified, understand and assess the trade-offs between the metrics.

      This includes:

      • assessing trade-offs between different success criteria
      • determining the possible harms with incorrect output, such as a false positive or false negative
      • analysing how the output of the AI system could be used. For example, determine which instance would have greater consequences: a false negative that would fail to detect a cyberattack, or a false positive that incorrectly flags a legitimate user as a threat (see the threshold sketch after this list)
      • assessing the trade-offs among the performance metrics
      • understanding the trade-offs with costs, explainability, reliability, and safety
      • understanding the limitations of the selected metrics and ensuring mitigations are considered when building the AI system, such as in selecting data and training methods
      • documenting trade-offs, ensuring stakeholders understand them, and accounting for them when selecting AI models and systems
      • optimising the metrics appropriate to the use case.
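
      Where false positives and false negatives carry different consequences, the decision threshold is one concrete trade-off lever. The sketch below is a minimal illustration with synthetic scores and labels, not a prescribed method.

      ```python
      # Minimal sketch: how a decision threshold trades false positives
      # against false negatives. Scores and labels are synthetic; in practice
      # they would come from the AI system's validation data.
      import numpy as np

      rng = np.random.default_rng(0)
      labels = rng.integers(0, 2, size=1000)           # hypothetical ground truth
      scores = labels * 0.3 + rng.random(1000) * 0.7   # noisy model scores

      for threshold in (0.3, 0.5, 0.7):
          preds = scores >= threshold
          false_pos = np.sum(preds & (labels == 0))    # legitimate cases flagged
          false_neg = np.sum(~preds & (labels == 1))   # real cases missed
          print(f"threshold={threshold}: FP={false_pos}, FN={false_neg}")
      ```

      Raising the threshold reduces false positives at the cost of more false negatives; the acceptable balance depends on which error is more harmful for the use case.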

    Agencies should

    • Criterion 42: Re-evaluate the selection of appropriate success metrics as the AI system moves through the AI lifecycle.

    • Criterion 43: Continuously verify correctness of the metrics.

      Before relying on the metrics, verify the following:

      • metrics accurately reflect situations where the AI system does not have enough information
      • metrics correctly reflect errors, failures, and successful task performance.

  • Statement 13: Establish data supply chain management processes

  • Agencies must

    • Criterion 44: Create and collect data for the AI system and identify the purpose for its use.

      It is important to identify:

      • what data will be used and is fit-for-purpose for the AI system
      • the sensitivity of the data, such as personal, protected, or otherwise sensitive
      • consent provided on usage including when to retain or destroy data, ensuring the proposed uses in the AI system align with the original limits of the consent
      • speed and mode of the data supply
      • how the data will be used at each stage of the AI system
      • where the data will be stored at each stage of the AI system
      • changes to the data at different points of the AI system
      • methods to manage and monitor data access
      • methods to manage any real-time data changes
      • data retention policies
      • cross-agency or cross-border data governance, if relevant
      • any risks and challenges associated with data elements of off-the-shelf AI models, products, or services in the AI system
      • cyber supply chain management
      • data quality monitoring and remediation
      • comprehensive documentation at each stage of the AI system to facilitate traceability and accountability
      • adherence to relevant legislation.
      • The consent framework for use of data across the AI system should satisfy the following:
        • the framework is clearly defined
        • the framework is kept up to date
        • individuals provide informed consent for how their data will be used
        • a dedicated team owns and maintains a register of how data is being used and demonstrates compliance with the terms of the consent.
      • The data should be thought of in groupings or packages, including:
        • the data within the organisation
        • the data surrounding the algorithm, APIs, and user interface
        • the data used to train the AI system
        • the data used for testing and integration
        • data input at regular intervals for monitoring purposes
        • the data used at deployment, including input and output data from and to users.
    • Criterion 45: Plan for data archival and destruction.

      Consider the following: 

      • whether data will be made available for future use, and which data
      • what restrictions and access controls are in place
      • whether data will be restricted until a specific date
      • file formats to ensure data remains available during the archival period
      • alignment with data sharing arrangements
      • arrangements for data used to train and test AI models, and associated model management arrangements
      • clear criteria for data archival and destruction for the data used at each stage of the AI lifecycle
      • guidelines in the Information management for records created using Artificial Intelligence (AI) technologies | naa.gov.au.

    Agencies should

    • Criterion 46: Analyse data for use by mapping the data supply chain and ensuring traceability.

      Mapping the data supply chain to the AI system involves capturing how data will be stored, shared, and processed, particularly at the training and testing stages, which involve regular injections of data. When mapping the data, account for:

      • how data was sourced
      • what data is required by the system, ensuring that excess data or data irrelevant to its functioning is not consumed
      • the amount and type of data the system will use
      • what could affect the reliable accessibility of data
      • how data will be fused and transformed
      • how the data will be secured at rest and in transit
      • how the data will be used by the system.

      Ensuring traceability entails maintaining awareness of the flow of data across the AI system.

      This includes (a minimal lineage-logging sketch follows this list):

      • data sovereignty controls and considerations including legal implications for geographic locations for data (including its metadata and logs) when at rest, in transit, or in use. For classified data processing on cloud platforms, it is recommended to use cloud service providers and cloud services located in Australia, as per Cloud assessment and authorisation | Cyber.gov.au
      • providing sufficient detail to debug data errors and troubleshoot issues
      • enforcing organisational policies on information management
      • enhancing visibility over changes to the data occurring during migrations, system updates, or other errors
      • supporting users to identify and fix data issues with a clear information audit trail
      • supporting diagnosis for bias
      • managing the quality of data to maintain availability and consistency. 
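
      As one way to support the audit trail described above, the sketch below logs lineage events as simple records. The `LineageEvent` fields are assumptions for illustration; agencies would typically rely on dedicated lineage or information-management tooling.

      ```python
      # Minimal sketch of an information audit trail for data flowing through
      # an AI system. Field names are illustrative assumptions only.
      from dataclasses import dataclass, field
      from datetime import datetime, timezone

      @dataclass
      class LineageEvent:
          dataset: str          # which dataset was touched
          stage: str            # e.g. "ingest", "training", "deployment"
          action: str           # what happened to the data
          actor: str            # system or person responsible
          timestamp: str = field(
              default_factory=lambda: datetime.now(timezone.utc).isoformat())

      audit_trail: list[LineageEvent] = []
      audit_trail.append(LineageEvent("claims_2024", "ingest",
                                      "loaded from source API", "etl-job-7"))
      audit_trail.append(LineageEvent("claims_2024", "training",
                                      "split into train/test", "ml-pipeline"))

      for event in audit_trail:
          print(event)
      ```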
    • Criterion 47: Implement practices to maintain and reuse data.

      This involves determining ongoing mechanisms for ensuring data is protected, accessible, and available for use in line with the original consent parameters.

      Any changes in data scope, including expansion in scope and usage patterns, would need to be monitored and addressed.

  • Statement 14: Implement data orchestration processes

  • Statement 14: Implement data orchestration processes

  • Agencies must

    • Criterion 48: Implement processes to enable data access and retrieval, encompassing the sharing, archiving, and deletion of data.


    Agencies should

    • Criterion 49: Establish standard operating procedures for data orchestration.

      This includes:

      • defining responsibilities between business areas and identifying mutual outcomes to be managed across teams. This is particularly important for business areas that are owners of datasets
      • considering inclusion of infrastructure arrangements and use of cloud arrangements for data storage or processing.

      Practices to be defined include:

      • data governance
      • data testing
      • security and access controls.
    • Criterion 50: Configure integration processes to integrate data in increments.

      This includes:

      • enabling agencies to better manage incident identification and intervention during data integration
      • ensuring risks of creating personally identifiable information from data integration are managed appropriately.
    • Criterion 51: Implement automation processes to orchestrate the reliable flow of data between systems and platforms.

    • Criterion 52: Perform oversight and regular testing of task dependencies.

      This should involve having comprehensive backup plans in place to handle potential outages or incidents.

      The following should be considered:

      • regular backups of critical data
      • failover mechanisms
      • detailed recovery procedures to minimise downtime and data loss.
    • Criterion 53: Establish and maintain data exchange processes.

      This includes:

      • how often data will need to be accessed by the system
      • at what points the frequency, magnitude, or speed of access will change
      • how security processes will adapt when data is exposed to new risks across the AI system
      • how data will be monitored for changes to accessibility or completeness
      • whether the sensitivity of the data will change once processed or analysed
      • how to validate data trust and authenticity.

  • Statement 15: Implement data transformation and feature engineering practices

  • Agencies should

    • Criterion 54: Establish data cleaning procedures to manage any data issues.

      Data cleaning involves appropriately treating data errors, inconsistencies, or missing values to improve the performance of the AI system. Document data cleaning each time it is conducted, possibly in the metadata. Issues to manage include (a minimal cleaning sketch follows below):

      • blanks, nulls, or trailing spaces
      • structural errors or unwanted formatting
      • missing data
      • spelling mistakes
      • repetition of words
      • irrelevant characters
      • content or observations irrelevant to the purpose of the AI system.

      For open-source data, or data that has not yet been validated or cannot yet be trusted, consider using a sandbox environment.
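
      A minimal cleaning sketch with pandas, as flagged above. It covers only a few of the listed issues (trailing spaces, blanks, duplicates, missing values); the column names and rules are hypothetical.

      ```python
      # Minimal sketch of treating a handful of common data issues.
      import pandas as pd

      df = pd.DataFrame({
          "name": ["Alice ", "  Bob", "Alice ", None],
          "note": ["ok", "ok", "ok", ""],
      })

      df["name"] = df["name"].str.strip()   # remove leading/trailing spaces
      df = df.replace("", pd.NA)            # treat blanks as missing values
      df = df.drop_duplicates()             # remove repeated observations
      df = df.dropna(subset=["name"])       # drop rows missing a required field

      print(df)
      # Each cleaning step should be documented (e.g. in the metadata)
      # so the treatment remains traceable.
      ```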

    • Criterion 55: Define data transformation processes to convert and optimise data for the AI system.

      This could leverage existing Extract, Transform and Load (ETL) or Extract, Load and Transform (ELT) processes.

      Consider the following data transformation techniques:

      • data standardisation – convert data from various sources into a consistent format
      • data reorganisation – organise data to make it easier to query and analyse
      • data integration – combine data from different sources for a single unified view
      • discretisation – convert continuous data into discrete intervals
      • missing value imputation – analyse what values need to be imputed and the method
      • data conversion – convert data from one form to another, such as a log transformation
      • smoothing – to even out fluctuations
      • convert unstructured data to structured data
      • Optical Character Recognition (OCR) – convert images of text into machine readable format
      • object labelling and tracking – in images, audio, and video
      • signal processing and transformation
      • point in time of data – a snapshot of data at a specific point in time.
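
      A minimal sketch of three of the listed techniques (standardisation, discretisation, and a log transformation), using synthetic values only.

      ```python
      # Minimal sketch of common transformation techniques on synthetic data.
      import numpy as np
      import pandas as pd

      values = pd.Series([1.0, 10.0, 100.0, 1000.0, 55.0])

      standardised = (values - values.mean()) / values.std()        # consistent scale
      discretised = pd.cut(values, bins=3, labels=["low", "mid", "high"])
      log_transformed = np.log1p(values)                            # compress large ranges

      print(pd.DataFrame({
          "raw": values,
          "standardised": standardised,
          "discretised": discretised,
          "log": log_transformed,
      }))
      ```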
    • Criterion 56: Map the points where transformation occurs between datasets and across the AI system.

      Consider:

      • security checks.
    • Criterion 57: Identify fit-for-purpose feature engineering techniques.

      Feature engineering techniques include:

      • feature creation and extraction – deriving features from existing data to help the AI system produce better quality outputs
      • feature selection – selecting attributes or fields that provide relevant context to the AI model
      • encoding – converting data into a format that can be better used in AI algorithms
      • binning – grouping data into categories
      • format conversion – changing data from one format to another for AI compatibility
      • scaling – mapping all data to a specific range to help improve AI outputs.
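
      A minimal sketch of encoding, binning, and scaling from the list above. The feature names are hypothetical, and the right techniques depend on the AI model.

      ```python
      # Minimal sketch of three feature engineering techniques.
      import pandas as pd

      df = pd.DataFrame({"suburb": ["A", "B", "A"], "age": [23, 47, 71]})

      # One-hot encoding of a categorical attribute.
      encoded = pd.get_dummies(df["suburb"], prefix="suburb")
      # Binning a continuous attribute into categories.
      binned = pd.cut(df["age"], bins=[0, 35, 60, 120],
                      labels=["young", "middle", "senior"])
      # Min-max scaling to map values into the range [0, 1].
      scaled = (df["age"] - df["age"].min()) / (df["age"].max() - df["age"].min())

      print(pd.concat([df, encoded, binned.rename("age_band"),
                       scaled.rename("age_scaled")], axis=1))
      ```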
    • Criterion 58: Apply consistent data transformation and feature engineering methods to support data reuse and extensibility.

      Consider:

      • metadata and tagging of the data
      • data transformation not limited to AI models and processes.
         
  • Statement 16: Ensure data quality is acceptable

  • Agencies must

    • Criterion 59: Define quality assessment criteria for the data used in the AI system.

      Data quality can be measured across a variety of dimensions, in line with the ABS Data Quality Framework: institutional environment, relevance, timeliness, accuracy, coherence, interpretability, and accessibility.

      A report on data quality can include:

      • data quality statement (see ABS Data Quality Statement Checklist)
      • metrics for measuring data quality, including its correctness and credibility
      • frequency of reporting on data quality
      • delegating ownership to a business area responsible for managing data quality
      • monitoring for changes in quality across the supply chain
      • intervening and addressing data quality issues as they arise.

      Consider:

      • any existing data standard frameworks that are used by the agency. 

    Agencies should

    • Criterion 60: Implement data profiling activities and remediate any data quality issues.

      This involves analysing the structure, content, and quality of the data to determine its fitness for purpose for an AI system. 

      Data profiling can investigate the following characteristics:

      • frequency
      • volume, range, and distribution
      • invalid entry identification
      • error detection
      • duplicates identification
      • noise identification
      • specific pattern identification.

      Methods that can be used to explore and analyse the data include:

      • descriptive statistics, such as mean, median, mode, or frequencies
      • business rules – apply business knowledge
      • clustering or dendrogram – group similar observations together
      • visualisation – to get a visual representation of the data from various types of graphs and charts, such as histograms, bar-plot, boxplots, density plots, or heatmaps
      • correlation analysis – measure relationships between variables, usually between numerical variables
      • scatter plots – visualise relationships between two numerical variables
      • cross-tabulations – analyse relationships between multiple categorical variables
      • principal component analysis – analyse variables with the most variance
      • factor analysis – helps reveal hidden patterns.
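
      A minimal profiling sketch with pandas, touching several of the characteristics and methods above (descriptive statistics, frequencies, duplicates, invalid entries). The dataset and the validity rule are hypothetical.

      ```python
      # Minimal sketch of data profiling on a synthetic dataset.
      import pandas as pd

      df = pd.DataFrame({"age": [25, 31, 31, -4, 150],
                         "state": ["NSW", "VIC", "VIC", "NSW", "??"]})

      print(df.describe())                          # descriptive statistics
      print(df["state"].value_counts())             # frequencies per category
      print("duplicates:", df.duplicated().sum())   # duplicate identification
      # Invalid entry identification under a hypothetical validity rule.
      print("invalid ages:", ((df["age"] < 0) | (df["age"] > 120)).sum())
      ```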
    • Criterion 61: Define processes for labelling data and managing the quality of data labels.

      Data labelling can support data management and storage, auditing, and AI model training. Humans with appropriate skills and knowledge can perform the data labelling, or it could be supported by automated labelling tools. 

      Setting data labelling practices can help optimise performance across the AI system by describing the context, categories, and relationships between data types, creating lineage of data through the AI system via versioning, distinguishing between pre-deployment and live data, and identifying what data will be reused, archived, or destroyed.

      These include:

      • establishing naming schemes, taxonomy, tagging, and data labelling practices
      • considering different techniques such as manual or automated labelling, crowdsourcing, and quality checks
      • defining quality control methods to improve consistency of labelling and assist in reducing bias
      • considering changes to the raw data and data imputations, and associated impact
      • providing data labels for AI training approaches or testing the AI models. Labels can provide the ground truth data for AI models and can influence AI validation. Different types of data labelling include:
        • classification
        • regression
        • visual object labels
        • audio labels
        • entity tagging
      • applying quality assurance measures to data labels, labelling personnel, and automated data-labelling support tools
      • implementing bias mitigation practices in labelling:
        • establishing a review process. Diverse people could independently label the same data so correlation could be analysed. Final labels could go through spot check review by subject matter experts
        • establishing feedback loops. Labellers should be able to report issues and suggest improvements, and automated systems should be updated to be consistent with corrections made by human labellers
        • establishing performance management for staff. Data labellers should undergo periodic training, performance reviews, and random audits for quality control
        • implementing metadata labelling techniques that capture the type of data categories within the system and the relationship between these categories. Metadata labels can be prepared for model bias evaluation by annotating metadata with suitable dimensions. Ensure the metadata labelling aligns to the Guide on Metadata Attributes | Office of the National Data Commissioner and Australian Government Recordkeeping Metadata Standard | naa.gov.au.
      • assessing and monitoring quality of all automated data labelling support tools. Determine the regularity and criteria for these quality checks and report on findings
      • updating and maintaining the labelling tools and processes to adapt to new data types and labelling requirements
      • considering potential harm to data labellers who may need to access sensitive or distressing content. This can occur when training an AI model to prevent responses including violence, hate speech, or sexual abuse.
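
      Where diverse people independently label the same data, inter-annotator agreement is one way to analyse correlation between labellers. A minimal sketch using Cohen's kappa with synthetic labels; in practice, low agreement would trigger subject-matter-expert review.

      ```python
      # Minimal sketch of checking agreement between two independent labellers.
      from sklearn.metrics import cohen_kappa_score

      labeller_a = ["cat", "dog", "dog", "cat", "dog", "cat"]
      labeller_b = ["cat", "dog", "cat", "cat", "dog", "cat"]

      kappa = cohen_kappa_score(labeller_a, labeller_b)
      print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance level
      ```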
         
  • Statement 17: Validate and select data

  • Agencies must

    • Criterion 62: Perform data validation activities to ensure data meets the requirements for the AI system’s purpose.

      This involves including AI-specific validations in schema migrations to ensure data pipelines and feature stores remain functional. Suitable data validation techniques include:

      • type validation – ensuring data is in the correct data type
      • format validation – ensuring data aligns to a predefined pattern
      • range validation – checking whether data falls within a specific range
      • outlier detection – checking for data points that significantly deviate from the general data pattern
      • completeness – verifying that all required fields are filled
      • diversity – ensuring the data represents a variety of data points.

      Considerations include:

      • a quality framework
      • online near real-time and offline batch data validation mechanisms to support the purpose and operations of the AI system.
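
      A minimal sketch of type, format, range, and completeness checks from the list above. The record schema and rules are hypothetical; production pipelines would typically use a dedicated validation framework.

      ```python
      # Minimal sketch of four data validation techniques on a single record.
      import re

      record = {"id": 42, "email": "user@agency.gov.au", "age": 37}

      checks = {
          # type validation – the field holds the expected data type
          "type": isinstance(record["id"], int),
          # format validation – the value matches a predefined pattern
          "format": re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", record["email"]) is not None,
          # range validation – the value falls within a plausible range
          "range": 0 <= record["age"] <= 120,
          # completeness – all required fields are filled
          "completeness": all(record.get(f) not in (None, "")
                              for f in ("id", "email", "age")),
      }

      for name, passed in checks.items():
          print(f"{name}: {'pass' if passed else 'FAIL'}")
      ```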
    • Criterion 63: Select data for use that is aligned with the purpose of the AI system.

      This includes:

      • alignment with the agency’s business intent and the goals of the AI system, as well as ensuring data meets the data quality criteria previously established
      • maintaining a live test dataset to test the AI system in production, to help monitor and maintain the operational integrity of the AI system.
         
  • Statement 18: Enable data fusion, integration and sharing

  • Agencies should

    • Criterion 64: Analyse data fusion and integration requirements.

      Data fusion is a method to integrate or combine data from multiple sources, which can help an AI system create a more comprehensive, reliable, and accurate output. Meaningful data sharing practices across the agency can build interoperability between systems and datasets. Data sharing also promotes reuse, reducing the resources needed for collection and analysis.

    • Criterion 65: Establish an approach to data fusion and integration.

      This approach should involve one or more of the following processes: 

      • ETL (Extract, Transform and Load) – batch movements of data
      • ELT (Extract, Load and Transform) – batch movements of data
      • Application programming interface (API) – allowing the movement and syncing of data across multiple applications
      • data streaming – moving data in or near real-time from source to target
      • data virtualisation – combining data from different sources virtually, on demand
      • chaining of AI models – linking multiple AI models in a sequence where the output from one model becomes the input for another.

      Consider:

      • data migration guidelines and any agency data management agreements, if relevant.

      Agencies can optimise data fusion and integration processes by automating scheduling and data integration tasks and by deploying intuitive interfaces to diagnose and resolve errors.

    • Criterion 66: Identify data sharing arrangements and processes to maintain consistency.

      Data sharing considerations include:

      • whether other systems could leverage the data analysed by the AI system
      • which areas within the agency would benefit from analysed data being shared with them
      • what data containers could improve with access to the system’s data sources
      • whether data on how the system was trained could be used to train other systems
      • documentation such as a memorandum of understanding, or similar, for data sharing arrangements intra-agency, inter-agency, or with external parties
      • addressing risks of creating personally identifiable information
      • what can be published for public, government, or internal benefit
      • any legislative implications.
         
  • Statement 19: Establish the model and context dataset

  • Agencies must

    • Criterion 67: Measure how representative the model dataset is.

      Key considerations for measuring and selecting a model dataset include:

      • whether it is representative of the true population relevant to the purpose of the AI system – this will improve model generalisation and minimise overfitting
      • ensuring the dataset has the required features, volumes, distribution, representation and demographics, including people with lived experience and intersectional dimensions. For example, a person from a culturally or linguistically diverse background may also be a person with disability; the dataset must consider how multiple dimensions of a person intersect to create unique experiences or challenges
      • for GenAI, assess data quality thresholds and mechanisms in the data setup for modelling to help avoid unwanted bias and hallucinations.
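
      One way to quantify representativeness is a goodness-of-fit test comparing dataset group counts with assumed population proportions. A minimal sketch with hypothetical counts and shares:

      ```python
      # Minimal sketch: chi-square goodness-of-fit check of dataset
      # composition against assumed population proportions.
      from scipy.stats import chisquare

      observed = [480, 320, 200]             # counts per group in the dataset
      population_share = [0.50, 0.30, 0.20]  # assumed true population proportions
      expected = [sum(observed) * p for p in population_share]

      stat, p_value = chisquare(observed, f_exp=expected)
      # A small p-value suggests a group is under- or over-represented.
      print(f"chi-square={stat:.2f}, p={p_value:.3f}")
      ```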
    • Criterion 68: Separate the model training dataset from the validation and testing datasets.

      Agencies must maintain the separation between these datasets to avoid any misleading evaluation of trained models. 

      Agencies can refresh these datasets to account for timeframes, degradation in AI performance during operation, and compute resource constraints.
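
      A minimal sketch of maintaining that separation with two successive splits; the data is synthetic, and a fixed random seed keeps the partition reproducible across refreshes.

      ```python
      # Minimal sketch: non-overlapping train / validation / test datasets.
      from sklearn.model_selection import train_test_split

      X = list(range(100))
      y = [i % 2 for i in X]

      # First split off 40% of the data, then divide it into validation and test.
      X_train, X_rest, y_train, y_rest = train_test_split(
          X, y, test_size=0.4, random_state=0)
      X_val, X_test, y_val, y_test = train_test_split(
          X_rest, y_rest, test_size=0.5, random_state=0)

      print(len(X_train), len(X_val), len(X_test))  # 60 / 20 / 20, never overlapping
      ```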

    • Criterion 69: Manage bias in the data.

      Techniques for agencies to manage and mitigate problematic bias in their model dataset include:

      • data collection analysis – examining how data was generated and verified, and checking the methodologies used to ensure the data is diverse and represents the real population
      • data source analysis – investigating limitations and assumptions around the origin of the data
      • data diversity – determining various demographics, sources and types of data, inclusion and exclusion considerations
      • statistical testing – determining the likelihood of the population being accurately represented in the data
      • class imbalance – analysing data for class imbalance before using it to train classification models, and applying relevant data and algorithm techniques and metrics, such as precision or F1-score, to address this
      • outlier detection – identifying outliers or unusual data points in the data and ensuring they are handled appropriately
      • exploratory data analysis – using descriptive statistics and data visualisation tools to identify patterns and discrepancies
      • removing any irrelevant data from the training data that does not improve the performance of the model
      • ensuring that any sensitive and protected data are retained in the test datasets for the purpose of evaluating for bias
      • data augmentation – deploying measures to address the completeness of the model dataset, through supplementary data collection or synthetic data generation
      • transparency – identifying bias and where it originated from through transparency on data sourcing and processing
      • domain knowledge – ensuring practitioners have relevant domain knowledge on the datasets the AI system uses to serve the scope of the AI, including an understanding of the data characteristics and what it represents for the organisation
      • documentation of data use – documenting the use of data by the AI system and any potential change of use, providing an audit trail of any incidence and causation of bias.
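
      As one example of the techniques above, a demographic parity check compares positive-outcome rates across groups. A minimal sketch with synthetic data; a large gap is a flag for further investigation, not a verdict on its own.

      ```python
      # Minimal sketch of a demographic parity check across groups.
      import pandas as pd

      df = pd.DataFrame({
          "group":   ["A", "A", "A", "B", "B", "B", "B"],
          "outcome": [1, 1, 0, 1, 0, 0, 0],
      })

      # Positive-outcome rate per group; the gap approximates demographic parity.
      rates = df.groupby("group")["outcome"].mean()
      print(rates)
      print(f"demographic parity difference: {rates.max() - rates.min():.2f}")
      ```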

    Agencies should

    • Criterion 70: For Generative AI, build reference or contextual datasets to improve the quality of AI outputs.

      A reference or contextual dataset for GenAI can take the form of, but is not limited to, a retrieval-augmented generation (RAG) dataset or a prompt dataset.

      Key considerations include:

      • building high-quality reference or contextual datasets to support more accurate and context aware AI outputs, and reduce hallucinations
      • implementing pre-defined prompts tailored to ensure consistent and reliable responses from GenAI models
      • establishing workflows for prompt engineering and data preparation to streamline development and deployment of GenAI systems.
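
      A minimal sketch of the retrieval step behind a RAG-style contextual dataset. TF-IDF similarity stands in for embedding-based retrieval here, and the reference passages and query are hypothetical.

      ```python
      # Minimal sketch: retrieving the most relevant reference passage to
      # supply as context alongside a prompt, using TF-IDF as a stand-in
      # for embedding-based retrieval.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      reference_dataset = [
          "Agencies must define success criteria for AI systems.",
          "Data labelling quality affects model training outcomes.",
          "Archival requirements apply to records created using AI.",
      ]
      query = "How should agencies handle labelling quality?"

      vectoriser = TfidfVectorizer()
      doc_vectors = vectoriser.fit_transform(reference_dataset)
      query_vector = vectoriser.transform([query])

      scores = cosine_similarity(query_vector, doc_vectors)[0]
      best = scores.argmax()
      print(f"retrieved: {reference_dataset[best]} (score={scores[best]:.2f})")
      ```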

         
  • Statement 20: Plan the model architecture
