Statement 17: Validate and select data

Agencies must

  • Criterion 62: Perform data validation activities to ensure data meets the requirements for the AI system’s purpose.

    This includes adding AI-specific validations to schema migrations so that data pipelines and feature stores remain functional when schemas change. Suitable data validation techniques include:

    • type validation – ensuring data is in the correct data type
    • format validation – ensuring data aligns to a predefined pattern
    • range validation – checking whether data falls within a specific range
    • outlier detection – checking for data points that significantly deviate from the general data pattern
    • completeness – verifying that all required fields are filled
    • diversity – ensuring the data covers a representative variety of cases rather than a narrow subset
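
The techniques listed above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the schema, field names (`applicant_id`, `age`, `postcode`, `income`), the postcode pattern, the age range and the helper names are all hypothetical, and the interquartile range rule is only one of several possible outlier detection methods.

```python
import re
import statistics
from collections import Counter

# Hypothetical schema for illustration only; the field names, pattern and
# range below are assumptions, not taken from the standard.
REQUIRED_FIELDS = {"applicant_id": str, "age": int, "postcode": str, "income": float}
POSTCODE_PATTERN = re.compile(r"\d{4}")
AGE_RANGE = (0, 120)


def validate_record(record):
    """Return a list of problems found in a single record (empty if valid)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        value = record.get(field)
        # Completeness: every required field must be present and non-empty.
        if value is None or value == "":
            problems.append(f"{field}: missing")
        # Type validation: the value must have the expected type.
        elif not isinstance(value, expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}")

    postcode = record.get("postcode")
    # Format validation: the postcode must match the predefined pattern.
    if isinstance(postcode, str) and postcode and not POSTCODE_PATTERN.fullmatch(postcode):
        problems.append("postcode: bad format")

    age = record.get("age")
    # Range validation: the age must fall within the allowed range.
    if isinstance(age, int) and not AGE_RANGE[0] <= age <= AGE_RANGE[1]:
        problems.append("age: out of range")
    return problems


def find_outliers(values, k=1.5):
    """Outlier detection using the interquartile range (IQR) rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def dominant_share(values):
    """Diversity check: share of the most common value (1.0 means no variety)."""
    counts = Counter(values)
    return counts.most_common(1)[0][1] / len(values)
```

A function such as `validate_record` could be called on each incoming record for near real-time checks, or mapped over a whole dataset for offline batch checks, while `find_outliers` and `dominant_share` operate on a full column of values.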

    Considerations include:

    • a quality framework
    • online (near real-time) and offline (batch) data validation mechanisms that support the purpose and operations of the AI system.

  • Criterion 63: Select data for use that is aligned with the purpose of the AI system.

    This includes:

    • aligning the data with the agency’s business intent and the goals of the AI system, and ensuring it meets the previously established data quality criteria
    • maintaining a live test dataset to test the AI system in production, to help monitor and maintain the operational integrity of the AI system.
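
One possible reading of the live test dataset is a small set of inputs with known expected outputs that is replayed against the production system on a schedule. The sketch below assumes that reading; the function name `run_live_test`, the `predict` callable and the `min_accuracy` threshold are hypothetical and would need to be adapted to the agency's own system and metrics.

```python
def run_live_test(predict, live_test_set, min_accuracy=0.9):
    """Replay a live test dataset against a prediction function.

    `predict` is any callable wrapping the production AI system (assumed).
    `live_test_set` is a list of (inputs, expected_output) pairs.
    """
    if not live_test_set:
        raise ValueError("live test dataset is empty")
    # Count the cases where the system's output matches the expected output.
    correct = sum(1 for inputs, expected in live_test_set if predict(inputs) == expected)
    accuracy = correct / len(live_test_set)
    # Report the accuracy and whether it meets the agreed threshold.
    return {"accuracy": accuracy, "passed": accuracy >= min_accuracy}
```

A result with `passed` set to `False` could feed an alert or a review process, so that a drop in operational integrity is noticed early.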
       
