Statement 17: Validate and select data
Agencies must:
Criterion 62: Perform data validation activities to ensure data meets the requirements for the AI system’s purpose.
This includes adding AI-specific validations to schema migrations so that data pipelines and feature stores remain functional. Suitable data validation techniques include:
- type validation – ensuring data is in the correct data type
- format validation – ensuring data aligns to a predefined pattern
- range validation – checking whether data falls within a specific range
- outlier detection – checking for data points that significantly deviate from the general data pattern
- completeness – verifying that all required fields are filled
- diversity – ensuring the data represents a sufficient variety of data points for the AI system’s purpose
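As an illustration only, the techniques listed above could be sketched in Python as follows. The field names (`age`, `lodged_on`, `region`), the limits and the outlier threshold are hypothetical assumptions chosen for the example, not requirements of this standard. Outlier detection here uses a modified z-score based on the median absolute deviation, which is one of several reasonable choices.

```python
import re
import statistics
from collections import Counter

# Hypothetical schema for illustration; field names and limits are assumptions.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")
REQUIRED_FIELDS = ("age", "lodged_on", "region")


def validate_record(record):
    """Return a list of validation errors for one record (empty means valid)."""
    errors = []
    # Completeness: every required field is present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            errors.append(f"completeness: {field} is missing")
    age = record.get("age")
    if age not in (None, ""):
        # Type validation: age must be an integer (bool is excluded explicitly).
        if not isinstance(age, int) or isinstance(age, bool):
            errors.append("type: age must be an integer")
        # Range validation: only checked once the type is known to be correct.
        elif not 0 <= age <= 120:
            errors.append("range: age must be between 0 and 120")
    lodged_on = record.get("lodged_on")
    # Format validation: the date string must match YYYY-MM-DD.
    if isinstance(lodged_on, str) and lodged_on and not DATE_PATTERN.fullmatch(lodged_on):
        errors.append("format: lodged_on must match YYYY-MM-DD")
    return errors


def find_outliers(values, threshold=3.5):
    """Outlier detection using a modified z-score (median absolute deviation)."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # No spread around the median, so nothing can be flagged.
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


def category_shares(records, field):
    """Diversity: share of records per category, to reveal under-represented groups."""
    counts = Counter(r[field] for r in records if r.get(field) not in (None, ""))
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()} if total else {}
```

A record that passes every check returns an empty list, so the same function can be used to filter records or to build a validation report. What counts as an acceptable category share in `category_shares` depends on the purpose of the AI system and is left to the agency.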
Considerations include:
- a quality framework
- online near real-time and offline batch data validation mechanisms to support the purpose and operations of the AI system.
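One possible way to support both modes is to share a single rule between a per-record check (online, near real-time) and a whole-batch summary (offline). The sketch below assumes a hypothetical `score` field and a hypothetical minimum pass rate of 95%; neither comes from this standard.

```python
def check_record(record):
    """Hypothetical rule shared by both paths: 'score' must be a number in [0, 1]."""
    score = record.get("score")
    is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
    return is_number and 0 <= score <= 1


def validate_online(record):
    """Near real-time path: decide per record before it reaches the AI system."""
    return "accept" if check_record(record) else "reject"


def validate_batch(records, min_pass_rate=0.95):
    """Offline path: summarise a whole batch and flag it if too many records fail."""
    failed_rows = [i for i, record in enumerate(records) if not check_record(record)]
    # An empty batch is treated as failing, since nothing could be verified.
    pass_rate = 1 - len(failed_rows) / len(records) if records else 0.0
    return {
        "pass_rate": pass_rate,
        "failed_rows": failed_rows,
        "batch_ok": pass_rate >= min_pass_rate,
    }
```

Keeping the rule in one function means the online and offline paths cannot drift apart; only the response to a failure differs (rejecting a single record versus flagging a batch for review).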
Criterion 63: Select data for use that is aligned with the purpose of the AI system.
This includes:
- aligning data with the agency’s business intent and the goals of the AI system, and ensuring the data meets the data quality criteria previously established
- maintaining a live test dataset to test the AI system in production, to help monitor and maintain the operational integrity of the AI system.
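A minimal sketch of how a live test dataset might be used for monitoring is shown below. It assumes a hypothetical `predict` callable standing in for the deployed AI system, a set of test cases with known expected outputs, and an agency-chosen baseline accuracy and tolerance. Accuracy is used only as an example metric; the appropriate measure depends on the system.

```python
def evaluate_live_test_set(predict, test_cases, baseline_accuracy, tolerance=0.05):
    """Run the live test dataset through the system and compare against a baseline.

    predict           -- hypothetical callable wrapping the deployed AI system
    test_cases        -- list of (features, expected_output) pairs
    baseline_accuracy -- accuracy recorded when the system was accepted
    tolerance         -- allowed drop before the result is flagged as degraded
    """
    if not test_cases:
        raise ValueError("live test dataset is empty")
    correct = sum(1 for features, expected in test_cases if predict(features) == expected)
    accuracy = correct / len(test_cases)
    return {
        "accuracy": accuracy,
        "degraded": accuracy < baseline_accuracy - tolerance,
    }
```

Run on a schedule, a check of this kind gives an early signal that the operational integrity of the AI system may have changed and that further investigation is needed.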