Statement 13: Establish data supply chain management processes
Agencies must:
Criterion 44: Create and collect data for the AI system and identify the purpose for its use.
It is important to identify:
- what data will be used and is fit-for-purpose for the AI system
- the sensitivity of the data, such as personal, protected, or otherwise sensitive
- consent provided on usage including when to retain or destroy data, ensuring the proposed uses in the AI system align with the original limits of the consent
- speed and mode of the data supply
- how the data will be used at each stage of the AI system
- where the data will be stored at each stage of the AI system
- changes to the data at different points of the AI system
- methods to manage and monitor data access
- methods to manage any real-time data changes
- data retention policies
- cross-agency or cross-border data governance, if relevant
- any risks and challenges associated with data elements of off-the-shelf AI models, products, or services in the AI system
- cyber supply chain management
- data quality monitoring and remediation
- comprehensive documentation at each stage of the AI system to facilitate traceability and accountability
- adherence to relevant legislation.
The consent framework for use of data across the AI system should:
- be clearly defined
- be kept up to date
- ensure individuals give informed consent to how their data will be used
- assign a dedicated team to own and maintain a register of how data is being used and to show compliance with the terms of the consent.
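A register of this kind can be checked mechanically before data enters the AI system. The following is a minimal sketch, not a prescribed implementation: the `ConsentRecord` structure, its field names, and the example purposes are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ConsentRecord:
    """One hypothetical register entry: what the consent allows, and until when."""
    dataset_id: str
    permitted_purposes: frozenset[str]
    retain_until: date  # date after which the data must be destroyed


def check_use(record: ConsentRecord, purpose: str, on: date) -> list[str]:
    """Return the reasons a proposed use breaches consent (empty list if none)."""
    problems = []
    if purpose not in record.permitted_purposes:
        problems.append(f"purpose '{purpose}' is outside the original consent")
    if on > record.retain_until:
        problems.append(
            f"retention period ended on {record.retain_until.isoformat()}"
        )
    return problems


# Illustrative entry; the identifiers and dates are invented for the example.
record = ConsentRecord(
    dataset_id="survey-2023",
    permitted_purposes=frozenset({"service-improvement"}),
    retain_until=date(2026, 6, 30),
)

for reason in check_use(record, "model-training", date(2025, 1, 15)):
    print(reason)
```

Returning a list of reasons rather than a single yes or no gives the team that owns the register a record of why a proposed use was refused.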
The data should be thought of in groupings or packages, including:
- the data within the organisation
- the data surrounding the algorithm, APIs, and user interface
- the data used to train the AI system
- the data used for testing and integration
- the data input at regular intervals for monitoring
- the data used at deployment, including input and output data from and to users.
Criterion 45: Plan for data archival and destruction.
Consider the following:
- whether data will be made available for future use and, if so, which data
- the restrictions and access controls in place
- whether data will be restricted until a specific date
- file formats to ensure data remains available during the archival period
- alignment with data sharing arrangements
- arrangements for data used to train and test AI models, and associated model management arrangements
- clear criteria for data archival and destruction for the data used at each stage of the AI lifecycle
- guidelines in the Information management for records created using Artificial Intelligence (AI) technologies | naa.gov.au.
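Clear criteria for each lifecycle stage can be written down as rules that are simple to review and test. The sketch below assumes a purely age-based rule per stage. The stage names and the retention periods are illustrative placeholders, not values taken from this standard or from the National Archives guidance, and real criteria would also need to reflect consent terms and data sharing arrangements.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Action(Enum):
    RETAIN = "retain"
    ARCHIVE = "archive"
    DESTROY = "destroy"


@dataclass(frozen=True)
class StagePolicy:
    archive_after_days: int
    destroy_after_days: int


# Hypothetical periods for illustration only; agencies set their own.
POLICIES = {
    "training": StagePolicy(archive_after_days=365, destroy_after_days=7 * 365),
    "testing": StagePolicy(archive_after_days=180, destroy_after_days=2 * 365),
    "deployment": StagePolicy(archive_after_days=90, destroy_after_days=365),
}


def disposition(stage: str, last_used: date, today: date) -> Action:
    """Decide what should happen to data from a given lifecycle stage."""
    policy = POLICIES[stage]
    age_days = (today - last_used).days
    if age_days >= policy.destroy_after_days:
        return Action.DESTROY
    if age_days >= policy.archive_after_days:
        return Action.ARCHIVE
    return Action.RETAIN


print(disposition("deployment", date(2024, 1, 1), date(2024, 6, 1)).value)
```

Keeping the periods in one table makes the criteria easy to audit against the agency's records authority.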
Agencies should:
Criterion 46: Analyse data for use by mapping the data supply chain and ensuring traceability.
Mapping the data supply chain to the AI system involves capturing how data will be stored, shared, and processed, particularly at the training and testing stages, which involve regular injections of data. When mapping the data, account for:
- how data was sourced
- what data is required by the system, ensuring that excess data or data irrelevant to the functioning of the system is not consumed by the system
- the amount and type of data the system will use
- what could affect the reliable accessibility of data
- how data will be fused and transformed
- how the data will be secured at rest and in transit
- how the data will be used by the system.
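Several of the points above can be captured as one entry per data source in a supply chain map and reviewed automatically. This is a minimal sketch under stated assumptions: the `SupplyChainEntry` fields and the example source are hypothetical, and a real map would carry more detail, such as volumes, transformations, and availability risks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupplyChainEntry:
    """One hypothetical entry in a data supply chain map."""
    name: str
    source: str  # how the data was sourced
    fields_supplied: frozenset[str]
    encrypted_at_rest: bool
    encrypted_in_transit: bool


def review(entry: SupplyChainEntry, required_fields: frozenset[str]) -> list[str]:
    """List findings for one entry: excess or missing data and security gaps."""
    findings = []
    excess = entry.fields_supplied - required_fields
    if excess:
        findings.append("excess fields: " + ", ".join(sorted(excess)))
    missing = required_fields - entry.fields_supplied
    if missing:
        findings.append("missing fields: " + ", ".join(sorted(missing)))
    if not entry.encrypted_at_rest:
        findings.append("not encrypted at rest")
    if not entry.encrypted_in_transit:
        findings.append("not encrypted in transit")
    return findings


# Illustrative source; names and fields are invented for the example.
entry = SupplyChainEntry(
    name="client-feedback",
    source="internal survey platform",
    fields_supplied=frozenset({"id", "age", "postcode"}),
    encrypted_at_rest=True,
    encrypted_in_transit=False,
)

for finding in review(entry, required_fields=frozenset({"id", "age"})):
    print(finding)
```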
Ensuring traceability entails maintaining awareness of the flow of data across the AI system.
This includes:
- data sovereignty controls and considerations including legal implications for geographic locations for data (including its metadata and logs) when at rest, in transit, or in use. For classified data processing on cloud platforms, it is recommended to use cloud service providers and cloud services located in Australia, as per Cloud assessment and authorisation | Cyber.gov.au
- providing the level of detail needed to debug data errors and troubleshoot
- enforcing organisational policies on information management
- enhancing visibility over changes to the data that occur during migrations or system updates, or as a result of errors
- supporting users to identify and fix data issues with a clear information audit trail
- supporting diagnosis for bias
- managing the quality of data to maintain availability and consistency.
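One way to support a clear information audit trail is an append-only log that fingerprints the data before and after each processing step. The sketch below is illustrative only. The `LineageLog` class and its field names are assumptions, and hashing a JSON snapshot in memory suits small examples rather than production volumes.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


def fingerprint(rows: list[dict]) -> str:
    """Stable SHA-256 digest of a dataset snapshot."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class LineageLog:
    """Hypothetical append-only record of every step that touches the data."""
    events: list[dict] = field(default_factory=list)

    def record(self, step: str, actor: str,
               before: list[dict], after: list[dict]) -> dict:
        event = {
            "step": step,
            "actor": actor,
            "at": datetime.now(timezone.utc).isoformat(),
            "input_hash": fingerprint(before),
            "output_hash": fingerprint(after),
            "rows_in": len(before),
            "rows_out": len(after),
        }
        self.events.append(event)
        return event

    def unexplained_changes(self) -> list[int]:
        """Indexes of events whose input differs from the previous step's output."""
        return [
            i for i in range(1, len(self.events))
            if self.events[i]["input_hash"] != self.events[i - 1]["output_hash"]
        ]


raw = [{"id": 1, "age": 34}, {"id": 2, "age": None}]
cleaned = [row for row in raw if row["age"] is not None]

log = LineageLog()
log.record("ingest", "pipeline", raw, raw)
log.record("drop-missing-age", "pipeline", raw, cleaned)
print(log.unexplained_changes())
```

A gap between one step's output hash and the next step's input hash indicates that the data changed outside a logged step, for example during a migration or system update.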
Criterion 47: Implement practices to maintain and reuse data.
This involves determining ongoing mechanisms for ensuring data is protected, accessible, and available for use in line with the original consent parameters.
Any changes in data scope, including expansion in scope and usage patterns, would need to be monitored and addressed.