Statement 18: Enable data fusion, integration and sharing
Agencies should:
Criterion 64: Analyse data fusion and integration requirements.
This includes:
- datasets, their sources and their owners
- purpose of the datasets for the AI system and intended outcomes
- data interdependencies
- risks associated with the datasets and mitigation plans
- data fusion and integration methodology for the AI system
- metrics to assess the quality of the fusion and data integration process and its outputs
- security, storage, and access requirements
- scalability intentions
- documentation and traceability
- regular audits and reviews
- data sharing principles and the risk management framework as per the Data Availability and Transparency Act 2022 (DATA Scheme)
- compliance with the Guidelines on data matching in Australian Government administration
- ethical considerations and guidance on data use as per the Department of Finance's Data Ethics Framework.
Data fusion integrates or combines data from multiple sources, which can help an AI system produce more comprehensive, reliable, and accurate outputs. Meaningful data sharing practices across the agency can build interoperability between systems and datasets. Data sharing also promotes reuse, reducing the resources needed for collection and analysis.
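As an illustration only, the sketch below fuses two small record sets on a shared key and reports a simple quality metric (the share of primary records that found a match). The field names, the `fuse` helper, and the choice of match rate as the metric are assumptions made for this example, not requirements of the standard.

```python
from dataclasses import dataclass


@dataclass
class FusionResult:
    records: list[dict]        # fused records, one per primary record
    match_rate: float          # share of primary records with a partner
    unmatched_keys: list[str]  # keys that had no partner in the secondary source


def fuse(primary: list[dict], secondary: list[dict], key: str) -> FusionResult:
    """Left-join `secondary` onto `primary` by `key`; primary values win on conflict."""
    lookup = {row[key]: row for row in secondary}
    fused, unmatched = [], []
    for row in primary:
        partner = lookup.get(row[key])
        if partner is None:
            unmatched.append(row[key])
            fused.append(dict(row))
        else:
            # Later entries override earlier ones, so primary fields take precedence.
            fused.append({**partner, **row})
    rate = (len(primary) - len(unmatched)) / len(primary) if primary else 0.0
    return FusionResult(fused, rate, unmatched)


# Hypothetical sample data from two sources.
clients = [{"id": "A1", "postcode": "2600"}, {"id": "B2", "postcode": "3000"}]
services = [{"id": "A1", "service": "grant"}]

result = fuse(clients, services, key="id")
print(result.match_rate, result.unmatched_keys)
```

Recording the unmatched keys alongside the match rate gives reviewers something concrete to trace when auditing the fusion process.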
Criterion 65: Establish an approach to data fusion and integration.
This approach should involve one or more of the following processes:
- ETL (Extract, Transform and Load) – batch movements of data
- ELT (Extract, Load and Transform) – batch movements of data
- Application programming interface (API) – allowing the movement and syncing of data across multiple applications
- data streaming – moving data in or near real-time from source to target
- data virtualisation – combining streaming data virtually from different sources on demand
- chaining of AI models – linking multiple AI models in a sequence where the output from one model becomes the input for another.
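To make the first of these processes concrete, the following is a minimal sketch of a batch ETL run using only the Python standard library. The CSV content, the `payments` table, and the validation rule are hypothetical. An ELT variant would instead load the raw rows into the target first and run the transformation there.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; the second data row is deliberately malformed.
RAW_CSV = "id,amount\n1, 10.50\n2,not_a_number\n3,7\n"


def extract(text: str) -> list[dict]:
    """Read the batch from the source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> tuple[list[tuple], list[dict]]:
    """Convert types and set aside rows that fail validation."""
    clean, rejected = [], []
    for row in rows:
        try:
            clean.append((int(row["id"]), float(row["amount"])))
        except ValueError:
            rejected.append(row)
    return clean, rejected


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Write the validated batch to the target store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (id INTEGER PRIMARY KEY, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
clean, rejected = transform(extract(RAW_CSV))
load(conn, clean)
total = conn.execute("SELECT SUM(amount) FROM payments").fetchone()[0]
print(total, len(rejected))
```

Keeping rejected rows rather than silently dropping them supports the documentation and traceability requirements listed under Criterion 64.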
Consider:
- data migration guidelines and any agency data management agreements, if relevant.
Agencies can optimise data fusion and integration processes by automating scheduling and data integration tasks and by deploying intuitive interfaces to diagnose and resolve errors.
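One way to support error diagnosis, sketched here under assumptions, is to run integration tasks as named steps and record which step failed and why. The `run_pipeline` helper and the three example steps are hypothetical. A real deployment would typically rely on an agency's chosen scheduler or orchestration tool instead.

```python
from typing import Any, Callable

Step = tuple[str, Callable[[Any], Any]]


def run_pipeline(steps: list[Step], payload: Any) -> dict:
    """Run steps in order, passing each output to the next; stop at the first failure."""
    log = []
    for name, func in steps:
        try:
            payload = func(payload)
            log.append({"step": name, "status": "ok"})
        except Exception as exc:
            # Record the failing step so an operator can diagnose it.
            log.append({"step": name, "status": "failed", "error": repr(exc)})
            return {"ok": False, "output": None, "log": log}
    return {"ok": True, "output": payload, "log": log}


# Hypothetical integration tasks.
steps: list[Step] = [
    ("parse", lambda text: [int(x) for x in text.split(",")]),
    ("deduplicate", lambda values: sorted(set(values))),
    ("summarise", lambda values: {"count": len(values), "max": max(values)}),
]

good = run_pipeline(steps, "3,1,3,2")
bad = run_pipeline(steps, "3,x,2")
print(good["output"])
print(bad["log"][-1]["step"], bad["log"][-1]["status"])
```

The per-step log is the kind of information an operator-facing interface could surface to help resolve errors quickly.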
Criterion 66: Identify data sharing arrangements and processes to maintain consistency.
Data sharing considerations include:
- whether other systems could leverage the data analysed by the AI system
- which areas within the agency would benefit from analysed data being shared with them
- which data containers could be improved with access to the system’s data sources
- whether data on how the system was trained could be used to train other systems
- documentation such as a memorandum of understanding, or similar, for data sharing arrangements intra-agency, inter-agency, or with external parties
- addressing risks of creating personally identifiable information
- what can be published for public, government, or internal benefit
- any legislative implications.
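As a hedged illustration of the consideration about personally identifiable information, the sketch below screens a fused dataset for small groups: combinations of attributes shared by fewer than k records, which may make individuals easier to re-identify. The field names, the sample records, and the threshold are assumptions for this example. Such a check is one possible screening step and does not replace a privacy assessment.

```python
from collections import Counter


def small_groups(records: list[dict], quasi_identifiers: list[str], k: int) -> dict:
    """Return attribute combinations shared by fewer than k records."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical fused records proposed for sharing.
fused = [
    {"postcode": "2600", "birth_year": 1980, "service": "grant"},
    {"postcode": "2600", "birth_year": 1980, "service": "loan"},
    {"postcode": "0872", "birth_year": 1955, "service": "grant"},
]

risky = small_groups(fused, ["postcode", "birth_year"], k=2)
print(risky)
```

Any combinations flagged this way could be reviewed, generalised, or withheld before the data is shared.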