Version control is a process that tracks and manages changes to information such as data, models, and system code. This allows business and technical stakeholders to identify the state of an AI system when decisions are made, restore previous versions, and restore deleted or overwritten files.
AI system versioning can extend beyond traditional coding practices, which manage a package of identifiable code or configuration information. Version control will also need to be considered for information such as training data, models, and hyperparameters.
Information across the AI lifecycle that was used to generate a decision or outcome must be captured. This applies to all AI products, including low code or no code third-party tools.
Agencies must
Criterion 18: Apply version management practices to the end-to-end development lifecycle.
Australian Government API guidelines mandate the use of semantic versioning. They should be enhanced to cater for AI-related information and processes.
Version standards should clearly document the difference between production and non-production data, models and code.
This involves:
- applying version management practices to the model, training and operational datasets, data in the AI system, training algorithm, and hyperparameters
- maintaining design documentation that outlines the end-to-end AI system state, in line with existing organisational control mechanisms
- including point-in-time date and timestamps for data and any changes to data
- recording authorship, relevant licensing details, and changes since the last version
- capturing approvals from accountable officials for workflow and model reviews, datasets used for training, and relevant hyperparameters
- managing any data poisoning and AI poisoning
- data versioning that supports AI interoperability, which should include:
  - consistency: data structures, exchanges, and formats across different sources are well-defined
  - integration: data from different sources can be integrated seamlessly
- all documents relating to the establishment, design, and governance of an AI implemented solution, which must be retained as per the Archives Act 1983.
This does not apply to:
- third-party software products, which are subject to existing controls.
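As an illustration of the practices listed under Criterion 18, the following minimal sketch records a point-in-time version entry for a single component. It is not a prescribed implementation: the `ComponentVersion` fields, the function names, and the choice of a SHA-256 content hash are all assumptions made for the example.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ComponentVersion:
    """One versioned component of an AI system (field names are illustrative)."""
    name: str                 # e.g. "model", "training-dataset", "hyperparameters"
    version: str              # semantic version string
    sha256: str               # content hash of the artefact's bytes
    author: str
    licence: str
    recorded_at: str          # ISO 8601 timestamp in UTC
    approved_by: Optional[str] = None  # accountable official, if approval was captured


def record_component(name: str, version: str, payload: bytes, author: str,
                     licence: str, approved_by: Optional[str] = None,
                     now: Optional[datetime] = None) -> ComponentVersion:
    """Capture a point-in-time record of a component from its raw bytes."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return ComponentVersion(
        name=name,
        version=version,
        sha256=hashlib.sha256(payload).hexdigest(),
        author=author,
        licence=licence,
        recorded_at=timestamp,
        approved_by=approved_by,
    )


def hyperparameter_bytes(hyperparameters: dict) -> bytes:
    """Serialise hyperparameters with sorted keys so equal settings hash equally."""
    return json.dumps(hyperparameters, sort_keys=True).encode("utf-8")


def build_manifest(components: list) -> dict:
    """Combine component records into one manifest keyed by component name."""
    return {component.name: asdict(component) for component in components}
```

A manifest built this way could be stored alongside design documentation, so that the model, datasets, and hyperparameters in use at a given time can be identified together. A changed hash with an unchanged version string would also flag an unrecorded modification.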
Agencies should
Criterion 19: Use metadata in version control to distinguish between production and non-production data, models, and code.
This includes:
- a simple and transparent way for all users of the system to understand the version of each component at the time a decision was made
- the use of tags in the version number to provide a visual representation of non-production versions without needing direct access to data or source control toolsets
- the use of metadata to distinguish between different control states where outputs can vary while the core functionality of the system has not changed.
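One possible way to make non-production versions visible in the version number itself is to treat a semantic versioning pre-release tag as the non-production marker. The sketch below assumes that convention, which is not stated in this standard. The pattern is a simplified form of the semantic versioning grammar, and `parse_version` and `is_production` are hypothetical helper names.

```python
import re
from typing import NamedTuple, Optional

# Simplified semantic version pattern: MAJOR.MINOR.PATCH[-prerelease][+build].
# It is looser than the official grammar (for example, it accepts leading zeros).
_SEMVER = re.compile(
    r"(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+([0-9A-Za-z.-]+))?"
)


class Version(NamedTuple):
    major: int
    minor: int
    patch: int
    prerelease: Optional[str]  # e.g. "dev", "test.3"; None when absent
    build: Optional[str]       # build metadata; None when absent


def parse_version(text: str) -> Version:
    """Split a version string into its parts, or raise ValueError if malformed."""
    match = _SEMVER.fullmatch(text)
    if match is None:
        raise ValueError(f"not a semantic version: {text!r}")
    major, minor, patch, prerelease, build = match.groups()
    return Version(int(major), int(minor), int(patch), prerelease, build)


def is_production(text: str) -> bool:
    """Assumed convention: any pre-release tag marks a non-production version."""
    return parse_version(text).prerelease is None
```

Under this assumed convention, `is_production("2.1.0")` is true and `is_production("2.1.0-test.3")` is false, so a reader can tell the two apart without access to data or source control toolsets.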
Criterion 20: Use a version control toolset to improve usability for users.
Version control toolsets improve usability for service delivery and business users, addressing activities such as appeals, Ministerial correspondence, executive briefs, court cases, audit, assurance, privacy, and legislative reviews.
This includes:
- using purpose-built in-house or commercial version management products
- storing sufficient information to allow rollback to a previous system state
- considering archival requirements of training data used in a test environment.
Criterion 21: Record version control information in audit logs.
This includes:
- using a commit hash to identify the control state of all elements, to reduce the volume and complexity of audit log data
- recording AI predictions and actions taken
- running proactive data analytics against the audit logs to monitor and assess ongoing AI system performance
- recording this information where low code or no code third-party tools are used.
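The audit logging points under Criterion 21 can be sketched as follows. This is a minimal example under stated assumptions: the JSON-lines format, the field names, and the functions `audit_entry` and `outcomes_by_commit` are hypothetical. The commit hash is assumed to come from whichever version control toolset the agency uses.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, Optional


def audit_entry(commit_hash: str, prediction: str, action: str,
                now: Optional[datetime] = None) -> str:
    """Build one audit log line; the commit hash stands in for the full control state."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return json.dumps(
        {
            "timestamp": timestamp,
            "commit": commit_hash,     # identifies versions of all elements at once
            "prediction": prediction,  # what the AI system predicted
            "action": action,          # what was done as a result
        },
        sort_keys=True,
    )


def outcomes_by_commit(log_lines: Iterable[str]) -> Counter:
    """Count predictions per (commit, prediction) pair as a simple monitoring aid."""
    counts: Counter = Counter()
    for line in log_lines:
        entry = json.loads(line)
        counts[(entry["commit"], entry["prediction"])] += 1
    return counts
```

Logging only the hash keeps each entry small, because the full state of the model, data, and code can be looked up from version control when needed. Comparing the counts across commits is one basic way to notice a shift in outputs after a change.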