A - E

AI incident

An event, circumstance or series of events where the development, use or malfunction of one or more AI systems directly or indirectly leads to any of the following harms:

  • injury or harm to the health of a person or groups of people
  • disruption of the management and operation of critical infrastructure
  • violations of human rights or a significant breach of obligations under applicable laws, including intellectual property, privacy and Indigenous cultural and intellectual property
  • harm to property, communities or the environment.

AI model

‘A model is defined as a “physical, mathematical or otherwise logical representation of a system, entity, phenomenon, process or data” in the ISO/IEC 22989 standard. AI models include, among others, statistical models and various kinds of input-output functions (such as decision trees and neural networks). An AI model can represent the transition dynamics of the environment, allowing an AI system to select actions by examining their possible consequences using the model. AI models can be built manually by human programmers or automatically through, for example, unsupervised, supervised, or reinforcement machine learning techniques.’ OECD definition.

AI system

‘An Artificial Intelligence (AI) system is a machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment.’ OECD definition.

AI watermarking

Information embedded into digital content, either perceptibly or imperceptibly by humans, that can serve a variety of purposes, such as establishing digital content provenance or informing stakeholders that the contents are AI-generated or significantly modified.
AI-generated content watermarking: a procedure by which watermarks are embedded into AI-generated content. This embedding can occur at 2 distinct stages: during generation by altering a GenAI model's inference procedure or post-generation, as the content is distributed along the data and information distribution chain. C2PA.
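
As a toy illustration of imperceptible embedding, the sketch below hides a bit pattern in the least significant bits of an image's pixel values (assuming NumPy). Production watermarking schemes, including those aligned with C2PA provenance, are considerably more robust to editing and compression.

    import numpy as np

    def embed_lsb_watermark(image, bits):
        # Hide a bit pattern in the least significant bits of pixel values.
        flat = image.flatten()  # flatten() returns a copy
        for i, bit in enumerate(bits):
            flat[i] = (flat[i] & 0xFE) | bit  # overwrite only the lowest bit
        return flat.reshape(image.shape)

    def extract_lsb_watermark(image, n_bits):
        # Read the hidden bit pattern back out.
        return [int(v & 1) for v in image.flatten()[:n_bits]]

    image = np.random.randint(0, 256, size=(8, 8), dtype=np.uint8)
    marked = embed_lsb_watermark(image, [1, 0, 1, 1])
    assert extract_lsb_watermark(marked, 4) == [1, 0, 1, 1]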

Algorithm

‘A clearly specified mathematical process for computation; a set of rules that, if followed, will give a prescribed result.’ NIST definition.
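
Euclid's greatest-common-divisor procedure is a classic example of a set of rules that, if followed, gives a prescribed result:

    def gcd(a: int, b: int) -> int:
        # Euclid's algorithm: repeat a fixed rule until the remainder is zero.
        while b:
            a, b = b, a % b
        return a

    assert gcd(48, 18) == 6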

Application programming interface (API)

‘A system access point or library function that has a well-defined syntax and is accessible from application programs or user code to provide well-defined functionality.’ NIST definition.

Artificial general intelligence (AGI)

‘Artificial general intelligence (AGI), also known as strong AI, is the (currently hypothetical) intelligence of a machine that can accomplish any intellectual task that a human can perform. AGI is a trait attributed to future autonomous AI systems that can achieve goals in a wide range of real or virtual environments at least as effectively as humans can.’ Gartner.

Bias

‘systematic difference in treatment of certain objects, people, or groups in comparison to others’ – ISO/IEC 24027.

C2PA

The Coalition for Content Provenance and Authenticity, or C2PA, provides an open technical standard for publishers, creators and consumers to establish the origin and edits of digital content.

Classification model

‘Machine learning model whose expected output for a given input is one or more classes’ ISO/IEC 23053.
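
A minimal sketch of the idea using scikit-learn, where the expected output for each input is a class label:

    from sklearn.tree import DecisionTreeClassifier

    # Toy labelled data: two features per sample, one class per sample.
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = ['negative', 'negative', 'positive', 'positive']

    model = DecisionTreeClassifier().fit(X, y)
    print(model.predict([[1, 0.5]]))  # output is one of the classes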

Data labelling

‘data labelling, in which datasets are labelled, which means that samples are associated with target variables.’ ISO/IEC 22989.

Dataset

‘collection of data with a shared format’ ISO/IEC 22989.

Explainability

‘property of an AI system to express important factors influencing the AI system results in a way that humans can understand’ ISO/IEC 22989.

F - P

Fairness

See Guidance 4. Fairness | digital.gov.au.

Fine-tuning

‘Model fine-tuning involves adjusting the parameters of foundation models or training models with small datasets for a specific task. This process adapts and enhances the model's performance for particular business needs.’
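
A minimal PyTorch sketch of the idea, using a small random network as a stand-in for a foundation model: the base layers are frozen and only a small task-specific head is adjusted on a small dataset.

    import torch
    from torch import nn

    base = nn.Sequential(nn.Linear(16, 8), nn.ReLU())  # stand-in for a foundation model
    head = nn.Linear(8, 2)                             # small task-specific layer

    for p in base.parameters():
        p.requires_grad = False  # keep the pre-trained weights fixed

    optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    X, y = torch.randn(32, 16), torch.randint(0, 2, (32,))  # small task-specific dataset
    for _ in range(10):
        optimizer.zero_grad()
        loss = loss_fn(head(base(X)), y)  # only the head's parameters are updated
        loss.backward()
        optimizer.step()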

Generative AI (GenAI)

‘The class of AI models that emulate the structure and characteristics of input data in order to generate derived synthetic content. This can include images, videos, audio, text, and other digital content.’ NIST definition.

Grounding

Providing context or relevant knowledge to an AI model by connecting it to trusted data sources at inference time. This does not update the model itself.
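
A minimal sketch of the pattern, where ask_model is a hypothetical stand-in for any text-generation call: the trusted facts are supplied in the prompt at inference time and the model itself is never updated.

    def ask_model(prompt: str) -> str:
        # Hypothetical stand-in for a call to any text-generation model.
        ...

    def grounded_prompt(question: str, trusted_facts: list[str]) -> str:
        # Facts retrieved from a trusted data source go into the prompt itself.
        context = '\n'.join(trusted_facts)
        return f'Answer using only the context below.\nContext:\n{context}\nQuestion: {question}'

    facts = ['The policy was updated in May 2024.']  # toy trusted source
    answer = ask_model(grounded_prompt('When was the policy updated?', facts))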

Ground truth

‘Value of the target variable for a particular item of labelled input data. The term ground truth does not imply that the labelled input data consistently corresponds to the real-world value of the target variables.’ (ISO/IEC 22989).

Hallucination

‘Outputs generated by an AI system may not always be accurate or factually correct. Generative AI systems are known to hallucinate information that is not factually correct. Organisational functions that rely on the accuracy of generative AI outputs could be negatively impacted by hallucinations, unless appropriate mitigations are implemented.’ Source: Engaging with artificial intelligence | Cyber.gov.au.

Harm

‘Any adverse effects that would be experienced by an individual (i.e., that may be socially, physically, or financially damaging) or an organization if the confidentiality of PII were breached.’ NIST definition.

Hyperparameters

‘characteristic of a machine learning algorithm that affects its learning process. Note 1 to entry: Hyperparameters are selected prior to training and can be used in processes to help estimate model parameters.’ ISO/IEC 22989.
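
For example, in scikit-learn the learning rate, tree depth and number of estimators below are hyperparameters fixed before training, while the model parameters are estimated from the data:

    from sklearn.ensemble import GradientBoostingClassifier

    # Hyperparameters: selected prior to training, not learned from the data.
    model = GradientBoostingClassifier(learning_rate=0.1, max_depth=3, n_estimators=50)

    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [0, 0, 1, 1]
    model.fit(X, y)  # model parameters are estimated during training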

Infrastructure as a service (IaaS)

‘The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).’ NIST definition.

Large language model (LLM)

A model based on artificial neural network technology that takes natural language text as input, processes it and generates text as output, supporting tasks such as code generation and content creation.

Machine learning

‘Process of optimizing model parameters through computational techniques, such that the model's behavior reflects the data or experience’ ISO/IEC 22989.
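
A minimal sketch of that optimisation process on toy data (assuming NumPy): gradient descent adjusts the parameters of a linear model until its behaviour reflects the data.

    import numpy as np

    X = np.array([0.0, 1.0, 2.0, 3.0])
    y = 2.0 * X + 1.0               # data generated by y = 2x + 1

    w, b = 0.0, 0.0                 # model parameters to optimise
    for _ in range(1000):
        pred = w * X + b
        grad_w = 2 * np.mean((pred - y) * X)  # gradient of mean squared error
        grad_b = 2 * np.mean(pred - y)
        w -= 0.05 * grad_w
        b -= 0.05 * grad_b

    print(round(w, 2), round(b, 2))  # approaches 2.0 and 1.0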

Model dataset

The dataset used to train an AI model. It is made up of smaller datasets: the train dataset, the validation dataset and the test dataset.
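
One common way of forming those smaller datasets, sketched with scikit-learn on toy data:

    from sklearn.model_selection import train_test_split

    samples = [[i] for i in range(100)]
    labels = [i % 2 for i in range(100)]

    # Hold back a test dataset first, then split the remainder into
    # train and validation datasets.
    X_rest, X_test, y_rest, y_test = train_test_split(samples, labels, test_size=0.2)
    X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25)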

Model explainability

Ability of the model to provide clear and understandable reasons for model outputs to authorised humans.

Model refresh

Update or replace an existing model with a new model.

Offline training

‘The system is trained during the development process before the system is put into production. This is similar in nature to standard software development, where the system is built and tested fully before it is put into production.’ ISO/IEC 22989:2023.

Online training

‘Online learning (continuous learning) involves the incremental update of the model in the system as it operates during production. The data input to the system during operation is not only analysed to produce an output from the system, but also simultaneously used to adjust the model in the system, with the aim of improving the model on the basis of the production data. Depending on the design of the continuous learning AI system, there can be human actions required in the process, for example data labelling, validating the application of a specific incremental update or monitoring the AI system performance.’ ISO/IEC 22989:2023.
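
A minimal sketch of incremental updating with scikit-learn's partial_fit, where each batch stands in for new production data:

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    model = SGDClassifier()
    classes = np.array([0, 1])  # all classes must be declared for incremental learning

    for _ in range(5):  # each loop stands in for a new batch of production data
        X_batch = np.random.randn(10, 3)
        y_batch = (X_batch[:, 0] > 0).astype(int)
        model.partial_fit(X_batch, y_batch, classes=classes)  # incremental model update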

Platform as a Service (PaaS)

‘The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.’ NIST definition.

Poisoning

‘Adversarial attacks in which an adversary interferes with a model during its training stage, such as by inserting malicious training data (data poisoning) or modifying the training process itself (model poisoning).’ NIST - Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations.

Pre-trained model

‘a component of the training stage in which a model learns general patterns, features, and relationships from vast amounts of unlabeled data, such as through self-supervised learning. Pre-training can equip models with knowledge of general features or patterns which may be useful in downstream tasks, and can be followed with additional training or fine-tuning that specializes the model for a specific downstream task.’ Source: pre-training - Glossary | NIST CSRC.

Prompt engineering

‘Prompt engineering is the discipline of providing inputs, in the form of text or images, to generative AI models to specify and confine the set of responses the model can produce. The inputs prompt a set that produces a desired outcome without updating the actual weights of the model (as done with fine-tuning).’ Gartner.
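
A minimal illustration, where generate is a hypothetical stand-in for any generative AI call: the prompt confines the set of responses without updating any model weights.

    def generate(prompt: str) -> str:
        # Hypothetical stand-in for a call to a generative AI model.
        ...

    # The engineered prompt constrains both the format and the allowed answers.
    prompt = (
        'Classify the sentiment of the review as exactly one word: '
        "'positive', 'negative' or 'neutral'.\n"
        'Review: The staff were friendly and the process was quick.'
    )
    label = generate(prompt)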

Q - Z

Regression

‘machine learning model whose expected output for a given input is a continuous variable’ ISO/IEC 23053.
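
A minimal sketch with scikit-learn, where the expected output for a given input is a continuous value rather than a class:

    from sklearn.linear_model import LinearRegression

    X = [[1], [2], [3], [4]]      # input feature
    y = [1.9, 4.1, 6.0, 8.1]      # continuous target variable

    model = LinearRegression().fit(X, y)
    print(model.predict([[5]]))   # a continuous output, close to 10.0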

Reliability

‘AI systems reliably operate in accordance with their intended purpose throughout their lifecycle’ – AI Ethics principle.

Retrieval Augmented Generation (RAG)

‘RAG enhances LLMs by retrieving relevant information from an external knowledge base and incorporating it into the LLM's generation process.’
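
A toy sketch of the retrieval step, using word overlap as a stand-in for a real embedding-based search over a knowledge base; the retrieved passage is then incorporated into the generation prompt, as with grounding above.

    def retrieve(question: str, knowledge_base: list[str]) -> str:
        # Toy retriever: pick the passage sharing the most words with the question.
        q_words = set(question.lower().split())
        return max(knowledge_base, key=lambda p: len(q_words & set(p.lower().split())))

    kb = [
        'Leave requests are approved by the team manager.',
        'Expense claims must be lodged within 30 days.',
    ]
    passage = retrieve('Who approves leave requests', kb)
    prompt = f'Context: {passage}\nQuestion: Who approves leave requests?'
    # prompt is then passed to the LLM's generation process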

Safety

‘Expectation that a system does not, under defined conditions, lead to a state in which human life, health, property, or the environment is endangered.’ ISO/IEC/IEEE 12207.

Semantic versioning

‘Version numbers and the way they change convey meaning about the underlying code and what has been modified from one version to the next.’ Source: semantic versioning standard, https://semver.org/.
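
A minimal sketch of the convention (ignoring pre-release and build metadata): the MAJOR.MINOR.PATCH components compare numerically, so version numbers convey how much has changed.

    def parse_semver(version: str) -> tuple[int, int, int]:
        # Split 'MAJOR.MINOR.PATCH' into comparable integers.
        major, minor, patch = (int(part) for part in version.split('.'))
        return major, minor, patch

    # Tuples compare element-wise: a major bump outranks any minor or patch change.
    assert parse_semver('2.0.0') > parse_semver('1.9.17')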

Software as a Service (SaaS)

‘Software as a service (SaaS) is software that is owned, delivered and managed remotely by one or more providers. The provider delivers software based on one set of common code and data definitions that is consumed in a one-to-many model by all contracted customers at anytime on a pay-for-use basis or as a subscription based on use metrics.’ Gartner.

Test dataset

‘data used to assess the performance of a final model’ ISO/IEC 22989.

Train dataset

‘data used to train a machine learning model’ ISO/IEC 22989.

Transparency

‘property of a system that appropriate information about the system is made available to relevant stakeholders’ ISO/IEC 22989.

Validation

‘confirmation, through the provision of objective evidence, that the requirements for a specific intended use or application have been fulfilled’ ISO/IEC 22989.

Validation dataset

‘data used to compare the performance of different candidate models’ ISO/IEC 22989.

Verification

‘confirmation, through the provision of objective evidence, that specified requirements have been fulfilled’ ISO/IEC 22989.

WCAG (Web Content Accessibility Guidelines)

WCAG explains how to make web content more accessible to people with disabilities. Web ‘content’ generally refers to the information in a web page including natural information such as text, images, and sounds, or code or markup that defines structure or presentation.
