Statement 20: Plan the model architecture
Agencies must
Criterion 71: Establish success criteria that cover AI training and operational limitations for infrastructure and costs.
Ensure alignment with AI system metrics selected at the design stage.
Consider:
- AI system purpose and requirements including explainability
- pre-defined AI system metrics including AI performance metrics
- impact and treatment for false positives and false negatives
- AI operational environment including scalability intentions
- frequency of change in context
- limitations on compute infrastructure
- cost constraints
- operational models such as ModelOps, MLOps, LLMOps, DataOps, and DevOps (see Statement 1).
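The impact and treatment of false positives and false negatives listed above can be made concrete by weighting each error type by its business cost when setting success criteria. The following is a minimal illustrative sketch; the function name, counts, and cost weights are assumptions, not values from this standard.

```python
# Hypothetical sketch: weighting false positives vs false negatives when
# setting success criteria. All counts and costs are illustrative.

def weighted_error_cost(tp, fp, fn, tn, fp_cost=1.0, fn_cost=5.0):
    """Total misclassification cost given confusion-matrix counts.

    fn_cost > fp_cost models a use case where missing a positive
    (e.g. a fraudulent transaction) is costlier than a false alarm.
    """
    return fp * fp_cost + fn * fn_cost

# Two candidate operating points for the same model.
conservative = weighted_error_cost(tp=80, fp=30, fn=5, tn=885)   # flags more
strict       = weighted_error_cost(tp=70, fp=10, fn=15, tn=905)  # flags less

# The conservative point wins here because false negatives are weighted 5x.
print(conservative, strict)  # 55.0 85.0
```

A comparison like this makes the trade-off between error types explicit before success criteria are fixed.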
AI training can occur in offline mode, or in online or real-time mode, depending on the business case and the maturity of the data and infrastructure architecture. The risk of the model becoming stale is higher in offline mode, while the risk of the model exhibiting unverified behaviour is higher in online mode.
The training process is interdependent with the infrastructure in the training environment. Complex model architectures with highly specialised learning strategies and large model datasets generally require tailored infrastructure to manage costs.
Criterion 72: Define a model architecture for the use case suitable to the data and AI system operation.
The following will influence the choice of the model architecture and algorithms:
- business requirements – risk thresholds or performance criteria
- purpose of the system – identified stakeholders and the intended outcomes, safety, reproducibility level of AI model outputs, or explainability level for AI outputs
- data – bias, quality, and managing the supply of data to the system
- supporting infrastructure – computational demands, costs, and speed with respect to business needs
- resourcing – the capabilities involved with documentation, oversight, and intervention in training the AI model or reusable assets
- design – the training process will include necessary human oversight and intervention, to ensure responsible AI practices are in place. Consider embedding flexible architecture practices to avoid vendor lock-in.
The model architecture will highlight the variables that will impact the intended outcomes for the system. These variables will include the model dataset, use case application and scalability intentions. These variables will influence which algorithms and learning strategies are chosen to train the AI model.
An AI scientist can test and analyse the model architecture and dataset to identify what is needed to effectively train the system. Additionally, they can outline requirements for the model architecture to comply with data, privacy, and ethical expectations.
Consider starting with simple, small architectures and adding complexity progressively, depending on the purpose of the system, to simplify debugging and reduce errors. Note that an AI system can contain a combination of multiple models, which can add to the complexity.
Generally, a single type of algorithm and training process may not be sufficient to determine optimal models for the AI system. It is usually good practice to train multiple models with various algorithms and training methodologies.
There are options to develop a chain of AI models, or add more complexity, if that better meets the intent of the AI system. Each model could use a different type of algorithm and training process.
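The practice of training multiple models with various algorithms and selecting among them can be sketched as follows. This is a toy illustration with two hand-rolled candidate algorithms and an invented dataset; it is not a prescribed method from this standard.

```python
# Hypothetical sketch of "train several models, keep the best": two toy
# algorithms are fitted on the same data and compared on a held-out split.
# The dataset and both fitting routines are illustrative assumptions.

def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_linear(xs, ys):
    """Least-squares line y = a*x + b, solved in closed form."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return lambda x: a * x + b

def mse(model, xs, ys):
    """Mean squared error of a fitted model on a validation set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data with a clear linear trend, split into train and validation.
train_x, train_y = [0, 1, 2, 3], [0.1, 1.9, 4.1, 5.9]
val_x, val_y = [4, 5], [8.0, 10.1]

candidates = {"mean": fit_mean, "linear": fit_linear}
scores = {name: mse(fit(train_x, train_y), val_x, val_y)
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(best)  # linear
```

In practice each candidate would be a different algorithm family (for example tree-based versus linear), but the selection loop over a shared validation metric is the same.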
Analysis of support and maintenance of the AI system in operation can influence the model architecture. For some use cases, a complete model refresh may be required, noting cost considerations. Alternatives such as updates to pre- or post-processing could be considered, including updates to the configuration or the retrieval-augmented generation (RAG) knowledge repository for GenAI.
It may not be necessary to retrain models every time new information becomes available, and this should be considered when defining the model architecture. For example, for GenAI, adding new information in RAG can help the AI system remain up to date without the need to retrain the AI model, saving on costs without impacting AI accuracy.
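The idea of keeping a GenAI system current by updating the RAG knowledge repository rather than retraining can be sketched as below. Retrieval here is naive keyword overlap purely for illustration; the class name and documents are invented, and real systems would use embedding-based retrieval.

```python
# Hypothetical sketch: new information is added to a knowledge store and
# retrieved at query time, so the underlying model weights never change.

class KnowledgeStore:
    def __init__(self):
        self.documents = []

    def add(self, text):
        # New information is indexed without touching model weights.
        self.documents.append(text)

    def retrieve(self, query, k=1):
        # Rank documents by shared words with the query (toy scoring).
        q = set(query.lower().split())
        ranked = sorted(self.documents,
                        key=lambda d: len(q & set(d.lower().split())),
                        reverse=True)
        return ranked[:k]

store = KnowledgeStore()
store.add("The 2023 policy caps training compute budgets.")
# A later update supersedes the old guidance -- no retraining needed.
store.add("The 2025 policy raises training compute budget caps.")

context = store.retrieve("What is the 2025 compute budget policy?")
print(context[0])
```

The retrieved context is passed to the model at inference time, which is why the repository can be refreshed on any schedule without a training cycle.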
Criterion 73: Select algorithms aligned with the purpose of the AI system and the available data.
There are various forms of algorithms to train an AI model, and it is important to select them based on the AI system requirements, model success criteria, and the available model dataset. A learning strategy is a method to train an AI model and dictates the mathematical computations that will be required during the training process.
Depending on use case, some examples of the types of training processes may include:
- supervised learning – training an AI model with a dataset, made up of observations, that has desired outputs or labels, such as support vector machines or tree-based models
- unsupervised learning – training a model to learn patterns in the dataset itself, where the training dataset does not have desired outputs or labels, such as anomaly detection or transformer LLMs
- reinforcement learning – training a model to maximise pre-defined goals, such as Monte Carlo tree search or fine-tuning models
- transfer learning – a model trained on one task, such as a pre-trained model, is reused as a starting point to enhance model performance on a related, yet different, task
- parameter tuning – optimising a model’s performance by adjusting parameters or hyperparameters of a model, usually adjusted automatically
- model retraining – updating a model with new data
- online or real-time mode – continuously train the model using live data (note that this can significantly increase vulnerability of the AI system, such as data poisoning attacks).
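Of the training processes above, parameter tuning is the most mechanical to illustrate. The following is a minimal sketch of a grid search over a single decision threshold scored on a validation set; the scores, labels, and grid values are illustrative assumptions, and real tuning would typically search many hyperparameters with cross-validation.

```python
# Hypothetical sketch of parameter tuning: a simple grid search over a
# decision threshold, scored on a validation set. Data are illustrative.

def accuracy(threshold, scores, labels):
    """Fraction of validation items classified correctly at a threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Model scores on a validation set with known labels.
val_scores = [0.1, 0.3, 0.45, 0.8, 0.65, 0.9]
val_labels = [0,   0,   1,    1,   1,    1]

grid = [0.2, 0.4, 0.6, 0.8]
best = max(grid, key=lambda t: accuracy(t, val_scores, val_labels))
print(best, accuracy(best, val_scores, val_labels))  # 0.4 1.0
```

The same pattern of "enumerate candidate settings, score each on held-out data, keep the best" generalises to learning rates, regularisation strengths, and other hyperparameters, whether the search is manual or automated.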
Like traditional software, there are options to reuse, reconfigure, buy, or build models. An agency could reuse off-the-shelf models as-is, fine-tune pre-trained models, use pre-built algorithms, or create new models. The approach taken to training will vary across model types.
Criterion 74: Set training boundaries in relation to any infrastructure, performance, and cost limitations.
Agencies should
Criterion 75: Start small, scale gradually.
Consider:
- starting with simple and small architectures and adding complexity progressively, depending on the purpose of the system, to simplify debugging and reduce errors
- that an AI system can contain a combination of multiple models which can add to the complexity.