Statement 23: Validate, assess, and update model
Agencies must
Criterion 84: Establish techniques to validate trained AI models.
There are multiple qualitative and quantitative techniques and tools for model validation, informed by the AI system success criteria (see the Design section), including:
- accuracy of classifications, predictions or forecasts, and factual correctness and relevance of outputs (see the metrics sketch after this list)
- the ability to distinguish between positive and negative instances, and between classes
- benchmarking
- consistency, clarity and coherence of responses
- source attribution
- data-centric validation approaches for GenAI models.
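As an illustration, the following is a minimal sketch of common quantitative metrics for a binary classifier, assuming scikit-learn is available; `y_true`, `y_pred` and `y_score` are hypothetical placeholders for an agency's held-out validation labels, model predictions and prediction scores.

```python
# Minimal sketch: quantitative validation of a binary classifier.
# Assumes scikit-learn; y_true, y_pred and y_score are hypothetical
# placeholders for held-out validation data and model outputs.
from sklearn.metrics import (
    accuracy_score,    # proportion of correct classifications
    f1_score,          # balance of precision and recall
    roc_auc_score,     # ability to separate positive and negative instances
    confusion_matrix,  # per-class breakdown of predictions
)

def validate_classifier(y_true, y_pred, y_score):
    """Return headline metrics to compare against the system's success criteria."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_score),
        "confusion_matrix": confusion_matrix(y_true, y_pred).tolist(),
    }
```

Acceptable thresholds for each metric should come from the success criteria defined at design time, not from the validation exercise itself.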
Criterion 85: Evaluate the model against training boundaries.
Evaluation considerations include:
- poor or degraded performance of the model
- change of AI context or operational setting (a simple drift check is sketched after this list)
- data retention policies
- model retention policies.
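For example, a change of operational setting can be surfaced with a simple distribution-drift check. The sketch below uses a two-sample Kolmogorov-Smirnov test from scipy; `train_feature` and `live_feature` are hypothetical arrays of one input feature at training time and in operation, and the 0.05 significance level is an illustrative default.

```python
# Minimal sketch: flag operation outside the model's training boundaries
# via distribution drift on a single input feature. Assumes scipy;
# train_feature and live_feature are hypothetical placeholder arrays.
from scipy.stats import ks_2samp

def feature_drift(train_feature, live_feature, alpha=0.05):
    """Two-sample KS test comparing training-time and live feature values."""
    statistic, p_value = ks_2samp(train_feature, live_feature)
    return {
        "statistic": statistic,
        "p_value": p_value,
        # A low p-value suggests the operational data no longer matches
        # the distribution the model was trained on, warranting review.
        "drifted": p_value < alpha,
    }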
Criterion 86: Evaluate the model for bias, and implement and test bias mitigations.
This includes:
- using suitable tools that test for and discover unwarranted associations between an algorithm’s protected input features and its output (two such checks are sketched after this list)
- evaluating performance across relevant subgroups and intersectional dimensions
- checking if bias could be managed by updating the training data (see Statement 18)
- implementing bias mitigation thresholds that can be configured post-deployment
- implementing pre-processing or post-processing techniques such as disparate impact remover, equalised odds post-processing, content filtering, and RAG.
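As a minimal sketch of the first two points, the functions below compute a disparate impact ratio and an equalised odds gap for binary predictions and a binary protected attribute, assuming numpy arrays; all names are hypothetical placeholders for an agency's own evaluation data.

```python
# Minimal sketch: two group-fairness checks for binary predictions.
# Assumes numpy arrays; y_true, y_pred and protected are hypothetical
# placeholders, with protected encoding group membership as 0 or 1.
import numpy as np

def disparate_impact_ratio(y_pred, protected):
    """Ratio of positive-outcome rates between groups (1.0 indicates parity)."""
    rate_ref = y_pred[protected == 0].mean()
    rate_grp = y_pred[protected == 1].mean()
    return rate_grp / rate_ref

def equalised_odds_gap(y_true, y_pred, protected):
    """Largest between-group gap in true-positive or false-positive rate."""
    gaps = []
    for label in (0, 1):  # label 0 gives the FPR gap, label 1 the TPR gap
        mask = y_true == label
        rate_ref = y_pred[mask & (protected == 0)].mean()
        rate_grp = y_pred[mask & (protected == 1)].mean()
        gaps.append(abs(rate_ref - rate_grp))
    return max(gaps)
```

A configurable post-deployment threshold can then be applied to these values, for example flagging a disparate impact ratio below the commonly cited four-fifths (0.8) benchmark.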
Agencies should
Criterion 87: Identify relevant model refinement methods.
Evaluation outcomes may trigger model refinement or retirement. Relevant refinement methods can include:
- model parameter or weight adjustments – further training or re-training the model on a new set of observations, or additional training data
- adjusting data pre-processing or post-processing components
- model pruning – removing redundant parameters to reduce computation and speed up inference (see the sketch below).
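As a minimal sketch of the pruning method, assuming a trained PyTorch model, the function below zeroes the smallest-magnitude weights in each linear layer; the 30% sparsity level is an illustrative placeholder to be tuned against validation results.

```python
# Minimal sketch: magnitude-based pruning of a trained PyTorch model.
# The 30% sparsity level is an illustrative placeholder.
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_linear_layers(model: nn.Module, amount: float = 0.3) -> nn.Module:
    """Zero the smallest-magnitude weights in every linear layer."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")  # fold the mask in permanently
    return model
```

Any refined or pruned model should be re-validated against the techniques in Criterion 84 before redeployment.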