Statement 27: Test for specified behaviour
Agencies must:
- Criterion 95: Undertake human verification of test design and implementation for correctness, consistency, and completeness.
- Criterion 96: Conduct functional performance testing to verify the correctness of the SUT as per the pre-defined metrics.
This includes testing for fairness and bias to inform affirmative actions (see the fairness-check sketch after this list).
For off-the-shelf systems, consider benchmark testing using industry-standard benchmark suites and comparisons with competing AI systems.
- Criterion 97: Perform controllability testing to verify human oversight and control, and system control requirements (see the controllability sketch after this list).
- Criterion 98: Perform explainability and transparency testing as per the requirements.
This involves:
- testing that AI outputs are understandable to the target audience, ensuring diversity of test subjects and representativeness of the target population
- testing that the right information is available to the right user (see the audience-specific explanation sketch after this list).
- Criterion 99: Perform calibration testing as per the requirements.
This involves:
- measuring functional performance across various operating or installation conditions
- testing that changes in calibration parameters are detected
- testing that any out-of-range calibration parameters are rejected by the AI system in a transparent and explainable way (see the calibration sketch after this list).
- Criterion 100: Perform logging tests as per the requirements.
This involves verifying that the system records:
- system warnings and errors
- relevant system changes with corresponding details of who made the change, a timestamp, and the system version (see the logging sketch after this list).
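
A minimal sketch of the fairness and bias testing referenced in Criterion 96, written in Python. The classifier interface (`model.predict`), the protected-attribute groupings, and the 0.1 selection-rate disparity threshold are illustrative assumptions rather than prescribed values; agencies should substitute the pre-defined metrics agreed for their SUT.

```python
from collections import defaultdict

def selection_rates(predictions, groups):
    """Proportion of positive predictions observed for each protected group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}

def test_demographic_parity(model, test_features, test_groups, max_disparity=0.1):
    """Fail if the selection-rate gap between groups exceeds the agreed threshold."""
    predictions = [model.predict(x) for x in test_features]
    rates = selection_rates(predictions, test_groups)
    disparity = max(rates.values()) - min(rates.values())
    assert disparity <= max_disparity, (
        f"Selection-rate gap {disparity:.2f} exceeds the agreed threshold of {max_disparity}"
    )
```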
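
For Criterion 97, a controllability test can exercise the human override path end to end. The sketch below assumes a hypothetical control interface (`start()`, `request_stop()`, `status()`); real systems will expose their own oversight and control mechanisms.

```python
def test_human_override_halts_processing(ai_system):
    """Verify that a simulated human stop request actually halts the system."""
    ai_system.start()
    assert ai_system.status() == "running"
    # Simulate a human operator exercising the override control.
    ai_system.request_stop(operator_id="reviewer-01")
    assert ai_system.status() == "stopped", "System did not honour the human stop request"
```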
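
The second sub-item of Criterion 98 (the right information for the right user) can be partly automated; comprehension testing with diverse, representative test subjects still requires human evaluation. The `explainer.explain(prediction, audience=...)` helper and the field names below are illustrative assumptions.

```python
def test_explanations_match_audience(explainer, prediction):
    """Check that each audience receives the information intended for it."""
    end_user_view = explainer.explain(prediction, audience="end_user")
    reviewer_view = explainer.explain(prediction, audience="technical_reviewer")
    # End users should see a plain-language reason, not raw model internals.
    assert "plain_language_reason" in end_user_view
    assert "feature_attributions" not in end_user_view
    # Technical reviewers need the detailed attribution data.
    assert "feature_attributions" in reviewer_view
```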
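
Two of the Criterion 99 sub-items, detecting calibration changes and rejecting out-of-range parameters transparently, lend themselves to automated checks. The parameter name `sensor_gain`, the `apply_calibration` entry point, and the `calibration_fingerprint` helper are assumptions for illustration only.

```python
def test_out_of_range_calibration_is_rejected(ai_system):
    """An out-of-range parameter must be refused with an explainable reason."""
    result = ai_system.apply_calibration({"sensor_gain": 999.0})  # outside the declared range
    assert not result.accepted
    # Transparency: the rejection reason should name the offending parameter.
    assert "sensor_gain" in result.reason

def test_calibration_change_is_detected(ai_system):
    """A valid calibration change must be visible to the system's change detection."""
    before = ai_system.calibration_fingerprint()
    ai_system.apply_calibration({"sensor_gain": 1.2})  # within the declared range
    assert ai_system.calibration_fingerprint() != before
```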
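
Finally, the logging tests in Criterion 100 reduce to asserting that required events and fields appear in the audit trail. The structured log interface and the field names (`actor`, `timestamp`, `system_version`) are assumptions, not a prescribed schema.

```python
REQUIRED_CHANGE_FIELDS = {"actor", "timestamp", "system_version"}

def test_system_change_is_logged(ai_system, audit_log):
    """A relevant system change must be logged with who, when, and which version."""
    ai_system.update_threshold(0.7, actor="ml-engineer-02")
    entry = audit_log.latest(event="config_change")
    assert entry is not None, "Configuration change was not logged"
    missing = REQUIRED_CHANGE_FIELDS - entry.keys()
    assert not missing, f"Log entry is missing fields: {missing}"

def test_warnings_and_errors_are_logged(ai_system, audit_log):
    """Warning and error conditions must be recorded by the system."""
    ai_system.process({"malformed": True})  # input chosen to trigger a warning path
    levels = {entry["level"] for entry in audit_log.entries()}
    assert levels & {"WARNING", "ERROR"}, "No warning or error was recorded"
```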