Statement 27: Test for specified behaviour

Agencies must

  • Criterion 95: Undertake human verification of test design and implementation for correctness, consistency, and completeness. 
  • Criterion 96: Conduct functional performance testing to verify the correctness of the system under test (SUT) as per the pre-defined metrics.

    This includes testing for fairness and bias to inform affirmative actions (see the disaggregated-metrics sketch after this list).

    For off-the-shelf systems, consider benchmark testing using industry-standard benchmark suites and comparisons with competing AI systems.

  • Criterion 97: Perform controllability testing to verify human oversight and control, and system control requirements.
  • Criterion 98: Perform explainability and transparency testing as per the requirements.

    This involves:

    • testing that AI outputs are understandable to the target audience, ensuring that test subjects are diverse and representative of the target population
    • testing that the right information is available to the right user.
  • Criterion 99: Perform calibration testing as per the requirements.

    This involves:

    • measuring functional performance across various operating or installation conditions
    • testing that changes in calibration parameters are detected
    • testing that any out-of-range calibration parameters are rejected by the AI system in a transparent and explainable way (see the calibration sketch after this list).
  • Criterion 100: Perform logging tests as per the requirements.

    This involves verifying that the system records:

    • system warnings and errors
    • relevant system changes, including who made the change, the timestamp, and the system version (see the logging sketch after this list).
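
As a minimal illustration of Criterion 96, the pytest-style sketch below computes the SUT's accuracy disaggregated by subgroup and flags both an absolute shortfall and a fairness gap between subgroups. The file name `predictions.csv`, the column names, and both thresholds are illustrative assumptions, not part of the standard; substitute the agency's pre-defined metrics.

```python
# Sketch only: disaggregated functional-performance check (Criterion 96).
# The data file, column names, and thresholds below are assumptions.
import pandas as pd

MIN_ACCURACY = 0.90     # assumed pre-defined accuracy threshold
MAX_GROUP_GAP = 0.05    # assumed maximum allowed accuracy gap between subgroups

def disaggregated_accuracy(df: pd.DataFrame, group_col: str) -> pd.Series:
    """Accuracy of the SUT's predictions, broken down by subgroup."""
    return (df["prediction"] == df["label"]).groupby(df[group_col]).mean()

def test_accuracy_and_fairness():
    # SUT outputs joined with ground-truth labels and a demographic attribute.
    df = pd.read_csv("predictions.csv")
    per_group = disaggregated_accuracy(df, "demographic_group")
    assert per_group.min() >= MIN_ACCURACY, f"Subgroup below threshold:\n{per_group}"
    assert per_group.max() - per_group.min() <= MAX_GROUP_GAP, (
        f"Accuracy gap between subgroups exceeds {MAX_GROUP_GAP}:\n{per_group}"
    )
```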
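For Criterion 99, the sketch below shows one way to test that an out-of-range calibration parameter is rejected with a human-readable reason while an in-range parameter is accepted. `apply_calibration`, `CalibrationError`, and the valid range are hypothetical stand-ins for the SUT's calibration interface.

```python
# Sketch only: calibration-parameter tests (Criterion 99).
# apply_calibration, CalibrationError, and VALID_RANGE are assumed stand-ins.
import pytest

VALID_RANGE = (0.0, 1.0)   # assumed valid range for a calibration parameter

class CalibrationError(ValueError):
    """Raised with a human-readable reason when a calibration value is rejected."""

def apply_calibration(gain: float) -> float:
    """Stand-in for the SUT's calibration entry point."""
    low, high = VALID_RANGE
    if not (low <= gain <= high):
        raise CalibrationError(
            f"Calibration gain {gain} rejected: outside valid range [{low}, {high}]."
        )
    return gain

def test_out_of_range_parameter_is_rejected_transparently():
    # The rejection must carry an explainable reason, not fail silently.
    with pytest.raises(CalibrationError, match="outside valid range"):
        apply_calibration(5.0)

def test_in_range_parameter_is_accepted():
    assert apply_calibration(0.5) == 0.5
```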
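For Criterion 100, the sketch below uses pytest's built-in `caplog` fixture to verify that a system change is logged with who made it, the system version, and a timestamp. The logger name `ai_system` and the `record_system_change` helper are illustrative assumptions about the SUT's audit-logging hook.

```python
# Sketch only: logging tests (Criterion 100).
# The logger name and record_system_change helper are assumed stand-ins.
import logging

logger = logging.getLogger("ai_system")

def record_system_change(user: str, version: str) -> None:
    """Stand-in for the SUT's audit-log hook for system changes."""
    logger.info("CONFIG_CHANGE user=%s version=%s", user, version)

def test_system_change_is_logged_with_required_details(caplog):
    with caplog.at_level(logging.INFO, logger="ai_system"):
        record_system_change(user="jsmith", version="2.3.1")
    record = caplog.records[0]
    assert "user=jsmith" in record.getMessage()      # who made the change
    assert "version=2.3.1" in record.getMessage()    # system version
    assert record.created > 0                        # timestamp is recorded
```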

Statement 28: Test for safety, robustness, and reliability
