Statement 30: Test for intended and unintended consequences
Agencies must
- Criterion 108: Perform user acceptance testing (UAT) and scenario testing, validating the system with a diversity of end-users in their operating contexts and real-world scenarios. 
Agencies should
- Criterion 109: Perform robust regression testing to mitigate the heightened risk of escaped defects resulting from changes, such as a step change in parameters.
  - Traditional software regression testing is insufficient.
  - This may include:
    - back-to-back testing to compare two versions of a system or software using historical data
    - A/B software testing to simultaneously compare multiple versions in a real-world setting. This allows agencies to assess the impact of a specific model or software package on the overall system in its intended operating environment.
    - performance regression testing, checking for any degradation in model accuracy, fairness, or other key metrics.
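
As an illustration only, the back-to-back and performance regression checks described in Criterion 109 could be sketched as follows. Everything in this sketch is an assumption made for the example and is not prescribed by this statement: the `Record` type, the threshold-based `baseline` and `candidate` models, the sample historical data, the use of per-group accuracy as a simple stand-in for a fairness metric, and the 0.02 tolerance.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical model interface: maps one numeric feature to a 0/1 decision.
Model = Callable[[float], int]


@dataclass
class Record:
    feature: float
    label: int   # known historical outcome
    group: str   # hypothetical cohort used for a simple fairness slice


def accuracy(model: Model, records: Sequence[Record]) -> float:
    """Share of records where the model's decision matches the label."""
    return sum(model(r.feature) == r.label for r in records) / len(records)


def back_to_back(baseline: Model, candidate: Model,
                 records: Sequence[Record]) -> float:
    """Share of historical records on which the two versions disagree."""
    diffs = sum(baseline(r.feature) != candidate(r.feature) for r in records)
    return diffs / len(records)


def performance_regressions(baseline: Model, candidate: Model,
                            records: Sequence[Record],
                            tolerance: float = 0.02) -> List[str]:
    """Names of slices where candidate accuracy falls more than `tolerance`
    below baseline accuracy. Slices are 'overall' plus one per group."""
    slices: Dict[str, List[Record]] = {"overall": list(records)}
    for r in records:
        slices.setdefault(f"group={r.group}", []).append(r)
    failures = []
    for name, rows in slices.items():
        drop = accuracy(baseline, rows) - accuracy(candidate, rows)
        if drop > tolerance:
            failures.append(name)
    return failures


# Illustrative historical data (invented for this sketch).
HISTORY = [
    Record(0.20, 0, "a"), Record(0.40, 0, "a"),
    Record(0.60, 1, "a"), Record(0.90, 1, "a"),
    Record(0.30, 0, "b"), Record(0.55, 1, "b"),
    Record(0.65, 1, "b"), Record(0.80, 1, "b"),
]


def baseline(x: float) -> int:
    return int(x >= 0.5)   # current version


def candidate(x: float) -> int:
    return int(x >= 0.7)   # step change in a single parameter


if __name__ == "__main__":
    print("disagreement rate:", back_to_back(baseline, candidate, HISTORY))
    print("regressed slices:", performance_regressions(baseline, candidate, HISTORY))
```

In this invented data set, a single parameter change leaves the code path untouched yet alters decisions on historical records and degrades one group more than the other. This is the kind of escaped defect that conventional pass/fail regression tests may not detect. A/B testing is not shown because it depends on live traffic in the operating environment.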