Statement 30: Test for intended and unintended consequences
Agencies must
- Criterion 108: Perform user acceptance testing (UAT) and scenario testing, validating the system with a diversity of end-users in their operating contexts and real-world scenarios.
Agencies should
- Criterion 109: Perform robust regression testing to mitigate the heightened risk of escaped defects resulting from changes, such as a step change in parameters.
Traditional software regression testing alone is insufficient.
Robust regression testing may include:
- back-to-back testing to compare two versions of a system or software using historical data
- A/B software testing to simultaneously compare multiple versions in a real-world setting. This allows agencies to assess the impact of a specific model or software package on the overall system in its intended operating environment.
- performance regression testing to check for any degradation in model accuracy, fairness, or other key metrics.
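
The back-to-back and performance regression techniques listed above can be illustrated with a minimal Python sketch. It is not a prescribed implementation. The function names, the threshold-based "models", the sample data and the 0.01 accuracy tolerance are all hypothetical and chosen only for illustration. An agency would substitute its own system versions, historical data, and agreed metrics and tolerances.

```python
from typing import Callable, Sequence


def accuracy(predict: Callable, inputs: Sequence, labels: Sequence) -> float:
    """Fraction of historical records the model labels correctly."""
    correct = sum(1 for x, y in zip(inputs, labels) if predict(x) == y)
    return correct / len(labels)


def back_to_back(baseline: Callable, candidate: Callable,
                 inputs: Sequence, labels: Sequence,
                 max_accuracy_drop: float = 0.01) -> dict:
    """Run two versions on the same historical data and compare them.

    max_accuracy_drop is an assumed tolerance, not a recommended value.
    """
    # Back-to-back comparison: records where the two versions disagree.
    disagreements = [x for x in inputs if baseline(x) != candidate(x)]
    # Performance regression: compare a key metric across versions.
    base_acc = accuracy(baseline, inputs, labels)
    cand_acc = accuracy(candidate, inputs, labels)
    return {
        "disagreement_rate": len(disagreements) / len(inputs),
        "baseline_accuracy": base_acc,
        "candidate_accuracy": cand_acc,
        "regressed": (base_acc - cand_acc) > max_accuracy_drop,
    }


# Hypothetical example: a step change in a decision threshold parameter.
def baseline_model(score: float) -> bool:
    return score >= 0.5


def candidate_model(score: float) -> bool:
    return score >= 0.7


historical_inputs = [0.1, 0.4, 0.6, 0.8]
historical_labels = [False, False, True, True]

report = back_to_back(baseline_model, candidate_model,
                      historical_inputs, historical_labels)
print(report)
```

In this sketch the candidate version changes the outcome for one of the four historical records, so the comparison flags a regression. The same pattern could be extended to fairness or other key metrics by computing them for each version and comparing the results against an agreed tolerance.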