Context, data and rationale

Pilot AI assurance framework background

The pilot framework’s AI impact assessment process and supporting guidance were developed by the AI in Government Taskforce, which operated from September 2023 to June 2024. The taskforce was co-led by the DTA and the Department of Industry, Science and Resources (DISR) and staffed by secondees from 12 Australian Public Service (APS) agencies. The drafting process involved several rounds of consultation, with interested agencies providing feedback that informed further refinements. After the taskforce concluded, the DTA resumed responsibility for the AI in government agenda.  

This included implementing the Policy for the responsible use of AI in government in September 2024, which introduced mandatory agency-level AI accountability and transparency requirements. The policy also recommends that agencies ensure staff have the skills to engage responsibly with AI; to support this, the DTA has developed an online training module on AI in government fundamentals.

The AI impact assessment process tested in this pilot is designed to assess an AI use case’s alignment with Australia's AI Ethics Principles. For instance, the assessment asks officials to explain how they will:

  • define and measure fairness
  • uphold privacy and human rights obligations
  • incorporate diversity
  • ensure transparency, explainability and contestability
  • designate accountability
  • demonstrate reliability and safety, including through data governance, testing, monitoring and human intervention mechanisms.

Pilot objectives

The summary below outlines how the pilot addressed its 5 objectives.

1. To test whether the framework meets its intent of placing the government as an exemplar in the responsible use of AI by assisting agencies to identify risk and apply appropriate mitigations to specific use cases.

More than half of the survey responses rated the framework assessment process as useful for ensuring responsible use of AI, and the same proportion reported the pilot framework helped them manage and mitigate risks that existing agency processes would not have.

However, participants also found the risk assessment component of the process challenging and provided feedback on other areas for improvement. This report's recommendations to update the assessment, including the risk assessment process, will further address this pilot objective.

2. To stress test, gather feedback and refine the framework (e.g. considering clarity of language, time taken, ease of use).

Around 70% of survey responses reported the assessment questions were clear and easy to understand. At the same time, the pilot's stress testing revealed a number of areas for improvement, including the risk assessment and legal review sections.

3. To identify any gaps that need to be addressed.

Key gaps participants raised related to the risk assessment and legal review sections. These and other areas for improvement identified in the pilot data will be addressed through the proposed updates outlined in the Recommendations section and below under Key feedback themes and proposed responses.

4. To gather evidence to support implementation considerations such as:

  • making the framework mandatory
  • treatment of existing use cases
  • the need for a central oversight mechanism for high-risk use cases, and the potential cost/resourcing implications of establishing such a mechanism.

Pilot participants noted that setting clear and consistent AI governance requirements would enable greater AI adoption and help build public trust. Recommendation 1 outlines a proposed approach to mandating AI use case impact assessment, with some flexibility to support implementation in diverse agency contexts. 

While some participants raised concerns that requiring assessment of all existing use cases may be burdensome, the DTA considers that applying minimum AI use case governance requirements to all AI use is desirable. See Key feedback themes and proposed responses below for further details on plans to address this. 

The pilot did not provide strong evidence to support a central oversight mechanism for high-risk use cases, with only 2 reported high-risk assessments. However, should the volume of higher-risk AI use cases increase in future, as agencies build AI confidence and capability, revisiting this question may be warranted.

5. To raise awareness of the framework and the Policy for the responsible use of AI in government across the APS.

While it is difficult to measure the extent to which the pilot raised awareness of the assessment and policy, a number of agencies contacted the DTA after the pilot commenced asking to join, with 4 agencies ultimately joining the pilot in October 2024. Participants indicated that they and their executive were eager to adopt recognised methods to demonstrate that their current or planned AI use upheld the government's commitment to exemplary safe and responsible AI adoption.

Pilot agencies

The 21 pilot agencies, listed at Appendix A, ranged from very large operational agencies to medium and large policy agencies and smaller, specialised entities. Some pilot agencies had not yet implemented any AI tools and were only in the early exploration and experimentation stages of adoption, while others had longstanding AI use cases in production.

These diverse agency perspectives have provided valuable insights into how the AI impact assessment can be applied in different operational contexts. These insights have informed options to further streamline, clarify and strengthen the assessment to support broad, consistent implementation. 

Methodology

  • 21 pilot agencies
  • 16 agencies submitted surveys

The pilot gathered qualitative feedback on the AI impact assessment process through midpoint interviews in October 2024 and post-pilot interviews held from late November 2024 to January 2025. Participants were also asked to complete a feedback survey following each use case assessment to provide quantitative and qualitative data (survey questions provided at Appendix B). Some pilot agencies also shared their use case assessment documentation.  

The pilot was concerned with the process of completing the AI impact assessment, rather than the content of the assessments. The pilot sought feedback on the practicality and usability of the draft impact assessment and supporting guidance, including any aspects that were more challenging, that could be improved or that were missing.

Table 1: Reported assessments
Assessment category | Number
Total number of reported assessments conducted during pilot | 43
  • Threshold assessments (sections 1-3) | 22
  • Extended assessments (some or all of sections 4-11) | 14
  • Unknown (agency did not advise assessment type) | 7

Participants reported they assessed 43 use cases in total through the draft AI impact assessment process, comprising:

  • 22 low-risk, lower-complexity use cases that concluded at the initial threshold assessment (sections 1-3) and did not require an extended assessment
  • 14 use cases that undertook extended assessment beyond section 3. This report refers to these as ‘extended assessments’ as, technically, none completed the ‘full assessment’ process
  • 7 use cases for which agencies did not specify the type of assessment (threshold assessment or extended assessment)
  • 2 of these use cases received a high-risk rating.

None of the 14 extended assessments completed every section of the assessment process during the pilot period: none completed the legal review section (section 11.1), and only a handful underwent the formal internal governance review (section 11.3).
 

Table 2: Survey submissions
Survey category | Number
Total number of post-assessment surveys submitted | 23
  • Threshold assessment surveys | 13
  • Extended assessment surveys | 10

The pilot assessment tool required a full assessment only for use cases rated medium or high risk at the section 3 threshold assessment. Some participants chose to conduct an extended assessment for low-risk use cases as an exercise, so the 14 extended assessments conducted as part of the pilot include a mix of low, medium and high-risk use cases.

Table 3: Distribution of reported assessments
Number of assessments reported | Number of agencies
0 | 1
1 | 10
2 | 5
3 | 3
5 | 1
8 | 1
Total | 21

The DTA asked pilot participants to submit a separate survey for each use case assessment to capture any differences in the assessment process for different types of AI use cases. In total, 16 pilot agencies submitted 23 post-assessment surveys.  

  • Some agencies provided separate surveys for each assessment, as requested.  
  • Some agencies provided a single survey response with consolidated feedback covering multiple similar assessments, rather than submitting multiple surveys with similar answers.  
  • All respondents completed the same survey. However, it is important to note that the survey responses from participants who only completed the threshold assessment (sections 1-3) do not encompass the entire assessment process required for use cases with elevated risk.
  • Some agencies reported the number of assessments but did not submit a survey.

Some agencies also submitted their draft assessment documentation, which provided valuable insights into how the pilot impact assessment was applied in different contexts and for different use case types. Sharing the assessment documentation was not a requirement of the pilot.  

Limitations

A number of limiting factors make drawing strong conclusions from the pilot data challenging, including the:

  • relatively short pilot period
  • small pool of participants  
  • low numbers of completed extended assessments  
  • incomplete survey response set (5 of 21 agencies did not submit a survey)
  • divergent feedback among participants.

Participants with low-risk use cases, which only required an initial threshold assessment (sections 1-3), were able to complete all questions, including securing executive endorsement. This yielded useful feedback on different experiences with the executive endorsement process and suggestions to improve it.

However, none of the use cases that proceeded to extended assessment were able to complete all the assessment steps in the pilot timeframe. None of these extended assessments conducted during the pilot secured legal review or internal governance review body approval (section 11). This means that feedback on these aspects of the assessment is primarily based on desktop review rather than practical application.  

Another aspect of the assessment process that was not tested was the requirement to reassess use cases in response to material changes, such as transitions between AI lifecycle stages or major changes in scope, usage or operation. Due to the limited pilot period, none of the use cases assessed during the pilot underwent changes that would trigger reassessment.

Possible reasons for the low number of use cases, limited number of higher-risk or complex use cases, and lack of completed extended assessments include:

  • the short pilot period, initially planned for 2 months and extended to 3 months. AI use case development and approval processes often span many more months
  • limited agency resources for a non-mandatory pilot exercise, overtaken by higher-priority tasks. It's possible that some pilot agencies had other AI use cases they could have used to test the impact assessment process but were unable to do so due to resource or other constraints
  • low levels of familiarity with and understanding of AI-specific risks, meaning use cases that should have been identified as posing elevated risk were instead identified as low-risk and were therefore not put through the extended assessment
  • few pilot agencies with actual AI use cases in production.

Most of the participants reported use cases in the early exploratory stages, with a focus on lower-risk, less complex use cases that do not use or produce sensitive data. This suggests agencies remain cautious about AI and are taking a measured approach to its adoption. Starting with simpler use cases may help gradually build AI confidence and capability and secure leadership support. There may be other agencies exploring or already deploying more complex AI use cases; however, they did not participate in the pilot.

Key feedback themes and proposed responses

Key themes from pilot participant interviews and survey responses are summarised below, together with proposed actions in response to each.

Survey data

Nearly 60% of the 23 survey responses rated the impact assessment process useful for ensuring responsible use of AI, while 35% gave the process a neutral rating and only 2 responses found it not useful (Chart 1). Nearly 70% of survey responses reported the assessment questions were clear or very clear and easy to understand (Chart 2). 

Chart 1: How useful did you find this assessment process for ensuring responsible use of AI? (n=23)

Chart 1: Bar chart showing most survey responses (57%) rated the assessment process as 4 out of 5 for usefulness; 35% rated it 3 out of 5; and very few rated it lower. None rated it ‘very useful’ (5 out of 5).

Chart 2: How clear and easy to understand were the questions in the framework? (n=23)

Chart 2: Bar chart showing 13% of survey responses rated the questions as ‘very clear’ (5 out of 5), 57% rated them 4 out of 5 for clarity, 22% rated them 3 out of 5, and 9% rated them 2 out of 5. No one rated them as ‘very unclear’ (1 out of 5).

Nearly 90% considered the guidance helpful or very helpful for completing the assessment. None of the responses rated the guidance unhelpful (Chart 3). Just over half of the surveys reported their assessment involved 1 to 4 staff (Chart 4). These were mostly lower-risk, less complex use cases that only required a threshold assessment.

Chart 3: How helpful was the guidance for completing the framework? (n=23)

Chart 3: Bar chart showing 22% of survey responses found the guidance very helpful (5 out of 5); 65% rated it 4 out of 5; 13% rated it 3 out of 5; and no one rated it as unhelpful (1 or 2 out of 5).

Chart 4: Approximately how many people were involved in completing this assessment? (n=23)

Chart 4: Bar chart showing just over half of assessments (52%) involved 1 to 4 people, 26% involved 5 to 9, 17% involved 10 to 19, and 4% involved 20 or more.

One survey response listed over 20 internal agency officials who contributed to the use case assessment, including privacy, legal, fraud, cyber and several specialist ICT teams, as well as senior executives and the third-party software provider.

This same response reported the assessment took over 20 working days to complete for this particular use case. However, this was an outlier, as nearly 90% of surveys reported completing the framework document took up to 5 working days (Chart 5).  

Half of the surveys did not clearly specify how long the overall assessment process took. The survey design, which used a single free-text field for both parts of the question, may have contributed to this ambiguity. In some cases, it appears that completing the framework document constituted the entire assessment process, particularly where the use case was low-risk and only required an initial threshold assessment rather than an extended assessment. These responses reported finding the initial threshold assessment process straightforward, taking less than 5 working days to complete end-to-end.

Chart 5: How long do you estimate it took to complete a) the framework document and b) the overall assessment process? (n=23)

Chart 5: Grouped bar chart showing 17% completed the framework document in less than a day and 70% took 1–5 days. For the overall assessment process, responses were more varied: 22% took 11–20 days, 13% were ongoing, and 48% did not state a duration.

Just under two-thirds of surveys reported the draft impact assessment process helped identify and assess risks that existing processes would not have captured (Chart 6).   

Chart 6: Did the Framework help you identify and assess any risks that existing processes would not have captured? (n=23)

Chart 6: Bar chart showing 65% of survey responses said the framework helped identify additional risks; 30% said it did not; and 1 person (4%) did not respond to the question.

Table 4: Post-assessment survey – all yes/no questions (n=23)
# | Question | Yes | No | N/A | Not stated
Q5 | Did any of the delegates request further information before approving the assessment? | 50% | 23% | 27% | 0%
Q6 | Did you need to consult any specialist expertise to complete the assessment? | 55% | 27% | 18% | 0%
Q7 | Did the Framework help you identify and assess any risks that existing processes would not have captured? | 68% | 32% | 0% | 0%
Q8 | Did the Framework help you manage and mitigate any risks that existing processes would not have? | 50% | 41% | 5% | 5%
Q9 | Did completing this assessment lead to any changes in your AI project or use case? | 41% | 50% | 9% | 0%
Q10 | Did you encounter any usability issues with the Framework document itself? | 41% | 50% | 0% | 9%
Q16 | Was your agency's existing governance structure sufficient to oversee this AI use case? | 68% | 18% | 0% | 14%

 

Other considerations

In addition to the pilot feedback, the DTA will consider other relevant developments in the AI policy landscape to inform updates to the impact assessment process and ensure continued alignment. These include:

  • insights arising from DTA’s development of AI technical standards and broader updates to the AI in government policy
  • DISR’s whole-of-economy safe and responsible AI work including:
    • Voluntary AI Safety Standard (published September 2024)
    • proposals for mandatory guardrails (September 2024)
    • AI Impact Navigator (October 2024)
  • the APS Data Ethics Framework (December 2024)
  • the Attorney-General’s Department’s pending automated decision-making reforms
  • the Australian National Audit Office report on governance of AI at the Australian Taxation Office (February 2025)
  • recent parliamentary inquiry reports, including:
    • Senate Select Committee on Adopting AI (November 2024)
    • Joint Committee of Public Accounts and Audit Inquiry into the use and governance of AI systems by public sector entities - 'Proceed with Caution' (February 2025)
  • state and territory government AI policy developments
  • international developments – including national and multilateral government initiatives
  • emerging research on AI safety and assurance. 

 
