Use of Copilot was moderate. However, most trial participants across classifications and job families were optimistic about Copilot and wished to keep using it.
Trial participants estimated time savings of up to an hour when summarising information, preparing a first draft of a document and searching for information.
The highest efficiency gains were perceived by APS levels 3-6, Executive Level (EL) 1 staff and ICT roles.
The majority of managers (64%) perceived uplifts in efficiency and quality in their teams.
40% of trial participants reported being able to reallocate their time to higher-value activities such as staff engagement and strategic planning.
There is potential for Copilot to improve inclusivity and accessibility in the workplace and in government communication.
There are key integration, data security and information management considerations agencies must address prior to Copilot adoption, including the scalability and performance of the GPT integration and an understanding of the context of the large language model.
Training in prompt engineering and use cases tailored to agency needs is required to build capability and confidence in Copilot.
Clear communication and policies are required to address uncertainty regarding the security of Copilot, accountabilities and expectations of use.
Adaptive planning is needed to reflect Copilot’s rolling feature release cycle, alongside governance structures that reflect agencies’ risk appetite and clear roles and responsibilities across government for providing advice on generative AI use. Given the technology’s infancy, agencies would need to consider the costs of implementing Copilot in its current version. More broadly, this should be a consideration for other generative AI tools.
There are broader concerns on the potential impact of generative AI on APS jobs and skills, particularly on entry-level jobs and women.
Large language model (LLM) outputs may be biased towards western norms and may not appropriately use cultural data and information.
There are broader concerns regarding vendor lock-in and competition, as well as the impact of generative AI use on the APS’ environmental footprint.
Agencies should consider which generative AI solutions are most appropriate for their overall operating environment and specific use cases, particularly for AI Assistant Tools.
Agencies must configure their information systems, permissions, and processes to safely accommodate generative AI products.
Agencies should offer specialised training reflecting agency-specific use cases and develop general generative AI capabilities, including prompt training.
Effective change management should support the integration of generative AI by identifying ‘Generative AI Champions’ to highlight the benefits and encourage adoption.
The APS must provide clear guidance on using generative AI, including when consent and disclaimers are needed, such as in meeting recordings, and a clear articulation of accountabilities.
Agencies should conduct detailed analyses of workflows across various job families and classifications to identify further use cases that could improve generative AI adoption.
Agencies should share use cases in appropriate whole-of-government forums to facilitate the adoption of generative AI across the APS.
The APS should proactively monitor the impacts of generative AI, including its effects on the workforce, to manage current and emerging risks effectively.
The Digital Transformation Agency (DTA) designed 4 evaluation objectives, in consultation with:
Evaluate APS staff sentiment about the use of Copilot, including:
Determine if Copilot, as an example of generative AI, benefits APS productivity in terms of:
Determine whether and to what extent Copilot, as an example of generative AI:
Identify and understand unintended benefits, consequences, or challenges of implementing Copilot as an example of generative AI and the implications on adoption of generative AI in the APS.
Generative AI could improve inclusivity and accessibility in the workplace, particularly for people who are neurodiverse, people with disability, and people from culturally and linguistically diverse backgrounds.
The adoption of Copilot and generative AI more broadly in the APS could help the APS attract and retain employees.
There are concerns regarding the potential impact of generative AI on APS jobs and future skills needs. This is particularly true for administrative roles, with a disproportionate flow-on impact on marginalised groups, entry-level positions and women, who tend to have greater representation in these roles as pathways into the APS.
Copilot outputs may be biased towards western norms and may not appropriately use cultural data and information such as misusing First Nations images and misspelling First Nations words.
The use of generative AI might lead to a loss of skill in summarisation and writing. Conversely, a lack of adoption of generative AI may result in a false assumption that people who use it are more productive than those who do not.
Participants expressed concerns relating to vendor lock-in; however, the realised benefits were limited to specific features and use cases.
Participants were also concerned with the APS’ increased impact on the environment resulting from generative AI use.
To ensure breadth and depth of insight through the evaluation, a mixed-methods approach was used. Qualitative and quantitative data collection methods were leveraged, including:
A desktop review of reports provided by agencies and other documents relevant to the trial was also undertaken. The evaluation engaged with over 50 agencies and more than 2,000 trial participants between January and July 2024 across various engagement streams.
Information was gathered using several methods of evaluation.
The evaluation synthesised existing evidence, including:
It also involved thematic analysis through:
Analysis was conducted on data collected from:
A thematic, frequency and comparative analysis of both qualitative and quantitative data was undertaken. Evaluation objectives and KLEs shaped the thematic analysis completed on qualitative data. In addition, frequency analysis provided insight into the majority sentiment of participants. Where possible, a comparative analysis was undertaken on survey responses. A total of 330 responses from the pre-use and post-use surveys were linked via a unique survey ID.
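The pre/post linkage step described above can be sketched as follows. This is a minimal illustration only: the field names, values and matching logic are hypothetical, not drawn from the evaluation's actual data handling.

```python
# Hypothetical sketch: pairing pre-use and post-use survey responses
# via a shared unique survey ID, so before/after comparisons can be
# made per participant. Only IDs present in both surveys are linked.

def link_responses(pre, post):
    """Return {survey_id: (pre_record, post_record)} for IDs in both lists."""
    post_by_id = {r["survey_id"]: r for r in post}
    return {
        r["survey_id"]: (r, post_by_id[r["survey_id"]])
        for r in pre
        if r["survey_id"] in post_by_id
    }

# Illustrative records (ratings are made up).
pre = [
    {"survey_id": "A1", "efficiency": 3},
    {"survey_id": "B2", "efficiency": 2},
]
post = [
    {"survey_id": "A1", "efficiency": 4},
    {"survey_id": "C3", "efficiency": 5},  # no matching pre-use response
]

linked = link_responses(pre, post)
print(len(linked))  # only "A1" appears in both surveys
```

Responses without a counterpart in the other survey (here "B2" and "C3") are excluded from the comparative analysis, which is why the linked sample (330) is smaller than the full participant pool.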