| Agency | Report |
|---|---|
| Australian Taxation Office (ATO) | Microsoft 365 Copilot trial Update |
| Commonwealth Scientific and Industrial Research Organisation (CSIRO) | Copilot for Microsoft 365; Data and Insights |
| Department of Home Affairs (Home Affairs) | Copilot Hackathon |
| Department of Industry, Science and Resources (DISR) | DISR Internal Mid-trial Survey Insights |
The uptake of publicly available generative artificial intelligence (AI) tools, like ChatGPT, has grown. In the few years since its public introduction, generative artificial intelligence has become available and accessible to millions.
This meant the Australian Public Service (APS) had to respond quickly to allow its workforce to experiment with generative AI in a safe, responsible and integrated way. To make this experimentation possible, an appropriate generative AI tool needed to be selected.
This decision was dependent on:
One solution to enable the APS to experiment with safe and responsible generative AI was Microsoft 365 Copilot (formerly Copilot for Microsoft 365). On 16 November 2023, the Australian Government announced a 6-month whole-of-government trial of Copilot. Copilot is a supplementary product that integrates with the existing applications in the Microsoft 365 suite and is nested within existing whole-of-government contracting arrangements with Microsoft. This made it a rapid and familiar solution to deploy.
Broadly, the trial and evaluation tested the extent to which the wider promise of generative AI capabilities would translate into real-world adoption by workers. The results will help the Australian Government consider future opportunities and challenges related to the adoption of generative AI.
This was the first trial of a generative AI tool in the Australian Government. The future brings exciting opportunities to understand what other tools are available to explore a broad landscape of use cases.
Moderate usage was consistent across classifications and job families but specific use cases varied. For example, a higher proportion of SES and Executive Level (EL) 2 staff used meeting summarisation features, compared to other APS classifications.
Microsoft Teams and Word were used most frequently and met participants’ needs. Poor Excel functionality and access issues in Outlook hampered use.
Content summarisation and re-writing were the most used Copilot functions.
Other generative AI tools may be more effective at meeting users’ needs in reviewing or writing code, generating images or searching research databases.
Training significantly enhanced confidence in Copilot use and was most effective when it was tailored to an agency’s context.
Identifying specific use cases for Copilot could lead to greater use of Copilot.
Improvements in efficiency and quality were perceived to occur in a few tasks with perceived time savings of around an hour a day for these tasks. These tasks include:
Copilot had a negligible impact on certain activities such as communication.
APS 3-6 and EL1 classifications and ICT-related roles experienced the highest time savings of around an hour a day on summarisation, preparing a first draft of a document and information searches.
Around 65% of managers observed an uplift in productivity across their team.
Around 40% of trial participants were able to reallocate their time to higher value activities.
Quality gains were more subdued relative to efficiency gains.
Up to 7% of trial participants reported Copilot added time to activities.
Copilot’s potential unpredictability and lack of contextual knowledge required time spent on output verification and editing which negated some of the efficiency savings.
61% of managers in the pulse survey could not confidently identify Copilot outputs.
There is a need for agencies to engage in adaptive planning while ensuring governance structures and processes appropriately reflect their risk appetites.
There were integration challenges with non-Microsoft 365 applications, particularly JAWS and Janusseal. However, such integrations were out of scope for the trial. Note: JAWS is a software product designed to improve the accessibility of written documents. Janusseal is a data classification tool used to easily distinguish between sensitive and non-sensitive information.
Copilot may magnify poor data security and information management practices.
Prompt engineering, identifying relevant use cases and understanding the information requirements of Copilot across Microsoft Office products were significant capability barriers.
Uncertainty regarding the need to disclose Copilot use, accountability for outputs and lack of clarity regarding the remit of Freedom of Information were barriers to Copilot use – particularly in regard to transcriptions.
Negative stigmas and ethical concerns associated with generative AI adversely impacted its adoption.
Adaptive planning is needed to reflect the rolling release cycle nature of generative AI tools, alongside relevant governance structures aligned to agencies’ risk appetites.
A mixed-methods approach was adopted for the evaluation.
Over 2,000 trial participants from more than 50 agencies contributed to the evaluation. The final report was written based on document/data review, consultations and surveys.
The evaluation synthesised existing evidence, including:
It also involved thematic analysis through:
Analysis was conducted on data collected from:
Several agencies conducted their own internal evaluations over the course of the trial and did not participate in Digital Transformation Agency’s overall evaluation.
Mitigations: where possible, the evaluation has drawn on agency-specific evaluation to complement findings.
Participants self-nominated to be involved in the trial, contributing to a degree of selection bias. The representation of APS job families and classifications in the trial differs from the proportions in the overall APS.
Mitigations: the over- and under-representation of certain groups has been noted. Statistical significance and standard error were calculated, where applicable, to ensure robustness of results.
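As a rough illustration of the over- and under-representation noted above, the sketch below divides a group's share of post-use survey respondents by its share of all APS employees. The three pairs of percentages are copied from the classification and job family tables in this report. The `representation_ratio` helper and the choice of a simple ratio are assumptions made for illustration, not the evaluation's stated method.

```python
# Shares (percent) copied from the tables in this report:
# (share of all APS employees, share of post-use survey respondents)
shares = {
    "SES": (1.9, 5.3),
    "ICT and Digital Solutions": (5.0, 22.3),
    "Service Delivery": (25.5, 4.0),
}


def representation_ratio(aps_share, respondent_share):
    """Hypothetical helper: a ratio above 1 means the group is
    over-represented among respondents; below 1, under-represented."""
    return respondent_share / aps_share


ratios = {group: representation_ratio(*pair) for group, pair in shares.items()}

# Print groups from most over-represented to most under-represented.
for group, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    label = "over" if ratio > 1 else "under"
    print(f"{group}: {ratio:.2f}x ({label}-represented)")
```

A ratio well above 1 (ICT and Digital Solutions) or well below 1 (Service Delivery) flags groups whose experience carries more or less weight in aggregate findings than it would in the APS as a whole.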
Agencies began the trial at different stages, meaning there was not an equal opportunity to build capability or identify use cases. Agencies also used different versions of Copilot due to frequent product releases.
Mitigations: the evaluation distinguishes between what may be a functionality limitation of Copilot and a feature that has been disabled by an agency.
Trial participants were asked to estimate the scale of Copilot’s benefits, and these estimates may naturally under- or overestimate its impact.
Mitigations: where possible, the evaluation has compared productivity findings against other evaluations and external research to verify their validity.
The trial of Copilot for Microsoft 365 involved the distribution of 5,765 Copilot licenses across 56 participating agencies. Through engagement activities (consultations and surveys), the evaluation gathered the experience and sentiment of over 2,000 trial participants representing more than 45 agencies. Insights were further strengthened by the findings from internal evaluations completed by certain agencies. The sample size was sufficient to ensure 95% confidence intervals of reported proportions (at the overall level) were within a margin of error of 5%.
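The margin-of-error statement can be sanity-checked with the standard normal approximation for a proportion. The sketch below is illustrative only. The report does not say which formula was used, so the worst-case proportion of 0.5, the optional finite population correction and the function names are all assumptions. The figures of 2,000 participants and 5,765 licenses come from the text, but the actual number of responses to each survey may differ.

```python
import math

Z_95 = 1.96  # approximate two-sided 95% critical value of the standard normal


def margin_of_error(n, p=0.5, population=None):
    """Normal-approximation margin of error for a reported proportion.

    p=0.5 is the worst case (widest interval). If `population` is given,
    a finite population correction is applied.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    standard_error = math.sqrt(p * (1 - p) / n)
    if population is not None:
        standard_error *= math.sqrt((population - n) / (population - 1))
    return Z_95 * standard_error


def min_sample_size(target_moe, p=0.5):
    """Smallest sample size whose margin of error is at most target_moe."""
    return math.ceil(Z_95 ** 2 * p * (1 - p) / target_moe ** 2)


# Assumed inputs: roughly 2,000 respondents out of 5,765 license holders.
print(f"MoE, n=2000: {margin_of_error(2000):.3f}")
print(f"MoE, n=2000 with correction: {margin_of_error(2000, population=5765):.3f}")
print(f"Minimum n for a 5% margin: {min_sample_size(0.05)}")
```

Under these assumptions, an overall sample of about 2,000 sits comfortably inside a 5% margin. Subgroup results rest on far fewer responses, which is consistent with the margin being reported at the overall level only.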
There were 3 questions asked in the post-use survey that were originally included in either the pre-use or pulse survey. These questions were repeated to compare the responses of trial participants before and after the trial and to measure the change in sentiment. A t-test was used to determine whether changes were statistically significant at a 5% level of significance.
The survey aligned with the APS Job Family Framework. APS job families and classifications were aggregated in survey analysis to reduce standard error and ensure statistical robustness. Post-use survey responses from the Trades and Labour, and Monitoring and Audit job families were excluded from reporting as their sample size was less than 10, but their responses were still included in aggregate findings.
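A before-and-after comparison of this kind can be sketched as follows. The report does not say whether a paired or independent-samples t-test was used, so this example assumes Welch's unequal-variance two-sample test, and it approximates the p-value with the standard normal distribution, which is only reasonable for large samples. The response data are hypothetical 1 to 5 sentiment scores invented for illustration. They are not trial results.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(before, after):
    """Return (t, degrees of freedom) for Welch's two-sample t-test."""
    n1, n2 = len(before), len(after)
    v1 = variance(before) / n1  # squared standard error of each sample mean
    v2 = variance(after) / n2
    t = (mean(after) - mean(before)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def approx_two_sided_p(t):
    """Large-sample approximation: treats t as standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Hypothetical sentiment scores (1 = very negative, 5 = very positive).
pre_use = [2] * 20 + [3] * 40 + [4] * 40
post_use = [3] * 30 + [4] * 50 + [5] * 20

t_stat, dof = welch_t(pre_use, post_use)
p_value = approx_two_sided_p(t_stat)
significant = p_value < 0.05  # 5% level of significance, as in the evaluation

print(f"t = {t_stat:.2f}, df = {dof:.0f}, significant at 5%: {significant}")
```

For small samples, the p-value should come from the t distribution with the computed degrees of freedom rather than from the normal approximation used here.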
For APS classifications, APS 3-6 have been aggregated.
| Group | Job families |
|---|---|
| Corporate | Accounting and Finance; Administration; Communications and Marketing; Human Resources; Information and Knowledge Management; Legal and Parliamentary |
| ICT and Digital Solutions | ICT and Digital Solutions |
| Policy and Program Management | Policy; Portfolio, Program and Project Management; Service Delivery |
| Technical | Compliance and Regulation; Data and Research; Engineering and Technical; Intelligence; Science and Health |
| APS classification | Percentage of all APS employees | Percentage of pre-use survey respondents | Percentage of post-use survey respondents |
|---|---|---|---|
| SES | 1.9 | 4.7 | 5.3 |
| EL 2 | 9.0 | 20.0 | 20.2 |
| EL 1 | 20.8 | 36.9 | 34.0 |
| APS 6 | 23.4 | 23.4 | 22.3 |
| APS 5 | 14.7 | 8.5 | 9.6 |
| APS 3-4 | 26.0 | 6.0 | 7.4 |
| APS 1-2 | 4.2 | 0.5 | 1.1 |
| Job family | Percentage of all APS employees | Percentage of pre-use survey respondents | Percentage of post-use survey respondents |
|---|---|---|---|
| Accounting and Finance | 5.1 | 5.3 | 3.5 |
| Administration | 11.4 | 9.0 | 8.9 |
| Communication and Marketing | 2.5 | 4.9 | 5.8 |
| Compliance and Regulation | 10.3 | 6.6 | 6.5 |
| Data and Research | 3.7 | 9.9 | 8.3 |
| Engineering and Technical | 1.8 | 1.3 | 1.5 |
| Human Resources | 3.9 | 5.3 | 5.0 |
| ICT and Digital Solutions | 5.0 | 19.6 | 22.3 |
| Information and Knowledge Management | 1.1 | 2.5 | 1.6 |
| Intelligence | 2.4 | 0.9 | 2.1 |
| Legal and Parliamentary | 2.6 | 4.1 | 3.5 |
| Monitoring and Audit | 1.5 | 1.1 | 1.0 |
| Policy | 7.9 | 13.7 | 14.4 |
| Portfolio, Program and Project Management | 8.3 | 8.6 | 7.5 |
| Science and Health | 4.2 | 1.6 | 2.1 |
| Senior Executive | 2.1 | 2.3 | 1.5 |
| Service Delivery | 25.5 | 2.7 | 4.0 |
| Trades and Labour | 0.7 | 0.9 | - |
| Portfolio | Entity |
|---|---|
| Agriculture, Fisheries and Forestry | Department of Agriculture, Fisheries and Forestry; Grains Research and Development Corporation; Regional Investment Corporation; Rural Industries Research and Development (trading as AgriFutures Australia) |
| Attorney-General’s | Australian Criminal Intelligence Commission; Australian Federal Police; Australian Financial Security Authority; Office of the Commonwealth Ombudsman |
| Climate Change, Energy, the Environment and Water | Australian Institute of Marine Science; Australian Renewable Energy Agency; Department of Climate Change, Energy, Environment and Water; Bureau of Meteorology |
| Education | Australian Research Council; Department of Education; Tertiary Education Quality and Standards Agency |
| Employment and Workplace Relations | Comcare; Department of Employment and Workplace Relations; Fair Work Commission |
| Finance | Commonwealth Superannuation Corporation; Department of Finance; Digital Transformation Agency |
| Foreign and Trade Affairs | Australian Centre for International Agricultural Research; Australian Trade and Investment Commission; Department of Foreign Affairs and Trade; Tourism Australia |
| Health and Aged Care | Australian Digital Health Agency; Australian Institute of Health and Welfare; Department of Health and Aged Care |
| Home Affairs | Department of Home Affairs (Immigration and Border Protection) |
| Industry, Science and Resources | Australian Building Codes Board; Australian Nuclear Science and Technology Organisation; Commonwealth Scientific and Industrial Research Organisation; Department of Industry, Science and Resources; Geoscience Australia; IP Australia |
| Infrastructure, Transport, Regional Development, Communication and the Arts | Australian Transport Safety Bureau |
| Parliamentary Departments (not a portfolio) | Department of Parliamentary Services |
| Social Services | Australian Institute of Family Studies; National Disability Insurance Agency |
| Treasury | Australian Prudential Regulation Authority; Australian Securities and Investments Commission; Australian Charities and Not-for-profits Commission; Australian Taxation Office; Department of the Treasury; Productivity Commission |