Managers have also noticed productivity improvements within their teams.
Approximately 65% of managers in the post-use survey reported that Copilot had a positive impact on the quality and efficiency of their team members' work. As shown in Figure 12, less than 3% of this cohort believed Copilot had a negative effect on their team.
Figure 12 | Post-use survey responses to 'What is the impact of Copilot on…', from respondents who manage staff (n=209)
Data for figure 12
Post-use survey responses to 'What is the impact of Copilot on…', from respondents who manage staff (n=209).
Sentiment | Negative | Somewhat negative | Neutral | Somewhat positive | Positive
Quality of your team's output (n=209) | 0% | 3% | 32% | 47% | 17%
Efficiency of your staff (n=208) | 0% | 2% | 31% | 49% | 17%
Totals may amount to less or more than 100% due to rounding.
Manager respondents in the post-use survey indicated that Copilot helped team members to quickly produce briefing materials and added value to written deliverables. Some managers in focus groups thought that Copilot made writing more consistent across their teams and lifted the overall standard of work.
Efficiencies are concentrated in a few tasks
Copilot contributed the highest perceived time savings in tasks related to summarisation, preparing first drafts and information searches.
Post-use survey respondents perceived that Copilot contributed the highest time savings in activities related to information summarisation, preparing first drafts and information searches. Respondents estimated that Copilot saved up to an hour a day on these activities, as shown in Table 4. These figures are approximations and likely represent the upper bound of the time savings Copilot could contribute (assuming APS employees perform the tasks every day).
Table 4. Averaged post-use survey responses to 'On average, how many hours per day has Copilot helped you save in the following areas', from respondents who completed both pre- and post-use surveys (n=330)

Activity | Hours
Communicating through digital means other than meetings | 0.5
Summarising existing information | 1.1
Preparing first draft of a document | 1.0
Searching for information required for a task | 0.8
Undertaking preliminary data analysis | 0.5
Preparing slides | 0.6
Preparing meeting minutes | 0.9

Table notes:
- Hours saved on tasks were approximated by first calculating the midpoint of the time brackets specified in the question (e.g. 0, 1-4, 5-8, 9-12…).
- The midpoint of each bracket was then multiplied by the number of respondents in that bracket to determine total time on the activity. The total time was then divided by the number of respondents to estimate average time per respondent (see the sketch below these notes).
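A minimal sketch of this estimation method in Python, using the bracket bounds from the note above with illustrative respondent counts (not trial data):

```python
# Minimal sketch of the bracket-midpoint estimation described in the
# table notes. Respondent counts are illustrative, not trial data.

def average_hours_saved(brackets):
    """Estimate average hours saved per respondent.

    `brackets` maps a (low, high) time bracket to the number of
    respondents who selected it; each bracket contributes its midpoint
    weighted by its respondent count.
    """
    total_hours = sum(((low + high) / 2) * n for (low, high), n in brackets.items())
    total_respondents = sum(brackets.values())
    return total_hours / total_respondents

# Hypothetical counts for the question's brackets (0, 1-4, 5-8, 9-12)
responses = {
    (0, 0): 120,
    (1, 4): 150,
    (5, 8): 50,
    (9, 12): 10,
}
print(round(average_hours_saved(responses), 2))  # 2.44 with these counts
```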
Productivity benefits were concentrated in a narrow set of tasks that are commonly undertaken by APS staff
In activities where Copilot was perceived to save a significant proportion of time – preparing meeting minutes, summarising information and preparing slides – AI assistants could in future become the primary means of completing these tasks and significantly reduce the effort they require, though the need for human involvement and accountability remains.
The time savings associated with these activities were also observed in agency evaluations. The Australian Taxation Office (ATO) saw the greatest proportional efficiencies in these activities (Australian Taxation Office 2024:3), and Home Affairs Copilot trial participants observed that Copilot may provide time savings in scribing, minute-taking, writing up action items and transcribing (Department of Home Affairs 2024:10). For other tasks such as 'summarising existing information' and 'preparing first draft of a document', Copilot was perceived to reduce the time spent by 50-70%.
Finally, there is an interesting intersection between time saved and usage. For example, PowerPoint was not frequently used by trial participants, but those who did use it saved a significant proportion of time. The ATO identified a similar insight in its evaluation: the highest absolute time savings were in data visualisation, taking nearly an hour off the activity (Australian Taxation Office 2024:3). This implies that for those who use a broad range of MS products and Copilot functionality, the potential time savings from applications such as PowerPoint could be significant.
Copilot’s impact on efficiency varied according to job requirements.
The ICT and digital solutions job family experienced the most efficiency gains.
The ICT and digital solutions job family estimated the highest efficiency savings across all the activities provided in the post-use survey. As shown in Table 5, this job family reported an efficiency saving of around an hour a day when performing summarisation and document drafting activities.
Table 5. Averaged post-use survey responses to 'On average, how many hours per day has Copilot helped you save in the following areas', by APS job family

Activity | Average | Corporate | ICT and digital solutions | Policy and program management | Technical
Searching for information required for a task (n=718) | 0.76 | 0.70 | 0.85 | 0.68 | 0.86
Summarising existing information (n=735) | 1.03 | 1.00 | 1.06 | 0.99 | 1.08
Preparing meeting minutes (n=608) | 0.94 | 0.82 | 1.06 | 0.91 | 1.00
Preparing first draft of a document (n=715) | 0.99 | 0.94 | 1.12 | 0.96 | 0.96
Undertaking preliminary data analysis (n=586) | 0.59 | 0.67 | 0.69 | 0.57 | 0.43
Preparing slides (n=605) | 0.59 | 0.55 | 0.64 | 0.55 | 0.63
Communicating through digital means other than meetings (n=680) | 0.49 | 0.45 | 0.54 | 0.51 | 0.46
Attending meetings (n=713) | 0.37 | 0.33 | 0.48 | 0.41 | 0.26
Writing or reviewing code in a programming language (n=393) | 0.50 | 0.48 | 0.58 | 0.30 | 0.60

Table 6. Averaged post-use survey responses to 'On average, how many hours per day has Copilot helped you save in the following areas', by APS classification

Activity | Average | APS 3-6 | EL 1 | EL 2 | SES
Searching for information required for a task (n=690) | 0.73 | 0.83 | 0.84 | 0.63 | 0.61
Summarising existing information for various purposes (n=708) | 0.99 | 1.06 | 1.07 | 0.97 | 0.86
Preparing meeting minutes (n=582) | 0.95 | 0.99 | 0.97 | 0.89 | 0.95
Preparing first draft of a document (n=687) | 0.93 | 1.10 | 1.09 | 0.76 | 0.78
Undertaking preliminary data analysis (n=561) | 0.57 | 0.64 | 0.67 | 0.45 | 0.52
Preparing slides (n=575) | 0.61 | 0.66 | 0.60 | 0.51 | 0.68
Communicating through digital means other than meetings (n=651) | 0.48 | 0.56 | 0.54 | 0.39 | 0.44
Attending meetings (n=682) | 0.37 | 0.38 | 0.48 | 0.28 | 0.35
Writing or reviewing code in a programming language (n=370) | 0.40 | 0.68 | 0.57 | 0.21 | 0.12

Within agencies, APS 3 to 4 staff (often graduates) are typically expected to lead notetaking and summarisation tasks, as well as create the first drafts of documents. Staff at more junior levels may not yet possess the capability to complete these tasks efficiently, so it is likely that Copilot augments their ability to a greater extent than it does for more experienced employees.
Around 40% of trial participants reported the ability to reallocate their time to higher value activities.
For some trial participants, Copilot was seen as a facilitator for engagement in more substantive and complex work. As shown in Figure 13, 41% of post-use survey respondents believed Copilot enabled them to spend more time on higher-value tasks.
Figure 13 | Post-use survey responses to 'To what extent do you agree with the following statement: Copilot has enabled me to allocate my time to perform tasks that are higher value and/or more complex' (n=807)
Data for figure 13
Post-use survey responses to 'To what extent do you agree with the following statement: Copilot has enabled me to allocate my time to perform tasks that are higher value and/or more complex' (n=807).
Sentiment | Strongly disagree | Disagree | Neutral | Agree | Strongly agree
Response | 4% | 10% | 44% | 32% | 9%
Totals may amount to less or more than 100% due to rounding.
Post-use survey respondents remarked they felt they spent less time playing ‘corporate archaeologist’ in searching for information and documents and more time in strategic thinking and deep analysis.
Figure 14 | Post-use survey responses reporting time savings of 0.5 hours or more (n=795) and overall agreement of improved quality of work (n=801), by type of activity
Data for figure 14
Post-use survey responses reporting time savings of 0.5 hours or more (n=795) and overall agreement of improved quality of work (n=801), by type of activity.
Activity | Improved quality | Some time saved
Summarising existing information for various purposes | 69% | 76%
Preparing the first draft of a document | 58% | 67%
Preparing meeting minutes | 60% | 68%
Searching for information required for a task | 54% | 62%
Undertaking preliminary data analysis | 32% | 44%
Preparing slides | 35% | 40%
Communicating through digital means other than meetings | 31% | 35%
Writing or reviewing code in a programming language | 30% | 30%
A concern voiced by many focus group and post-use survey participants was that Copilot could not emulate the standard style of Australian Government documents. Some of these participants highlighted that heavy re-work was needed to meet the tone expected by senior stakeholders within their agency and of government more broadly.
For this reason, focus group participants noted they would not use Copilot for important documents or communications. Some trial participants acknowledged that Copilot could get closer to the desired output through follow-up prompts and clarifications, but this was not viewed as being worth the additional effort.
Copilot’s unpredictability and inaccuracy limited the scale of productivity benefits.
The unpredictability of Copilot affected trial participants' trust and their productivity gains. Generative AI is a non-deterministic form of AI, meaning it will almost always produce a different output even when given the exact same prompt. Copilot is trained to predict patterns rather than understand facts, which sometimes leads it to return plausible-sounding but inaccurate information, referred to as a 'hallucination'.
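As a minimal sketch of why outputs vary, generative models sample each next token from a probability distribution rather than always taking the single most likely option. The candidate tokens and scores below are invented for illustration; they are not how Copilot is configured.

```python
# Toy illustration of non-deterministic token sampling. Real models use
# vastly larger vocabularies and learned scores.
import math
import random

def sample_next_token(scores, temperature=1.0):
    """Sample one token from softmax-scaled scores (non-deterministic)."""
    scaled = [s / temperature for s in scores.values()]
    max_s = max(scaled)                              # subtract max for
    weights = [math.exp(s - max_s) for s in scaled]  # numerical stability
    return random.choices(list(scores.keys()), weights=weights, k=1)[0]

# Hypothetical next-token scores after the prompt 'The minutes were'
scores = {"approved": 2.1, "circulated": 1.9, "drafted": 1.7, "lost": 0.2}
for _ in range(3):
    print(sample_next_token(scores))  # may differ on every run
```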
Many trial participants across focus groups and the post-use survey commented that, due to fears of hallucinations, they combed through Copilot's outputs to verify their accuracy. In some cases, this involved reading the entire document Copilot produced to check for errors, which significantly reduced any efficiency gains.
As shown in Table 7, up to 7% of post-use survey respondents reported that Copilot added time to tasks, in part due to the effort required to verify outputs. Distrust of Copilot's outputs also surfaced in DISR's internal mid-trial survey insights, with 60% of trial participants claiming they had to make a moderate to significant number of edits to outputs (Department of Industry, Science and Resources 2024:6).
Table 7. Post-use survey responses reporting that Copilot added time to activity, by type

Activity | Added time
Preparing slides (n=620) | 7%
Undertaking preliminary data analysis (n=603) | 6%
Writing or reviewing code in a programming language (n=620) | 6%
Attending meetings (n=739) | 4%
Summarising existing information for various purposes (n=759) | 3%
Preparing the first draft of a document (n=739) | 3%
Searching for information required for a task (n=744) | 3%
Communicating through digital means other than meetings (n=705) | 3%
This section outlines findings on the challenges experienced by the Australian Public Service (APS) in adopting Microsoft 365 Copilot, and the broader lessons applicable to the APS' future adoption of generative AI.
Key insights
There are currently challenges with the integration of Copilot with products outside of the Microsoft Office suite, limiting its potential benefits to agencies that use non-Microsoft products.
Poor data security and information management processes could lead to Copilot inappropriately accessing sensitive information.
Agencies should also note that some Copilot functionality – in particular for Outlook – requires the newest versions of Microsoft Office products.
Tailored training in prompt engineering and agency/role-specific use cases were needed to build capability in Copilot. A range of methods were used to upskill staff, including formal training sessions and informal forums. Managers also require specific training to help verify Copilot outputs.
There are cultural barriers that may be impeding the uptake of Copilot, ranging from the perceived negative stigma of using generative AI to a lack of trust in generative AI products.
Further clarity on personal accountabilities for Copilot outputs, alongside greater guidance on the extent to which consent and disclaimers are needed for generative AI use, is required to improve adoption.
Given the evergreen nature of generative AI, there is a need for agencies to engage in adaptive planning while setting up appropriate governance structures and processes that reflect their risk appetites.
There are key integration, data security and information management considerations
Copilot may require plugins or other connectors to ensure seamless integration across an organisation's technology stack.
… their labelling of classified or sensitive information does not work with Copilot as they use the third party Janusseal.
Agency representative in DTA interview.
Copilot is available through applications within the Microsoft 365 ecosystem. However, to access data or applications that sit outside this ecosystem, organisations need to leverage plugins or Microsoft Graph connectors to create extensibility.
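As a hedged sketch of what this extensibility involves, the following shows the general shape of registering a Microsoft Graph connector ('external connection') and pushing one item into it so external content can be indexed. The connection name, item, group ID and token are placeholder assumptions; the Microsoft Graph connectors documentation should be consulted for the authoritative workflow.

```python
# Sketch of Microsoft Graph connector calls; all identifiers are
# placeholders. A schema must also be registered on the connection
# before items are accepted, which is omitted here for brevity.
import requests

GRAPH = "https://graph.microsoft.com/v1.0"
ACCESS_TOKEN = "<app-token-with-ExternalConnection.ReadWrite.OwnedBy>"
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}",
           "Content-Type": "application/json"}

# 1. Register the external connection (a hypothetical records system).
connection = {
    "id": "agencyrecords",
    "name": "Agency records system",
    "description": "Read-only index of a non-Microsoft records store",
}
requests.post(f"{GRAPH}/external/connections",
              json=connection, headers=HEADERS, timeout=30)

# 2. Push an item so grounded search and Copilot can surface it.
item = {
    "acl": [{"type": "group",
             "value": "<aad-group-id>",   # placeholder security group
             "accessType": "grant"}],
    "properties": {"title": "Example record"},
    "content": {"value": "Body text of the record", "type": "text"},
}
requests.put(f"{GRAPH}/external/connections/agencyrecords/items/rec-001",
             json=item, headers=HEADERS, timeout=30)
```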
A small number of pre-use survey respondents and DTA interview participants raised the lack of Copilot integration with third-party software, in particular Janusseal, software that enables enterprise-grade data classification for Windows users, and JAWS, a screen reader that allows blind and visually impaired users to read the screen through text-to-speech output or a refreshable braille display. Issues with JAWS integration comprised 16% of total issues recorded in the issues register.
The lack of integration with Janusseal potentially limits the usefulness of Copilot for APS staff who regularly interact with sensitive information in organisations where data classification is managed through third-party providers such as Janusseal. Interviews conducted by the DTA noted that this lack of integration could lead to APS staff gaining access to information they did not have permission to view. Microsoft has advised that this is a third-party labelling issue, not a security issue, and that Copilot has an in-built failsafe to protect against it. It should be noted that such integrations were out of scope for the trial, and Microsoft has further advised that a more permanent fix to the labelling issue is in the pipeline (Microsoft 2024).
The newest versions of Microsoft Office products were required to enable some Copilot functionality
Copilot is available in Outlook, enabling users to more efficiently manage their inboxes. Newer features of Copilot were initially released with the new version of Outlook, rather than classic Outlook.
The integration of Copilot with Microsoft Outlook was frequently raised as a key issue, as Copilot features in Outlook were only available in the newest version of Microsoft Outlook or the web version of Outlook. Microsoft initially planned to release Copilot updates for the new version of Outlook, with classic Outlook to follow. Focus group participants often lamented not being able to access the full capabilities of Copilot as they did not have access to the new Outlook. One trial participant noted through the issues register that 'classic Outlook will only support the bare minimum Copilot features'.
Focus group participants also reported that the online versions of Microsoft Office apps had a poorer user experience than the desktop applications, which dissuaded them from using Outlook online. For agencies without the newest version of Microsoft Outlook, the overall potential benefits of Copilot would likely be significantly reduced and restricted to other use cases such as the summarisation and drafting use cases in Microsoft Word and Teams.
Poor information, data management practices and permissions resulted in inappropriate access and sharing of sensitive information.
Agencies classify their data and apply permissions to ensure access is limited to authorised personnel and that staff understand a document’s security levels.
Their information management in SharePoint is not great which has resulted in end users finding information that they shouldn’t have had access to, though this is a governance and data management issue - not a Copilot issue.
Agency representative in DTA interview.
Use of Copilot enabled some participants to access documents that they should not have had permission to access. Trial participants raised instances where Copilot surfaced sensitive data that staff had not classified or stored appropriately, largely because their organisation had not properly assured the security and storage of data and information before adopting Copilot. Without appropriate data infrastructure and governance in place, the use of Copilot may further exacerbate risks of data and security breaches in the APS.
Tailored training in prompt engineering and use cases is needed to build capability and confidence
Prompt engineering and understanding the information requirements of Copilot across Microsoft Office products were significant capability barriers for trial participants.
There are 2 key skills required for users to realise the benefits of Copilot: writing effective prompts and understanding the different information structures that Copilot needs across different Microsoft products.
[Writing a good prompt is] a new concept … people did not know how to do that
Trial participant with a CIO role, focus group.
Prompt engineering was viewed as one of the steepest learning curves among focus group participants. Most focus group participants mentioned that prompt engineering is not a widely held skill in the APS and that it takes time, training and consistent experimentation to develop. They recognised that tailoring prompts to specify the style, tone or format of outputs greatly enhanced the effectiveness of the tool. Without this capability, Copilot was more likely to return generic, contextually unaware responses that were ill-suited to the user's needs.
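As an illustration of the kind of tailoring participants described (the wording here is an assumption, not trial guidance), an effective prompt makes the audience, tone and format explicit rather than leaving them to the tool:

```python
# Contrast between a generic prompt and a tailored one. The helper and
# example values are hypothetical, for illustration only.

def build_prompt(task, audience, tone, output_format):
    """Assemble a structured prompt from explicit context cues."""
    return (f"{task}\nAudience: {audience}\nTone: {tone}\n"
            f"Format: {output_format}")

generic = "Summarise this document."  # likely to return a generic response

tailored = build_prompt(
    task="Summarise the attached briefing.",
    audience="a senior executive with two minutes to read",
    tone="formal, plain English, no jargon",
    output_format="three dot points, then one recommended next step",
)
print(tailored)
```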
Another barrier to prompt engineering capability uplift may have been inconsistent performance, attributable to limitations in the large language model used during the trial. Until May 2024, Copilot was supported by a bespoke version of GPT-3.5, which was then updated to GPT-4 Turbo. This update significantly increased the 'input length' (the number of characters users could include in a prompt), as well as the length of Copilot's responses, allowing Copilot to consume more information and potentially provide more accurate or more detailed responses. Understandably, this capability challenge was more apparent among focus group participants with little or no prior experience with generative AI, who acknowledged that they did not know how to prompt effectively and that the usefulness of Copilot was diminished as a result.
The capability to derive benefits from Copilot was further challenged by Copilot's differing information needs across Microsoft products. For example, Excel requires data to be formatted as tables for Copilot to recognise the inputs, yet Copilot cannot recognise data included in tables in Word.
Focus group participants also remarked on the difficulties in preparing data in Excel for effective prompting. The learning curve for Excel appears particularly high as trial participants who used Excel noted that it often responded to prompts with a message that it could not complete the requested action.
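As a brief hedged sketch of one way to prepare data, a plain cell range can be converted into a named Excel table programmatically, here using the openpyxl library; the file, table name and values are illustrative.

```python
# Convert a plain range into a named Excel table so tools that expect
# tabular input can recognise the data. Names and values are illustrative.
from openpyxl import Workbook
from openpyxl.worksheet.table import Table, TableStyleInfo

wb = Workbook()
ws = wb.active
ws.append(["Activity", "HoursSaved"])        # header row
ws.append(["Summarising information", 1.1])
ws.append(["Preparing first drafts", 1.0])

table = Table(displayName="HoursSavedTable", ref="A1:B3")  # no spaces in name
table.tableStyleInfo = TableStyleInfo(name="TableStyleMedium9",
                                      showRowStripes=True)
ws.add_table(table)
wb.save("copilot_ready.xlsx")
```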
As such, the learning curve to effectively use Copilot is heightened not only by the need to learn new skills such as prompt engineering, but also by the need to learn how to use Copilot differently across MS products.
Managers trust their teams to verify outputs but lack the ability to identify outputs themselves.
While managers may trust their staff to verify outputs, the majority of managers could not recognise Copilot outputs themselves. Only around 29% of managers in the pulse survey were confident they could consistently identify the difference between outputs produced with Copilot and those produced without (see Figure 15).
Figure 15 | Pulse survey responses to 'Are you confident you could recognise the difference between outputs produced with Copilot and those produced without?', by APS classification. Classification inferred from matching respondents in post-use survey.
Data table for figure 15
Pulse survey responses to 'Are you confident you could recognise the difference between outputs produced with Copilot and those produced without?', by APS classification.
Response | Definitely no | Probably not | Might or might not | Probably yes | Definitely yes
EL1 (n=42) | 0% | 25% | 38% | 28% | 10%
EL2 (n=51) | 0% | 35% | 41% | 20% | 4%
SES (n=20) | 0% | 30% | 45% | 20% | 5%
Overall (n=113) | 0% | 30% | 41% | 23% | 6%
Classification inferred from matching respondents in post-use survey. Totals may amount to less or more than 100% due to rounding.
There are concerns that inaccurate or poor verification of Copilot outputs increases the risk of inaccurate policy advice. Given the likelihood that managers cannot identify Copilot-generated outputs, there is a pressing need for staff to critically review content before forwarding it on for approval. The lack of awareness may point towards the need for further training in identifying the hallmarks of AI-generated content or the use of disclaimers when generative AI has been used.
Effective communication ensures clarity of roles, responsibilities and expected behavioural norms
Lack of clarity and communication regarding information security reduced trust in Copilot.
To effectively manage the personal, sensitive and restricted data it is entrusted with, the APS has strict data and information sharing and storage requirements. Software providers are required to meet these expectations to ensure data and information remain secure. Many generative AI tools house data in the United States, which does not align with on-shore data storage requirements in the APS.
While Microsoft provided assurance that Australian user data is housed in Australia and is not used for training models, trial participants still had varying degrees of understanding and confidence regarding the safety of data and information inputted into Copilot. Some focus group participants remarked that their agency banned the use of sensitive information in Copilot, while others noted their agencies were confident in Copilot's data security arrangements and allowed sensitive information to be used.
Future implementation efforts should provide clarity around information and data security, as well as protocols and guidance on what information may be inputted into generative AI tools.
There may be negative stigmas and ethical concerns associated with using generative AI.
Agencies provided different levels of guidance and encouragement around Copilot and generative AI use more broadly. As a result, there are likely to be differing degrees of openness with using generative AI across organisations.
Our positive culture around use came from the top
Trial participant in a focus group
Focus group participants expressed a variety of views in relation to their agency’s openness in using generative AI in their work. Three focus group participants voiced concerns about a perceived stigma or negative reaction if they openly acknowledged their Copilot use. The stigma originated from a belief that Copilot negated the need for staff to use their own critical thinking skills – thereby suggesting that Copilot encouraged laziness.
In comparison, in agencies where leaders actively encouraged and expected Copilot use, focus group participants reported positive reactions towards Copilot and uptake. Two focus group participants from different agencies reflected that publicly communicating ways that senior leaders effectively used Copilot drove positive sentiment. For example, one focus group participant from the DTA mentioned that their chief executive officer (CEO) showcased how he had been using generative AI, which led to a perceived uptick in usage within the agency.
Trial participants working in policy roles voiced concerns about the use of Copilot in producing policy advice, believing that this function should be solely human-led. These trial participants believed that using AI to support the development of policy could erode public trust and confidence and create broader ethical issues.
Most of my work is delivering outward facing policy documents and positions. I struggle to use Copilot in my day-to-day work as it is a conflict and an ethics issue to have government policy positions created by AI.
Trial participant from the policy job family, pre-use survey