Author: James Lappin

Records management consultant and trainer

Records management and/or information governance

Last week Laurence Hart published a blog post in which he stated:

“I’ve been talking a lot about information governance of late. The reason I’ve been doing it is because if it simply becomes a term used in place of Records Management we will have wasted an opportunity. Information Governance is different. It needs to be different.

Records management failed.  We need a new approach. Information Governance has the potential to be that new approach, if we tackle it correctly.   If we get lazy, we will be fighting the same battles for another decade.”

Laurence’s post should prompt us to explore the distinction between the terms ‘records management’ and ‘information governance’.  They are both names we give to approaches to a particular problem that organisations face.  The best place to start comparing them is by defining exactly what that problem is:

  • every day there is a massive flow of written communications into, out of, and around every organisation.  Thirty years ago they moved around in envelopes.  Now most of them move around as e-mails or in attachments to e-mails.   But the flow has continued uninterrupted.
  • what we call ‘records’ are simply these written communications when they are at rest.
  • the organisation is accountable for each piece of work it undertakes. It needs to find a way of assigning each significant communication to the piece(s) of work that it arose from. Thirty years ago this would have been done by placing communications onto paper files that represented each piece of work.  Now there are lots of different ways this could be done.  We could ask users to assign them to electronic files, or set up workflows to move the communications to the right file, or define routing rules, or set up e-mail accounts dedicated to particular pieces or areas of work, or train an auto-classification engine
  • if the organisation does not assign each significant communication to the piece of work that it arose from, then it will not be able to assign an appropriate access rule or retention rule to those communications.  Nor will it be able (at least with any degree of certainty) to present colleagues or stakeholders in those pieces of work with a complete collection of documentation of that piece of work
  • any method the organisation uses to manage its records needs to be built into the communications flow itself.  If it is not part of the flow it will not be able to cope with the volume of communications exchanged.  This was true in the paper world, and is even more true now
  • once assigned to a piece of work, a communication should be protected from amendment or deletion for the length of time that the organisation needs a record of that piece of work
  • not all communications are significant, many are trivial or ephemeral or irrelevant to the organisation’s responsibilities or purpose.  These communications do not need to be assigned to a piece of work, and do not need to be protected from alteration or deletion during a retention period.  They can be scheduled for deletion after a convenient interval
  • whatever method the organisation chooses to use to filter out insignificant communications needs to be trusted by both internal and external stakeholders

 

It doesn’t matter what we call the next generation of solutions to this problem:

  • we could call it ‘records management’ on the grounds that it is the same old problem that records management has been tackling for fifty years, sometimes successfully, sometimes unsuccessfully
  • we could call it ‘automated records management’ as NARA (the US National Archives) are calling their search for automated solutions to this problem across US federal government
  • we could call it ‘information governance’ as it tends to be called amongst vendors, and in the private sector, if we want to put clear water between the largely discredited electronic records management system approach and this newer (but not yet fully articulated) set of approaches

Whatever name ends up sticking to the new approach, one thing we need to remember is that records management is a body of thought, professional practice and experience stretching back half a century.    There is far more to records management than simply the roll out of a particular breed of product that used to be called electronic document and records management systems.   Even as ‘information governance’ tries to distance itself from the electronic records management system approach,  it should endeavour to take on board the lessons the records management profession has learned over the past fifty years.

The corollary of that is that we records managers need to articulate what those lessons are:

  • What can we learn from the relative success of records management approaches in dealing with the communications flow in the paper age?
  • What can we learn from the failure of electronic document and records management systems (and SharePoint) to adequately deal with the digital communications flow?
  • What can we learn from those records management approaches that have worked in the digital age? (because there have indeed been some pockets of success)

I will have a stab at distilling these learnings in the next few blogposts.   We need to use these learnings to come up with some critical success factors for the next generation of records management / automated records management / information governance approaches.   Those criteria can in turn be used to sketch some alternative scenarios of how such an approach might work.

Auto-classification – will cloud vendors get there first?

It is easy to predict that data analytics, auto-classification and the cloud will have an increasing impact on records management.   The big question is whether these three trends will act separately or in combination.

My guess is that they will have their most powerful effect when they are used in combination – when a cloud provider uses analytics on the content they hold on behalf of a great many different customers to auto-classify content for each customer.

Let’s look at each of these phenomena separately and then see what happens when they are combined.

Data analytics

Analytics is the detection by machines of patterns in content/data/metadata in order to derive insights.  These insights may be designed for action by either people or machines.

You can run analytics across any size of data set, but it yields more insight when run across large data sets.

When people talk about big data they are really talking about analytics.   Big data is what makes analytics effective.   Analytics is what makes big data worth the cost of keeping.

The use of analytics in records management now

The use of analytics in records management is still in its infancy.

There are many tools (Nuix, HP Control Point, Active Navigation etc.) that offer analytics dashboards that can display insights gleaned from crawls of shared drives, SharePoint, e-mail servers and other organisational systems.  They allow an administrator to drill into the index to find duplicates, content under legal hold, ROT (redundant, outdated and trivial documentation), etc.

Records managers and information governance managers are, to the extent that they are using analytics at all, using these tools to deal with legacy data.    Content analytics tools are put to use to reduce the size of a shared drive, prepare a shared drive for migration, or apply a legal hold across multiple repositories, etc.   All worthy and good.   But it means that:

  • we are using analytics on content with the least potential value (old stuff on shared drives) rather than on content with the most potential value (content created or received today)
  • we are using analytics to reduce the cost of storing unwanted records but not to increase access to, and usage of, valuable content
  • we have a very weak feedback loop for the accuracy of the analytics, because the decision on whether something is trivial/private/significant is being made at a point in time when it has little consequence for individuals in the organisation (and hence they will neither notice nor care if a mistake is made)

Auto-classification

Auto-classification is a sub-set of analytics.    Auto-classification uses algorithms and/or rules to assign a digital object (document/e-mail etc.) to a classification category (or a file/folder/tag) on the basis of its content, or its metadata, or the context of its usage.

Auto-classification is becoming a standard offering in the products of e-mail archive vendors like Recommind, content analytics vendors like Nuix and HP Autonomy, and enterprise content management vendors like IBM and Open Text.

There is a real opportunity for auto-classification, but to harness it we need to overcome two barriers:

  • the trust barrier – organisations are currently reluctant to use auto-classification to make decisions that would affect the access and retention on a piece of content.
  • the training barrier – it takes time to train an auto-classification engine to understand the categories you want it to classify content against

Overcoming the trust barrier

Imagine you are a middle manager.   Into your e-mail account every day come two hundred e-mails – some of them are innocuous but others have some sort of sensitivity attached to them.   You do not have a personal assistant.  Do you trust an auto-classification engine to run over your account once a day and assign categories to e-mails which will mean that some of them will become visible or discoverable by colleagues, and others will be sentenced to a relatively short retention period?

I have heard from two different vendors (HP and Nuix) that customers are still reluctant to trust algorithmic auto-classification to make decisions on content that change its access and retention rules.  They report that customers are happier to trust auto-classification where decisions are based on rules (for example ‘if the e-mail was sent from an address that ends in @companyname.com assign it to category y’)  than they are to trust decisions made on an algorithmic reading of content and metadata.   But organisations cannot possibly manually define enough rules for an auto-classification engine to make rule-based decisions on every e-mail received by every person in the organisation.
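To make the distinction between rule-based and algorithmic decisions concrete, here is a minimal sketch in Python. The rules, categories and confidence threshold are all hypothetical – this is not any vendor’s implementation, just an illustration of the two styles of decision:

```python
import re
from typing import Optional, Tuple

# Hand-written rules of the kind organisations say they are happier to trust:
# sender-address patterns mapped directly to classification categories.
RULES = [
    (re.compile(r"@companyname\.com$"), "internal correspondence"),
    (re.compile(r"@regulator\.example\.gov$"), "regulatory correspondence"),
]

def classify_by_rule(sender: str) -> Optional[str]:
    """Return a category if a hand-written rule matches the sender, else None."""
    for pattern, category in RULES:
        if pattern.search(sender):
            return category
    return None

def classify_by_content(subject: str, body: str) -> Tuple[str, float]:
    """Stand-in for an algorithmic decision: a trained model would score the
    text against each category and return the best match with a confidence."""
    # ... model inference would go here ...
    return "uncategorised", 0.0

def classify_email(sender: str, subject: str, body: str) -> str:
    # Prefer the transparent rule-based decision where one exists; fall back
    # to the algorithmic decision only above a (hypothetical) confidence bar.
    category = classify_by_rule(sender)
    if category is not None:
        return category
    category, confidence = classify_by_content(subject, body)
    return category if confidence >= 0.8 else "needs human review"
```

The point of the sketch is simply that rules are legible to the people affected by them, whereas the algorithmic path needs a confidence threshold and a review route before anyone will let it change access or retention.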

The whole point of auto-classification is to change the access and retention on content, and particularly on e-mail content.   For example, the purpose of applying auto-classification to e-mail is to:

  • prevent important e-mails staying locked in one or two people’s in-boxes, inaccessible to others in the organisation
  • prevent important e-mails getting destroyed when an organisation deletes an individual’s e-mail account six months/two years/ six years after they leave employment.

The way to increase trust in auto-classification is to increase its:

  • transparency – make visible to individuals how any e-mail/document is being categorised,  who will be able to see it, and why the engine has assigned that classification
  • choice –   give individuals a way of influencing auto-classification decisions – give them warning of the categorisation before it is actioned, and let them reverse, prevent or change the classification
  • consequence – there needs to be some impact from auto-classification decisions in terms of access and retention – otherwise individuals will not act to prevent or correct categorisation mistakes
  • consistency – nothing breeds trust more surely than predictability and routine
  • use – make sure that the output of auto-classification is groupings of e-mails/documents that are referred to/subscribed to/surfaced in search results etc.  This not only means that mistakes are spotted more quickly, it also means that the organisation gets collaboration benefits from the auto-classification as well as information governance benefits.

There is a bit of a chicken-and-egg situation here:

  • in order for the organisation to develop confidence in auto-classification it needs end users to be interacting constantly with the auto-classification results, so that there is a feedback loop and the auto-classification engine can learn from end users’ behaviour and reactions…
  • …but in order to get that level of interaction the auto-classification needs to be dealing with the most current, and hence the most risky, content.

This means that in order to overcome the trust barrier we also need to overcome the training barrier,  to get the auto-classification engine accurate enough for an organisation to start using it.

Overcoming the training barrier

An auto-classification engine needs to learn the meaning of the categories/tags/folders that the organisation wants it to assign content to.   The standard way of doing that at the moment is to prepare a training set of documents for each category.  This is time-consuming, especially if your classification is very granular.
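As a rough illustration of what that training involves, here is a minimal sketch using scikit-learn. The categories and example documents are invented; a real training set would need hundreds of examples for every node of the classification:

```python
# A minimal sketch of training an auto-classification engine from a labelled
# document set, using scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# The time-consuming part: someone has to assemble examples for every
# category/node of the classification.
training_documents = [
    ("Planning application for site 42 ...",     "planning-applications"),
    ("Minutes of the highways committee ...",    "committee-minutes"),
    ("Invoice for grounds maintenance contract", "finance-invoices"),
    # ... typically hundreds of examples per category ...
]

texts, labels = zip(*training_documents)

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

# Once trained, the engine can propose a category for new content.
print(model.predict(["Draft minutes of the planning committee meeting"]))
```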

For auto-classification to be viable across large sectors of the economy it needs to work without an organisation having to find a training document set for every node/category of the classification(s) it uses.

There are two ways of replacing the need for training sets.

The first is to piggyback on the training done in other organisations in the same sector.  So if one UK local authority/US state or county/Canadian province trains its auto-classification engine against its records classification, then in theory this could be used by any other local authority/state/county/province.    This may create an incentive for sectors to arrive at common classifications to be used across the sector.

The second is to use data analytics to bring into play contextual information that is not present in the documents themselves or their metadata.     Ideally an auto-classification engine would have access to analytics information about each individual in the organisation.   It would know:

  • which team they belong to
  • what their team is responsible for
  • what activities (projects/cases/relationships) the team is working on
  • where they habitually store their documents
  • who they habitually  correspond with

It would use this information to narrow down its choice of auto-classification category for each document/e-mail created or received by each individual.
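A sketch of how that narrowing might work, assuming the engine already produces content-based scores and that analytics supplies a (hypothetical) picture of which activities each team works on:

```python
# Restrict the candidate categories to the activities the author's team is
# working on, then pick the best content-based score among them.
# All data structures here are invented for illustration.

def candidate_categories(author: str, org_context: dict) -> set:
    """Categories plausible for this author, derived from analytics about
    their team and the activities that team is working on."""
    team = org_context["teams"][author]
    return set(org_context["activities"][team])

def classify_with_context(doc_scores: dict, author: str, org_context: dict) -> str:
    """doc_scores: category -> content-based score from the engine."""
    candidates = candidate_categories(author, org_context)
    narrowed = {c: s for c, s in doc_scores.items() if c in candidates}
    # Fall back to the unrestricted scores if context offers no candidates.
    pool = narrowed or doc_scores
    return max(pool, key=pool.get)

org_context = {
    "teams": {"a.jones": "waste-services"},
    "activities": {"waste-services": ["waste-contracts", "recycling-policy"]},
}
doc_scores = {"waste-contracts": 0.41, "hr-recruitment": 0.47, "recycling-policy": 0.12}
print(classify_with_context(doc_scores, "a.jones", org_context))  # waste-contracts
```

The contextual prior does the narrowing; the content score only has to choose between the handful of categories the author could plausibly be working on.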

The opportunity here is to use data analytics to support and train an auto-classification engine, and hence eliminate the need for a training document set.  I see no reason why that shouldn’t work, provided that the data sets that the data analytics runs on are big enough and relevant enough.

It follows from this that the vendors whose auto-classification engines will work best for your organisation are the vendors:

  • with access to data arising from the content of other organisations in the same sector as yours
  • with access to the broadest range of data from your organisation – including e-mail correspondence data,  social networking data, search log data, and content analytics data from document repositories such as SharePoint and shared drives

Which category of vendor will have access to all of this data?  Cloud vendors.

The cloud is a game changer

I met Cheryl McKinnon at the IRMS2014 conference last week.  She told me that there is a cloud-based e-mail archive vendor called ZL which advises its clients not to delete even trivial e-mails, on the grounds that the data analytics runs better with a complete set of e-mails.

What analytics usage could you possibly want trivial e-mails for?   The example Cheryl gave was of a company wanting to be able to predict whether new sales staff will be high performing or not.   It might run analytics on the e-mail of sales staff to surface communication patterns that correlate with high performance.  It could then run the algorithm over the  e-mail correspondence of new staff to see whether they are exhibiting such a communication pattern.  And  trivial e-mails may be just as good an indicator of such patterns as important e-mails.
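For illustration, the sort of communication-pattern features such an analysis might compute from e-mail metadata alone could look like this (field names are invented; the point is that trivial e-mails contribute to these measures just as much as important ones):

```python
from collections import Counter
from statistics import mean

def communication_features(messages: list) -> dict:
    """Compute simple communication-pattern features from e-mail metadata."""
    sent = [m for m in messages if m["direction"] == "sent"]
    received = [m for m in messages if m["direction"] == "received"]
    contacts = Counter(m["counterparty"] for m in messages)
    days_active = len({m["date"] for m in messages})
    return {
        "messages_per_day": len(messages) / max(1, days_active),
        "sent_to_received_ratio": len(sent) / max(1, len(received)),
        "distinct_contacts": len(contacts),
        "mean_reply_hours": mean(m["reply_latency_hours"] for m in sent) if sent else None,
    }

# Features like these, computed for existing sales staff, could be correlated
# with known performance, and the resulting model applied to new starters.
```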

Data analytics is becoming all pervasive.  Its use will affect every walk of life.  This means that more and more data will be kept to feed the analytics.   The all-pervasive nature of data analytics means that  both cloud vendors (in this case ZL) and their clients have an interest in keeping data that would otherwise have been valueless.

Cloud vendors will acquire more and more data from inside more and more organisations.   This potentially gives them the ability to train and refine content analytics across a wide spread of organisations, and provide auto-classification as part of their cloud service back to their organisational clients.

We can predict that:

  • the relationship between an organisation and its cloud vendor will be completely different from the relationship between an organisation and its on-premise vendor
  • the nature of cloud vendors will be different from the nature of on-premise vendors – for example Microsoft the provider of Office 365 will behave in a completely different way from Microsoft the vendor of on-premise SharePoint and Exchange

Let’s think about Microsoft’s strategy.  They have:

  • the leading on-premise e-mail storage software (MS Exchange)
  • the leading on-premise document storage software (MS SharePoint)
  • the leading on-premise productivity suite (MS Office).

In the on-premise world they kept these products separate.  In their cloud offering they have combined them into one (Office 365), and are charging less for the combined package than you might have predicted they would charge for any of the three on its own.    They have also announced plans for integrating enterprise social into Office 365 through ‘codename Oslo’.   This will use analytics data on who each individual interacts with to present personalised feeds of content (Microsoft call this the ‘Office graph’ in a nod to Facebook’s ‘social graph’).

What do Microsoft’s actions tell us?  They tell us that their business model for Office 365 is different from their business model for their on-premise software:

  • In the on-premise world Microsoft  wanted to upsell – by getting existing customers to buy more and more different software packages from them.  Each of their software products had its own distinct brand.
  • In the cloud world Microsoft wants to give customers all of their core products, so that they get the most content and hence the most analytics data from each customer.  They are even prepared to deprecate a brand name like ‘SharePoint’ in favour of a single ‘Office 365’ brand for their cloud package.

How long will it be before Microsoft uses the analytics data it will have gained from across its many customers to start enhancing metadata, enhancing search, and auto-classifying content for each customer?

The questions this poses for NARA

The US National Archives (NARA) has recently put out for consultation its automated electronic records management report.  The report is part of the mandate given to NARA by the Presidential Records Management Directive to find ways to help the US federal government automate records management.

NARA’s report gives a good description of autocategorisation, although it is based on the assumption that the autocategorisation engine needs training packages in order to work.   It acknowledges that:

‘the required investment [in autocategorisation] may not be within reach of the smallest agencies, though hosted or subscription services may bring them within reach for many’ (page 13)

NARA is acknowledging here that cloud vendors are more likely to bring auto-classification to many agencies than the agencies are to develop the capability themselves.   This poses some very fundamental questions:

  • Would the federal government be happy to let a cloud vendor such as Microsoft  use data analytics to auto-classify  federal e-mails and documents? OR
  • Would they rather each individual federal agency developed its own capability? OR
  • Do they think federal agencies need to club together to create a pan-government capability?

The security and information governance  issues this question raises are massive.

  • From a security and an information governance point of view the option of each agency having an individual analytics capability is clearly the best, because the cloud option and the pan-administration option create too large a concentration of data and insight about the US federal administration.
  • But from a big data/data analytics point of view the  pan-administration option or the cloud option is better, because they give a bigger base of data on which to make better auto-classification decisions.

 

The Ontario gas plant records deletion saga – a records management case study

 

The records deletion controversy in Ontario is of relevance to archivists and records managers elsewhere in the world because of the stark contrast it poses between, on the one hand, a very strong and complete records management governance framework:

  • Ontario has a relatively recent piece of Archives legislation (the Archives and Recordkeeping Act 2006)
  • Ontario has a comprehensive set of records retention schedules, all signed by the Archivist of Ontario and backed up by the Archives and Recordkeeping Act
  • one of those retention rules states that ministerial correspondence (correspondence arising from the portfolio responsibilities of a minister) should be preserved permanently

…and on the other hand:

  • the lack of any  planning as to how this retention rule on ministerial correspondence could be applied in a situation where the correspondence accumulated in the individual e-mail accounts of political staff working in ministerial offices
  • the ability of political staff to delete e-mails (whether trivial or important) from their e-mail accounts should they wish to do so
  • the operation by the Ontarian government of a routine policy of deleting entire e-mail accounts when staff leave

This tension between recordkeeping policy and e-mail practice is not unique to the government of Ontario; it is a universal problem facing all administrations.

The US National Archives (NARA) took the step in August 2013 of issuing advice to US federal agencies that the e-mail accounts of important officials should be preserved permanently if the agency cannot find any other reliable way of capturing the significant correspondence of those individuals.   This advice is contained in Bulletin 2013-02, and has gone under the name of the ‘Capstone’ approach.

It will be interesting to see whether or not other National Archives around the world follow NARA’s lead and  intervene in the way that the e-mail accounts of important officials are managed.

The slidepack embedded in this post is a collection of all the episodes of the Ontario gas plant records deletion saga comic strip that I have published on this blog (together with a few extra slides that I have added in the middle and at the end).

The slidepack is published under a Creative Commons licence, so feel free to use it for non-commercial purposes.  My intention is that it serves as a case study. Accompanying the slidepack are:

  • a records guru podcast that I recorded with Jon Garde in which we discuss the saga
  • a blogpost in which I give a recordkeeping perspective on the saga

 

The Ontario gas plant cancellation records deletion saga from a recordkeeping perspective

1.     Introduction

The Ontario gas plant cancellation records deletion saga has occupied a considerable amount of column inches and radio and TV time in the province itself, in Canada, and beyond since the spring of 2013.   Little or none of this debate has been informed by a recordkeeping perspective, despite the fact that the deletion controversy started with an allegation that Ontario’s Archives and Recordkeeping Act had been breached by Craig MacLennan, the former Chief of Staff to the Minister of Energy.

This post attempts to provide a record keeping perspective on the saga.

It argues that:

  • the deletions that have caused most of the controversy in the saga are less damaging in recordkeeping terms than a form of deletion that has so far caused little or no controversy – the IT policy of deleting staff email accounts when a member of staff leaves employment.
  • most of the debate in Ontario’s Parliament has concerned the behaviours and motivations of individual political staff who work, or who did work, in the Office of the Minister of Energy and the Office of the Premier.     Of much more interest from a recordkeeping perspective is the question of why the recordkeeping systems in place were not robust enough to capture and protect an adequate record of the correspondence of the ministers involved in the gas plant cancellation decisions.

The post goes on to give recommendations as to how the Ontario Government (or any government) could prevent the recurrence  of a similar saga.

To accompany this post I have recorded a records guru podcast in which I discuss the saga with Jon Garde (listen to it here)

2.     The different types of deletion involved in the saga

The saga has involved three different types of deletion:

  • the confession by Craig MacLennan in April 2013 that he routinely and indiscriminately deleted e-mails from his e-mail account whilst serving as chief of staff to the Minister of Energy
  • the government’s  IT policy of deleting e-mail accounts of staff members when they leave Ontario’s public service
  • the alleged attempts by staff working in the Office of the Premier, to wipe clean the hard drives of the computers used by colleagues who were leaving their posts during the transition from Premier McGuinty to Premier Wynne  in the autumn of 2012

Which one of these types of deletion is the most important depends upon whether you are looking at this from a political perspective, a police perspective, an access to information perspective, or a recordkeeping perspective.

3.     The routine deletion by Craig MacLennan of e-mails from his e-mail account

3.1        Craig MacLennan’s confession

In April 2013 Craig MacLennan, former chief of staff to the Minister of Energy, was asked why he had not provided any correspondence to the Estimates Committee of Ontario’s Parliament in response to their request in the spring of 2012 to see correspondence related to a decision to cancel and relocate two gas plants.   He replied that at the time of the request he did not have any such correspondence.   The reason for this was that he kept a clean in-box and routinely deleted e-mails from his in-box and sent items as he went along, in order to keep within limits he thought had been set by IT.

3.2     Did the deletion actually happen?

The first thing to be said about Craig MacLennan’s confession of e-mail deletion is that we do not know whether or not such deletion actually happened!

The policy of the government of Ontario is to delete e-mail accounts when a member of staff leaves.  Craig MacLennan left the public service in June 2012.

The e-mail accounts of most of Ontario’s public servants are split over two tiers of storage.   E-mails less than 30 days old are kept on first tier storage  in Microsoft Exchange.  When an e-mail is thirty days old it moves to the portion of the e-mail account kept on cheaper, slower hardware, within an e-mail archive (Enterprise Vault by Symantec).

The Ministry of Government Services (MGS) stated that the e-mail archive existed purely to save storage costs, not to protect or preserve e-mail.   Whether a particular e-mail was stored in Exchange or in Enterprise Vault should not have made any difference to how long the e-mail was kept or how safe it was from deletion.   An individual member of staff could delete e-mail from either portion of their e-mail account.

In the summer of 2013 MGS forensic staff discovered that for a period of time IT staff had, by an administrative oversight, omitted to delete the Enterprise Vault portion of the e-mail accounts of staff who had left.  During the period when the policy was not applied 30,000 people had left,  so the Ministry was left with 30,000 orphaned accounts (orphaned because the only way of navigating to these accounts was through Microsoft Exchange, but the Exchange portion of the accounts had been deleted, leaving the Enterprise Vault portions of the e-mail accounts invisible and inaccessible).

The forensic staff checked through these 30,000 orphaned accounts and found that one of them belonged to Craig MacLennan.   When they opened MacLennan’s account they discovered it contained 38,000 e-mails including 1,900 relating to the gas plant controversy.

If MacLennan routinely deleted his e-mails as he went along, as he claimed; and if it was possible for an individual to use their Outlook e-mail client to delete an e-mail stored on second tier Enterprise Vault storage; then why were there 38,000 e-mails left in the Enterprise Vault portion of MacLennan’s account after he left?

There are two possible explanations for this: either Craig MacLennan routinely deleted e-mails from his Outlook client, but did so in a way that did not cause those e-mails to be deleted from the Enterprise Vault e-mail archive; or Craig MacLennan did not routinely delete e-mails at all.

3.3 Possible explanation 1:  Craig MacLennan deleted the e-mails from his Outlook e-mail client but did so in a way that did not cause them to be deleted from the Enterprise Vault archive

In July 2012, a month after MacLennan left employment, the Ontario government upgraded their e-mail server from Microsoft Exchange 2003 to Exchange 2010.

  • Prior to the upgrade (and therefore during MacLennan’s spell of employment) it was possible for an individual to use their Outlook e-mail client to delete an e-mail over thirty days old from the Enterprise Vault archive – but to do so they would have to use the Enterprise Vault toolbar that exists as a plug-in within their Outlook client.
  • Since the upgrade to Exchange 2010 (and therefore after MacLennan had left employment) it has been easier for an individual to use their Outlook e-mail client to delete e-mails from the Enterprise Vault archive.  There is still the option of using the Enterprise Vault toolbar within Outlook to delete.  But there is also now the option of simply selecting the e-mail within Outlook and pressing the delete key.

(This information is gleaned from the table, and the footnote to the table, provided on page 11 of the Information and Privacy Commissioner’s ADDENDUM to Deleting Accountability: Records Management Practices of Political Staff – A Special Investigation Report.)

3.4    Possible explanation 2:  Craig MacLennan did not routinely delete e-mails at all

It is possible that MacLennan might have preferred to be thought guilty of an inadvertent, non-malicious breach of the Archives and Recordkeeping Act (which carries no penalties) rather than of failing to produce records in response to a request from a Parliamentary Committee.    If MacLennan knew that the Ministry of Government Services deleted e-mail accounts when staff leave, then he may have supposed that there would be no way of contradicting his claim to have routinely deleted his e-mail.

4.     The deletion of e-mail accounts when staff leave

4.1   The deletion of e-mail accounts when staff leave makes the question of whether or not MacLennan deleted e-mails from his account academic

Craig MacLennan’s reported deletions were:

  • of interest to opposition politicians hoping to prove that there was a conspiracy of political staff serving the Liberal administration to delete records of the gas plant cancellations AND
  • of interest to the Information and Privacy Commissioner from an access to information perspective, because the routine deletion was the reason given by MacLennan for his non-production of gas plant cancellation correspondence to Parliament.

But from a recordkeeping point of view the question of whether or not MacLennan deleted e-mails from his account is academic. Even if MacLennan had kept every single one of his e-mails it would not have helped the Archivist of Ontario, because the policy of Ontario’s Ministry of Government Services was to delete staff e-mail accounts when the staff member left.

The discovery by forensic staff of the lion’s share of MacLennan’s account in an orphaned Enterprise Vault account does not mean that this correspondence was safe for the duration of the relevant retention rule.   The Ministry of Government Services had decided prior to the investigation to delete the orphaned accounts, in accordance with their policy, once an upgrade to the Enterprise Vault software had taken place.   They have since placed a stay of execution against e-mail accounts relevant to the gas plant saga.

4.2      The contradiction between Ontario’s retention rule on ministerial correspondence and its disposition policy on staff e-mail accounts

The retention schedule for ministerial public records signed by the Archivist of  Ontario states that ministerial correspondence should be transferred to the Archives of Ontario  for permanent preservation after five years (or upon a change of administration).

MacLennan stated that he never filed any e-mails anywhere.     This means that the only place ministerial correspondence of the Minister of Energy would be captured would be in the e-mail accounts of MacLennan and his colleagues (apart from copies scattered amongst the e-mail accounts of senders/recipients).    This in turn means that the e-mail accounts of Craig MacLennan and his colleagues should be of interest to the Archivist of Ontario.

So what policy should Ontario apply to the e-mail accounts of political staff:

  • the retention rule applying to ministerial correspondence that must make up a significant proportion (though not the entirety) of those e-mail accounts?  OR
  • the IT disposition policy of the Ministry of Government Services that e-mail accounts should be deleted when a member of staff leaves?

The only way those two policies could logically co-exist would be if some sort of filing of e-mails, whether paper or electronic, was taking place.  No such practice existed in the Office of the Minister of Energy.

4.3   The impact of a blanket policy of deleting e-mail accounts when staff leave

From a recordkeeping perspective the policy of deleting e-mail accounts of all staff when they leave employment, however significant their role in public life,  is the most damaging of the three types of deletion described in this post.   It is this policy that most undermines the accountability of political staff and Ministers for their actions, and most undermines the effort to retain a record of the activities of Ministers and their staff over time.

This deletion of e-mail accounts when staff leave was not questioned by the Information and Privacy Commissioner in her report Deleting Accountability.  Nor has it been condemned by the Parliamentary Committees.  The politicians in the Committees have not  criticised this policy because it is a non-partisan policy – the Ministry of Government Services would have applied this policy regardless of the political complexion of the administration.

Political staff work in a fast-paced, dynamic environment.  Turnover of staff is relatively high.  They neither expect nor receive security of tenure. They are well placed to secure external jobs because of the value of their connections inside Government.   If a member of political staff thinks that the correspondence in their e-mail account would be damaging to themselves and/or the Minister then they can have that record expunged simply by leaving their employment.

5.     Attempts by political staff in the Office of the Premier to wipe clean hard drives

The third deletion in the scandal came to light whilst the Information and Privacy Commissioner was conducting her investigation into Craig MacLennan’s reported routine deletion of his e-mail.  The Cabinet Secretary told her that he had been approached by David Livingston, chief of staff to the Premier of Ontario, at the time of the change of Premier from Dalton McGuinty to Kathleen Wynne.   Livingston wanted to know how to get administrator passwords to wipe the hard drives of the computers of departing political staff.

The Cabinet Secretary referred him on to the Chief Information Officer, who was not too concerned about the proposed deletion, on the grounds that it was good practice to wipe clean devices when they were handed on, provided that the Office had complied with its obligations under the Archives and Recordkeeping Act.  He pointed out to Livingston that the Office already possessed the necessary administrative passwords.

Early in 2014 Ontario’s police obtained a search warrant for the off-site storage vendor where the computers in question were being stored.  It is alleged that David Livingston, shortly after his approach to the Cabinet Secretary, had given the administrator passwords to the boyfriend of a member of the political staff, and asked the boyfriend to wipe the hard drives.   The boyfriend was not an Ontarian public servant.

At the time of writing no charges have been laid, no allegations have been proven in court and Livingston’s lawyer has denied any wrongdoing on his client’s behalf.

This deletion is important from a political perspective because it can be interpreted as showing that political staff were prepared to go to considerable lengths to delete records.   It is important to the police because criminal charges may be pressed.  But it is of little or no interest to an archivist or a records manager.     It is hard to believe that political staff were routinely using the hard drive of their computers for record storage.  It is too vulnerable to device failure, and does not generally support any form of mobile or remote access.    One presumes that the wiping of the drives was simply an attempt to defeat any forensic searches that might be made.

6.     What should the Government of Ontario do to stop this type of saga recurring?

Most of the public debate on this saga has concentrated on the motivations and behaviour of individual political staff such as Craig MacLennan and David Livingston.  However from a recordkeeping perspective the question of whether or not the actions of these individuals were appropriate is of little importance.

Administrations will come and go, individual political staff will come and go.  Given the confrontational nature of the environment in which they work, we can assume that from time to time it will be in the interests of a member of political staff to remove correspondence from the record.   Some people will succumb to that temptation, others will not.

From a recordkeeping point of view the most important question is this:

  • how can we best set up systems to routinely capture records of ministerial correspondence in ways that make it difficult for a member of political staff to remove correspondence from the record, or to prevent correspondence being captured onto the record in the first place?

Option 1 – make it as easy as possible for political staff to electronically file e-mails outside of their e-mail account 

One option would be to set up some sort of electronic document management system with e-mail integration so that political staff could simply drag and drop e-mail into folders within their Outlook client.   The folders could either be:

  • big buckets such as ‘ministerial correspondence’, ‘political correspondence’, ‘private and personal’, ‘trivia and ephemera’.
  • or a more granular filing structure covering the themes and matters that the staff are dealing with

Such a solution would be an improvement on what they have at the moment (where it appears there is no simple means for staff to file e-mail).   It would have the advantage that once an e-mail had been dragged to a folder linked to a document management system it could be protected from subsequent deletion.

The weakness of  such a solution is that it is too dependent on the motivation and workload of the political staff themselves.  It leaves it down to political staff to decide what goes onto the record and what stays off the record.  This may be acceptable in a high trust environment.   However political staff operate in a low trust environment.   They are accountable to opposition politicians in the Parliament who do not and will never trust them.    Leaving political staff to decide what does and does not go onto the record is problematic, and I would not recommend this option.

The Ontarian government should aim to institute routines for capturing ministerial correspondence that will be trusted by opposition politicians even where they have no trust whatsoever in the individual political staff concerned.

Option 2 – automatically archive and preserve all e-mails sent and received by important political staff

The second option is to change the settings on the Enterprise Vault e-mail archive so that for designated members of political staff a copy of all e-mails sent and received is captured in the archive and protected from deletion.  This would mean disabling the ability of such individuals to use their Outlook client to delete an e-mail in the Enterprise Vault archive.

The precedent for this option is the rulings of the US Securities and Exchange Commission that all electronic communications of all broker-dealers must be archived and protected.    Barclay T. Blair said in this post that these rulings (SEC 17a-3 and SEC 17a-4) ‘single-handedly created the e-mail archiving industry’.

There are strong parallels between the situations of political staff and of broker-dealers.  Both sets of people work in low trust, high scrutiny environments where there might be a powerful incentive to delete a communication from the record, or ensure that a communication did not go on the record.

However there is a key difference.  Unlike the correspondence of traders, the ministerial correspondence of political staff is needed for long term preservation in an historical archive.  But Ontario’s archivist has no legal right to archive the political correspondence of ministers, only their ministerial correspondence:

  • political correspondence is defined as correspondence arising from the political career of the Minister (for example their relations with their constituents, with their political party, and their election campaigning).   Ministers are free to dispose of such correspondence as they see fit.
  • ministerial correspondence comprises all correspondence arising from the Minister’s portfolio responsibilities and their role as a member of the cabinet

If Ontario archived and protected every e-mail sent or received by important political staff then they would have to find a way to sift out the political correspondence at the point in time at which the records are due to be transferred to the Archives of Ontario.

The Ontario government should institute a routine, trusted and simple method for political staff to separate out political correspondence from ministerial correspondence.

Recommended option – protect and preserve the e-mail accounts of important political staff –   but give them a means of flagging up  political correspondence and private correspondence

To stop this type of saga recurring I would recommend that the Government of Ontario take the following measures:

  • Designate the roles of certain political staff as being of high importance.  Ensure that all e-mails sent or received by such individuals are captured into an e-mail archive.  Protect the e-mails from deletion or amendment.  Disable the ability of individual staff to use their e-mail client to delete e-mails from the archive, or to amend those e-mails.
  • Find a means by which individuals whose accounts have been designated as being of high importance can flag certain e-mails as private or personal.   Institute a random system of auditing to ensure that individuals are applying such a flag properly
  • Find a means by which political staff can flag certain e-mails as being political correspondence rather than ministerial correspondence. Institute a random system of auditing to ensure that individuals are applying such a flag properly (a minimal sketch of such random sampling follows this list)
  • Retain e-mail accounts designated as being of high importance permanently.   Remove correspondence flagged as personal, and correspondence flagged as being political rather than ministerial, at the point in time at which the account is transferred to the Archives of Ontario.
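As a small illustration of the random auditing mentioned above, the sampling side of it could be as simple as the following sketch (the structure of the flagged e-mail records is invented for illustration):

```python
import random
from typing import Optional

def sample_for_audit(flagged_emails: list, sample_size: int = 20,
                     seed: Optional[int] = None) -> list:
    """Pick a random sample of e-mails flagged as 'personal' or 'political'
    so that an auditor can check whether the flag was applied properly."""
    rng = random.Random(seed)
    return rng.sample(flagged_emails, min(sample_size, len(flagged_emails)))

# Example: audit 20 of the e-mails a member of staff flagged this quarter.
# audited = sample_for_audit(flagged_this_quarter, sample_size=20)
```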

The core components of the new generation of records management/information governance tools

In my last post I drew a distinction between two generations of records management tools:

  • The first generation of tools are those that hit the market between 1997 and 2009 and we called them electronic document and records management (EDRM) systems
  • The second generation are those that hit the market after 2009 and we seem to be calling them information governance tools

In this post I will look again at this distinction – this time comparing the components and capabilities of the old EDRM systems with the components and capabilities of the newer information governance tools.

The core components of the first generation of records management tools (EDRM systems)

The first generation of tools consisted of six core components/capabilities:

  • an end-user interface to allow end-users to directly upload documents to the system
  • an integration with the e-mail client (usually Outlook) to allow end-users to drag and drop e-mails into folders within the system
  • document management features:  such as version control, check-in and check out, generic workflows and configurable workflow capabilities
  • a repository:  to store any type of documentation that the organisation might produce
  • classification and retention rules:  capability to hold, link together and apply a records classification (business classification scheme) and a set of retention rules
  • records protection – capability to protect records from amendment and deletion and maintain an audit trail of events in the life of that record

When implementing  such EDRM systems the records managers drew a ‘line in the sand’.  They aimed to implement  a system that would manage records going forward in time.  They did not attempt to deal with legacy content that had already accumulated on shared drives and in email.

The weakness of EDRM systems was that end users did not move all or most significant content into the records system.  Shared drives and e-mails continued to grow and continued to contain important content not captured into the records system.

Added to this, a range of disruptions happened:

  • Microsoft’s entry into the content management space with SharePoint 2007 took away the collaboration space from the EDRM systems.   Unless they had complex requirements, organisations with SharePoint no longer needed the version control, check-in check out or workflow capabilities of the EDRM tools.
  • E-discovery/freedom of information/subject access enquiries caused more and more pain to organisations, and tended to focus on material in e-mail and shared drives rather than content in the EDRM
  • The move to smart phones and tablets made the user-interface problematic – smartphones have screens that are too small for the full functionality of an EDRM end-user interface.
  • The move to the cloud made e-mail integration problematic – cloud e-mail services do not allow customisation of their user-interface.

The seven core components of the new generation of records management/information governance tools

The second generation of records management tools, which we are calling information governance tools, consists of seven key capabilities:

  • Indexing engine  the ability to crawl and index content in many different applications and repositories (shared drives, SharePoint, e-mail servers, line of business systems etc)
  • Connectors  a set of connectors to the most common applications and repositories in use in organisations today (SharePoint, Exchange, ECM/EDRM systems etc).   The connectors enable the records system to take action on content in a target repository – for example to delete, move or place a legal hold on it.  They also enable the crawler to extract content to index (a hypothetical sketch of such a connector interface follows this list).
  • Metadata enhancement and auto-classification the ability to add, through the connectors, extra metadata fields to content, and the ability to assign content to a classification either by setting rules based on parameters, or by using auto classification algorithms
  • Analytics dashboard to surface patterns in content repositories, for example to identify duplication, redundancy, trivia and high risk content
  • Classification and retention capability to hold and apply a records classification and a set of retention rules   – this is the main point of continuity between the first and second generation of records management tools.
  • In-place records management  the capability to protect records from amendment and deletion, maintain an audit trail of events in the life of that record, and assign a retention and classification rule to the record, even where the record is held in a different application from the records system itself.  From the end-user point of view this has the advantage that they can stay in the applications they are used to working in – they do not have to learn how to use the records system.
  • Repository  a repository to store any type of documentation that the organisation might produce.   The in-place records management features reduce, but do not eliminate, the need for a records repository.  Records repositories are necessary when an organisation wants to decommission an application, but still wants to retain the content from that application.  In cloud scenarios the repository comes in useful when the organisation wants the content to be available via a cloud application but not stored by the cloud provider
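To show how the connector and in-place records management ideas fit together, here is a hypothetical sketch of a connector interface – illustrative only, and not any vendor’s actual API:

```python
# Hypothetical sketch: one common interface, with one implementation per
# target repository (SharePoint, Exchange, shared drives...), letting the
# records system index content and act on it where it lives.
from abc import ABC, abstractmethod
from typing import Dict, Iterator

class RepositoryConnector(ABC):
    @abstractmethod
    def crawl(self) -> Iterator[Dict]:
        """Yield metadata and extracted text for each item, for the indexing engine."""

    @abstractmethod
    def apply_hold(self, item_id: str, hold_id: str) -> None:
        """Place a legal hold on an item in the target repository."""

    @abstractmethod
    def set_retention(self, item_id: str, classification: str, retention_rule: str) -> None:
        """Assign a classification and retention rule to an item in place."""

    @abstractmethod
    def dispose(self, item_id: str) -> None:
        """Delete an item once its retention period has expired and disposal is approved."""
```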

Notice what has been taken away and what has been added:

  • The components that an end-user interacted with – the end-user interface and the document management functionality – have either disappeared entirely or become an optional extra.
  • What comes in their place are the connectors, indexing engine, analytics and in-place records management capabilities necessary for a central administrator to understand and act on content held outside of the records system itself

 

The importance of the analytics dashboard

The key difference between the new generation of information governance tools and the old generation of EDRM systems is that the information governance tools pay as much (often more) attention to existing content as they do to shaping the way future content will accumulate.

The most stark illustration of the change is this:

  • ten years ago if you saw a system demonstration by a vendor at a records management event they would start by showing you their end-user interface for an individual to upload a document.
  • In 2014 a vendor will start by showing you their analytics dashboard

The analytics dashboard is the key to the new generation of records management/information governance tools.

Without the dashboard, an indexing engine crawling across shared drives, e-mail and SharePoint would be useless to the records manager.

The dashboard enables the records manager to actively interrogate the index to home in on targets for action – information that should be deleted/moved/protected/classified/assigned to a retention rule etc.

[Image: example analytics dashboard]

A typical dashboard shows the records manager how much content is held, where it is held, what file types there are, what departments it belongs to, and what is redundant/outdated/trivial etc.   The dashboard also enables the records manager to use these different dimensions in connection with each other – for example to home in on content of a particular department in a particular time period.
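A sketch of the kind of interrogation this implies – combining department, date and redundancy dimensions against the index built by the crawler (the index entries and field names are invented for illustration):

```python
from datetime import date

# A tiny stand-in for the index built by the crawler.
index = [
    {"path": "G:/finance/budget_2006.xls", "department": "finance",
     "modified": date(2006, 3, 1), "duplicate_of": None, "size_mb": 2.1},
    {"path": "G:/finance/budget_2006 copy.xls", "department": "finance",
     "modified": date(2006, 3, 1), "duplicate_of": "G:/finance/budget_2006.xls", "size_mb": 2.1},
    # ... millions more entries in a real index ...
]

def candidates_for_disposal(index, department, older_than):
    """Duplicate or stale content for one department, oldest first."""
    hits = [item for item in index
            if item["department"] == department
            and (item["duplicate_of"] or item["modified"] < older_than)]
    return sorted(hits, key=lambda item: item["modified"])

for item in candidates_for_disposal(index, "finance", date(2010, 1, 1)):
    print(item["path"], item["size_mb"], "MB")
```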

These are powerful tools in the hands of a central administrator, and it is important that they have workflows and audit trails in them so that:

  • the records manager can get the approval of content owners before making disposal decisions on content
  • the system can record that approval, and record the actioning of the decision

Note however that these tools are more effective at helping records managers make decisions on content that has built up in the shared drive and SharePoint environment than they are at dealing with content that has built up in e-mail accounts.

One of the challenges with EDRM systems was that it was very hard to measure benefit and give a tangible ROI.   The business case for the new information governance tools often arises from savings produced by dealing with legacy data – something that the EDRM systems were not set up to do.  The ROI might come from:

  • savings from storage optimisation (moving less active content to second or third tier storage)
  • savings from reduction of content that has to be reviewed for eDiscovery/access to information requests

The benefits might be:

  • capability to move content from legacy applications
  • capability to process the shared drives of functions acquired or divested in mergers and acquisitions

At the ARMA Europe conference last month Richard Hale from Active Navigation and Lee Meyrick from Nuix both gave presentations urging records professionals to be pragmatic and concentrate on targeting particular improvements one at a time.  The dashboard suits that approach – gone is the utopian wish to create a perfect records system; instead we have an incremental approach whereby a central administrator homes in on particular areas of content for protection/enhancement/migration.

The strengths and weaknesses of the information governance approach to records management

It is clear that a new records management approach is emerging.   We can see the signs:

  • a new set of tools has emerged that seek to enable organisations to mitigate and adapt to the fact that their records are largely held in places such as e-mail accounts, shared drives and SharePoint team sites that are difficult for the organisation to manage.    In contrast the previous records management approach (the electronic records management approach) sought to move records out of these repositories into an environment that was easy for the organisation to manage – a standards compliant electronic records management system
  • a new set of beliefs is forming about records - the belief that the burden of records management on end-users should be minimised. The belief that end-users cannot be relied upon to make consistent decisions on whether or not particular documents or e-mails need to be captured as a record.  The belief that organisations should no longer try to distinguish between records and non-records, records systems and non-records systems, but should instead realise that all content they hold needs to be managed and accounted for.
  • new guidance has been issued by influential bodies - the US National Archives has told federal agencies to designate the e-mail accounts of key staff for permanent preservation if they have not been able to find an alternative way of routinely capturing important e-mails as records.   It has asked agencies to innovate and to look for new ways of automating records management to reduce the burden on end users.  It has asked vendors for their ideas and help on ways of automating records management

The best way of understanding the new approach is to compare it with what has gone before.   In the history of records management  we have had periods where the profession has had a coherent approach to offer organisations, interspersed with periods of disruption during which technological and communications developments have made an existing approach untenable:

  • The registry approach (1950s to early 1990s).  In the paper age the registry approach involved employing teams of records clerks (grouped into ‘registries’) to maintain files and place correspondence and documentation onto those files.  This meant in effect that an organisation of 1,000 people that wanted good records management across all its activities would deploy around 40 people to capture all the records of the organisation.  Post would arrive in a post room and would be sent to a records registry for filing before being delivered to the individual addressee for action.
  • The disruption of the arrival of e-mail (1993 to 1999).  The arrival of e-mail meant it was no longer possible to route the flow of incoming and outgoing correspondence through registries – instead correspondence went directly from sender to recipient, with no space for intermediaries.
  • The electronic records management system approach (2000 to 2007).  With this approach an organisation asked every individual within the organisation to declare every important e-mail/document that they created or received to the electronic records management system.  This in effect meant that an organisation of 1,000 people was asking all 1,000 people to take decisions on what got captured as a record and where it was placed within the records classification/filing structure.  The benefit was that every document declared into those systems was well described and well governed.  The problem was that records capture was no longer routine and no longer integrated into the process by which people exchanged written communications.  Instead records capture was an afterthought, dependent on the motivation, awareness and workload of individual staff.
  • The disruption of the rise of SharePoint (2008 to 2011).  The rise of SharePoint destroyed the market position of electronic records management systems by capturing the collaborative space, without in itself offering a workable records management alternative.
  • The information governance approach (2012 – ).  In this emerging approach an organisation gives a central information governance or records management unit tools to index, apply classification and retention rules to, and clean up the various applications/repositories that the organisation is using.  This means an organisation of 1,000 people is asking 3 or 4 members of staff to make records organisation and disposition decisions for the whole organisation.  (A minimal sketch of what such centrally applied classification and retention rules might look like follows this list.)
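
To make the contrast concrete, here is a minimal sketch, in Python, of the kind of centrally administered retention schedule the information governance approach relies on.  The classifications, the retention periods and the /shared-drive path are all assumptions made purely for illustration; the tools actually sold in this space are far more sophisticated.

```python
from datetime import date, timedelta
from pathlib import Path

# Hypothetical retention schedule: classification -> retention period in years.
# Both the classifications and the periods are illustrative, not drawn from any
# real organisation's schedule.
RETENTION_SCHEDULE = {
    "contracts": 7,
    "project-correspondence": 5,
    "ephemeral": 1,
}

def classify(path: Path) -> str:
    """Assign a classification from the folder path alone - the kind of crude,
    centrally defined rule a small information governance team might start with."""
    parts = {p.lower() for p in path.parts}
    if "contracts" in parts:
        return "contracts"
    if "projects" in parts:
        return "project-correspondence"
    return "ephemeral"

def disposal_date(path: Path) -> date:
    """Earliest date on which the item falls due for disposal review."""
    last_modified = date.fromtimestamp(path.stat().st_mtime)
    years = RETENTION_SCHEDULE[classify(path)]
    return last_modified + timedelta(days=365 * years)

if __name__ == "__main__":
    for f in Path("/shared-drive").rglob("*"):  # assumed location of the share
        if f.is_file():
            print(f, classify(f), disposal_date(f).isoformat())
```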

Strengths of the information governance model

Ability to deliver quick wins

The main strength of the information governance approach is that it enables organisations to take pragmatic measures in the short term to tackle key pain points or cost points:

  • If they are paying a fortune keeping their entire shared drive on expensive first-tier storage they can use analytics tools to identify redundant, outdated, trivial or rarely looked at content and either dispose of it or move it to cheaper storage (a rough sketch of this kind of age-based sweep follows this list).  Organisations often lease such tools on a short-term basis for particular projects/cases, rather than taking out perpetual licences.
  • If organisations are facing problems responding to eDiscovery requests they can deploy (or their eDiscovery service provider can deploy) an indexing engine to index shared drives, e-mail accounts and SharePoint sites; apply legal holds; and support the review process
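
As a rough illustration of the first of these quick wins, the sketch below sweeps an assumed first-tier share for files untouched for roughly three years and reports what could be moved to an assumed cheaper tier.  The paths and the three-year threshold are invented for the example; commercial analytics tools layer content analysis, duplicate detection and reporting on top of this kind of simple age-based rule.

```python
import shutil
from datetime import datetime, timedelta
from pathlib import Path

# Illustrative thresholds and paths - an organisation would tune these to its
# own storage costs and access patterns.
FIRST_TIER = Path("/shared-drive")      # assumed expensive first-tier storage
SECOND_TIER = Path("/archive-tier")     # assumed cheaper second-tier storage
STALE_AFTER = timedelta(days=3 * 365)   # untouched for roughly three years

def sweep(dry_run: bool = True) -> None:
    """Report (or move) files that have not been modified within the threshold."""
    cutoff = datetime.now() - STALE_AFTER
    for f in FIRST_TIER.rglob("*"):
        if not f.is_file():
            continue
        last_modified = datetime.fromtimestamp(f.stat().st_mtime)
        if last_modified < cutoff:
            target = SECOND_TIER / f.relative_to(FIRST_TIER)
            if dry_run:
                print(f"would move {f} -> {target}")
            else:
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.move(str(f), str(target))

if __name__ == "__main__":
    sweep(dry_run=True)  # report first; only move once the report has been reviewed
```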

This contrasts with the electronic records management system approach, which typically took several years before beginning to deliver benefits.

Less need for change management

The second strength of the approach is that it is not dependent for its success on end-users changing their behaviour.  In-place records management tools and SharePoint records management plug-ins, and to an extent e-mail archive systems, allow the application of records classification and retention rules without end-users leaving the applications that they work in.  It isn’t that these tools have no impact on end-users (they might require an end-user to act when creating a new folder/document library, for example), but they have far less impact than the electronic records management systems approach.
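
A toy model may help show why the impact on end-users is so much lighter: if a retention rule is set once when a folder or document library is created, everything filed beneath it can inherit that rule without any further decision by the end-user.  The class and the rule text below are hypothetical, not any product’s API.

```python
from dataclasses import dataclass
from typing import Optional

# A toy model of 'in place' rule inheritance: a retention rule set once on a
# folder (or document library) is inherited by everything filed beneath it,
# so the end-user only makes a records decision when the container is created.

@dataclass
class Folder:
    name: str
    retention_rule: Optional[str] = None
    parent: Optional["Folder"] = None

    def effective_rule(self) -> Optional[str]:
        """Walk up the hierarchy until an explicitly set rule is found."""
        if self.retention_rule:
            return self.retention_rule
        return self.parent.effective_rule() if self.parent else None

area = Folder("Energy projects", retention_rule="destroy 10 years after project close")
case = Folder("Gas plant correspondence", parent=area)  # no decision asked of the user
print(case.effective_rule())  # -> "destroy 10 years after project close"
```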

The problem with the electronic records management system approach being so dependent on end-users changing their behaviour was not so much the resource implications of the necessary training.  It was the fact that there was no certainty that the change management would succeed.  A significant number of electronic records management system projects were abandoned due to lack of user buy-in.

Possibility of extending, rather than abandoning, electronic records management systems

Another strength of the information governance model is that it enables those organisations that did manage to establish electronic records management systems to extend the reach of those systems to shared drives, e-mail accounts and SharePoint sites.   This could be done by using an analytics/indexing/clean-up/in-place records management tool to move selected content from repositories such as shared drives into the electronic records management system.

Weaknesses of the information governance model

Lack of a fully worked through theory

The information governance model is an emerging model; it is still not fully worked through.  In particular there is no coherent body of theory and guidance yet.  As an illustration of this, we saw in 2013 the US National Archives appealing to vendors for ideas on how to automate records management.  This is in stark contrast to the situation at the end of the 1990s, when various national archives around the world were specifying to vendors exactly what functionality an electronic records management system should have.

Focus on the compliance requirements of external stakeholders at the expense of the day-to-day needs of internal end-users

If one of the main strengths of the information governance model is the reduction in burden on the end-user, the main weakness of the model is a lack of clear benefit for end-users.

For example:

  • indexing engines (such as those provided by Nuix and Zylab) act in effect like enterprise search engines, albeit with the additional ability to take action on content rather than simply find content.  They can be shone across e-mail servers, shared drives, SharePoint, line-of-business systems etc.  But unlike enterprise search engines these indexing engines are not intended for end-users.  They are intended for central administrators and those charged with dealing with eDiscovery and access-to-information requests.  Most organisations buying indexing engines such as Nuix do not provide end-users with an interface to these products.  Providing such an interface is not feasible because the whole point of such tools is that they can be used to search ‘dark data’ – material in e-mail accounts which contains a mixture of harmless, useful, harmful, useless and private content.  This means that indexing engines can be used by administrators/legal counsel to service the needs of external requestors (often hostile to the organisation), but cannot be used to service the day-to-day information needs of internal users who wish, for example, to know what a predecessor had said to a particular stakeholder/customer/client/citizen/lobbyist/regulator.  (A simple sketch of this administrator-facing pattern follows this list.)
  • Electronic records management systems aimed to create an electronic ‘file’ that told the whole story of a piece of work (much like good paper files used to), and that functioned as a single point of reference for that piece of work.  In other words they were trying to ‘shape’ the way records accumulated in a way that was useful (or was thought to be useful) to both the individuals carrying out that work and any future stakeholders.  In contrast in-place records management tools attempt to apply policies (classifications and linked retention and perhaps access rules) to content held in different repositories. They are not looking to shape the way records accumulate.   They are not looking to create a single source of reference for a particular activity.  They are instead looking to make sure that the organisation can apply an appropriate classification and retention rule to all content.
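
The administrator-facing character of indexing engines can be illustrated with a deliberately simple sketch: an index built across assumed repository locations, searched by a central team, with matching items flagged for a legal hold.  The paths, fields and hold mechanism are all invented for the example and bear no relation to any vendor’s actual product.

```python
from pathlib import Path

def build_index(roots: list[Path]) -> list[dict]:
    """Read every file under the given roots into an in-memory 'index'."""
    index = []
    for root in roots:
        for f in root.rglob("*"):
            if f.is_file():
                index.append({"path": f, "text": f.read_text(errors="ignore"), "on_hold": False})
    return index

def apply_hold(index: list[dict], term: str) -> int:
    """Flag every indexed item containing the search term - the kind of action
    an administrator or legal counsel, not an end-user, would take."""
    hits = [item for item in index if term.lower() in item["text"].lower()]
    for item in hits:
        item["on_hold"] = True
    return len(hits)

if __name__ == "__main__":
    idx = build_index([Path("/shared-drive"), Path("/mail-export")])  # assumed locations
    print(apply_hold(idx, "gas plant"), "items placed on hold")
```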

The lack of focus on the information needs of internal end-users is not surprising.  This is an information governance approach that is being adopted as a records management approach.  Information governance and records management are different professional perspectives, with different histories and aims.  Neither can be reduced to the other without giving up some of its core aspirations.  The aspiration that records management gives up if it takes the information governance approach as it stands, without adding to it or reshaping it, is the aspiration to design records systems that are as useful for the day-to-day needs of internal end-users as they are for compliance with the requirements of external stakeholders.  The information governance model, as it stands at the moment, places a strong emphasis on the latter to the neglect of the former.

The Ontario e-mail deletion scandal – part 10 – Why had no-one mentioned the e-mail archive?

The story so far….
In September 2011, just before a general election, Ontario’s Minister of Energy announced the cancellation and relocation of a controversial gas plant.

In May 2012 the Estimates Committee of the Parliament of Ontario requested to see the correspondence relating to the decision.  They received no correspondence from any of the political staff working in the Office of the Minister of Energy, nor from those working in the Office of the Premier of Ontario.

Craig MacLennan was Chief of Staff to the Minister of Energy when the gas plant decision was taken.  He left his post and Ontario’s public service in August 2012.

In April 2013 MacLennan was questioned as to why he had not returned any records responsive to the Estimates committee’s request.  He said that he had been unable to return any responsive records because he kept ‘a clean in-box’ and routinely deleted his e-mails.

MacLennan’s statement was reported to Ann Cavoukian, Ontario’s Information and Privacy Commissioner.  She investigated and reported in June 2013 that MacLennan’s e-mails were not recoverable (Ontario’s policy is to delete e-mail accounts when members of staff leave the service).

In July 2013 Cavoukian was contacted by the Ministry of Government Services who told her that a portion of MacLennan’s e-mail account had been found in their ‘Enterprise Vault’ e-mail archive.   This portion of his account comprised 39,000 e-mails of which 1,800 related to the gas plant issue.

[Images: Ontario-pt10-01 to Ontario-pt10-10]

Next episode

The Commissioner issues an addendum to her report