Why were corporate wide records systems in the paper age so much more successful than those in the digital age?

Records managers are often accused of trying to replicate ‘a paper paradigm’ in the digital world.  This is a little ironic. If we were able to implement corporate wide electronic records management systems that were half as good as the best records systems in the paper days then we would be very popular indeed.

Our predecessors in the records management profession 20 years ago tried to abstract the qualities of the best paper records management systems and express these qualities as a set of technology neutral criteria, in the hope and expectation that we would be able to design electronic records management systems that also met those criteria, even if they met them in a completely different way.

The best statement of these technologically neutral criteria can be found in section 8.2.2 of the ISO 15489 records management standard (see my last post for a more detailed discussion of them).

The five characteristics are as follows. In order to be considered reliable a records system must:

  •  routinely and comprehensively capture all records arising from the activities that it covers
  • act as the main source of reference for the activities it covers
  • link records to the activities from which they arose
  • protect records from amendment or deletion
  • preserve access to records over time

These characteristics may at first sight seem utopian. No organisation I know of currently operates a corporate wide system that meets all these characteristics and covers all of their activities. And yet at the time they were drawn up, in the early 1990s, they seemed anything but idealistic. Before the introduction of e-mail, any organisation that wished to could set up a records system that met all five of these characteristics.

What made it possible to set up a reliable corporate wide records system in the paper age?

In the paper age there was a gap in time and space between:

  • the point in time at which a business document/communication arrived in the organisation from outside AND
  • the point in time at which that business document/communication arrived in the in-tray of the individual responsible for dealing with that communication

There was also a gap in time and space between

  •  the point at which an individual within the organisation sent a business document/communication out AND
  • the point at which that communication either arrived at the colleague it was addressed to, or left the organisation for dispatch to an external recipient

Organisations could insert control points into those gaps in time and space to ensure that business communications were routinely captured into the records system, and assigned to containers (usually called ‘files’) that each represented an instance of a particular activity.

 

How a registry system typically worked

[Illustration: registry model]

The illustration above shows how such a registry system would typically work.

Incoming post would arrive in a post room. The post room staff would do a rough filter of the post:

  •  Things that looked like they were not needed for the record system because they were trivial, personal, or reference material (promotional material/flyers/postcards/love letters/magazine subscriptions) would be sent direct to the individual concerned
  • Things that looked like business communications (letters, memoranda, reports etc.) would not be sent direct to the addressee, instead they would be sent to the relevant records registry

Each registry was simply a team of records clerks who looked after the files for the area of the organisation within the scope of their registry. Typically an organisation would have several registries, each covering one or more of the organisation's functions/departments/buildings, though it was also possible to operate with one central registry covering all activities.

The registry would assign the document to the file representing the activity from which the correspondence arose. They would deliver the file, with the new correspondence on it, to the action officer.

The action officer would draft a reply which would be typed up by a typist in a typing pool. The typist would create two copies for the action officer to sign – one to go on the file, and one to be sent out.

 

Evaluating registry systems against the five criteria for a reliable records system

The registry system described above meets the five reliability criteria for a records system because:

  • There is routine and comprehensive capture into the record system.   In the post room(s) the same staff filter incoming correspondence day after day. They apply the same thought process to post day after day to decide what post goes to which registry, and what post bypasses the registries and gets sent direct to individuals. If the post room acquires a new member of staff, they train that person in the thought process.     Similarly in the registry(ies) the same staff do the filing day after day.   The registry staff have no interest in withholding embarrassing material from the file. The file is not holding them to account, it is holding action officers to account. The files are comprehensive – every incoming  piece of correspondence goes through the post room, and is either filtered out or sent to a registry for filing.
  • The colleagues working on the project/case/relationship/matter use the ‘file’ as their source of reference. If there are gaps in the file a colleague is not only likely to notice, there is also a fair chance that they will be motivated to do something about it, because they rely on that file being complete in order to be able to do their work, and defend their work.
  • A file is set up every time a new piece of work starts and every piece of correspondence placed on the file is, by the act of being placed on a file, connected to the activity it arose from.
  • The registry guarded the files. They typically kept a record of who each file was loaned out to and when. It is true that there was nothing to physically prevent an action officer ordering up a file, removing a paper that was incriminating to them from the file, and returning the file. However they would risk dismissal if detected.
  • Organisations had ways of managing the records lifecycle so that access to records was preserved over time. The registries would store active files close to office space, then send records to a records centre when they became non-active at a point some time after the work had finished. From the records centre, records would be disposed of at the end of a designated retention period, either through destruction or through transfer to an historical archive

What happened to these systems when e-mail and networked computers arrived?

On the face of it, it seems that the records management and archives profession was in a good position circa 1993 to ensure that no paper registry system was decommissioned without an adequate digital replacement being put in place. Those organisations that operated such systems tended to be proud of the systems, and proud of the records that the systems held. Their records were their support, their defence and their source of reference. They had no plans to jeopardise the quality of their records.

So why did that not transfer? Why is it that even organisations that had great records systems could not replicate the quality of those records systems after e-mail?

There are three main reasons for this:

  •  the speed and manner in which an e-mail moves through space and time is so different from that of a piece of paper that even having abstracted the qualities of a good paper records system into a set of criteria it was hard for the profession to imagine a way in which a system in the post e-mail world would meet those criteria
  • individual e-mail accounts collapsed the  time and space between a piece of correspondence arriving in an organisation and it arriving at the desktop of the action officer.    There was no time or space for records management controls to be inserted
  • the rapid and uniform spread of e-mail, through standard e-mail client and server software, meant that organisations denied themselves the opportunity to innovate when they set up their systems for handling e-mail.   A satisfactory method for transparently filtering and classifying/ filing e-mail never emerged because so little experimentation was done.

 

Evaluating automated approaches against records management principles

As a profession we are very proud of our principles.  If you ever discuss technology with a group of records managers one of us is bound to say  ‘records management principles are timeless, regardless of how much or how quickly technology changes’.

But what exactly are these principles?

The time when you most need a set of principles is when new ideas, tools, technologies or approaches come into being. By definition we have little or no practical experience of these new ideas/tools/technologies. We need a set of principles which distils our past experience of what has and hasn't worked, in order to predict whether these new approaches are likely to work.

NARA’s report on automated approaches to records management

In March 2014 the US National Archives (NARA) issued a report on the different ways in which records management could be automated. The approaches have little in common with each other except that they all aim to reduce the burden of records management tasks on end users as compared with the more established electronic records management system approach.

The approaches discussed in the report were extremely varied, but can be boiled down to the following:

  • an in-place approach – based on holding a records classification and related retention rules in one application, and applying them to content held in the various different native applications of the organisation (SharePoint, e-mail accounts, shared drives etc.)
  • a workflow approach – the definition of workflows for each activity, that include provision for the capture of records at particular points of the process
  • a defensible disposition approach – the definition of policies to apply to aggregations of content around the organisation. For e-mail accounts you might have a policy that the e-mail accounts of individuals deemed to have important roles in the organisation are kept permanently, whilst the e-mail accounts of less important individuals are kept for six years. You might set a policy that content in SharePoint sites/shared drives is deleted after five years/fifteen years, or kept permanently, depending on the importance of the functions carried out by the team concerned. There is no retention schedule, and no records classification, just risk based decisions (see the sketch below)
  • automated filing by a rules engine – the definition of rules to enable a rules engine to recognise which content arises from which particular activity
  • automated filing by an auto-classification tool – the use of machine learning to learn the attributes of content that arises from particular activities, without the organisation having to write a rule set

(I have taken the liberty of changing NARA's categories slightly – this summary does not stick precisely to NARA's definitions.)
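To make the contrast with a classification-based retention schedule concrete, here is a minimal sketch of what a defensible disposition policy might look like when expressed as data. It is written in Python purely for illustration; the aggregation names and retention periods are the hypothetical examples from the list above, not anyone's actual policy.

```python
from datetime import date, timedelta

# Hypothetical defensible-disposition policies. Decisions are made per
# aggregation (a whole e-mail account, a SharePoint site, a shared drive
# area) based on the perceived importance of the role or team concerned.
# There is no records classification and no activity-specific trigger.
POLICIES = {
    "email_account:important_role": None,  # keep permanently
    "email_account:other_role": 6,         # keep for six years
    "sharepoint_site:key_function": None,  # keep permanently
    "sharepoint_site:other": 5,            # keep for five years
    "shared_drive:other": 15,              # keep for fifteen years
}

def disposal_date(aggregation: str, created: date) -> date | None:
    """Date the aggregation falls due for deletion; None means keep permanently."""
    keep_years = POLICIES[aggregation]
    if keep_years is None:
        return None
    return created + timedelta(days=365 * keep_years)

print(disposal_date("email_account:other_role", date(2014, 1, 1)))  # 2019-12-31
```

Note how coarse the decision is: nothing in such a policy knows which activity any individual document arose from.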

These new approaches are very different from the previous electronic records management system approach, but there has been no change in records management theory in between the two approaches (unless you count the new ideas from the information governance world – such as 'everything held is a record', 'big buckets are better than granular hierarchies', and 'end-users should not have to bear the burden of records management').

NARA gives the pros and cons of each automated approach, without favouring any one or other of them, and without stating whether or not they believe each approach will keep records to an acceptable standard on a corporate scale.    This is not NARA’s fault – it is simply a recognition of the fact that at the moment we as a profession have no handy set of criteria to evaluate these approaches against.

In this blogpost I am going to nominate what I think is the most useful and concise set of criteria for judging a records management system or approach – namely the five characteristics of a reliable records system that were developed in Australia in the early 1990s and ended up as section 8.2.2 of the International Records Management Standard ISO 15489.

The five characteristics of a reliable records system

Reliability is the most important characteristic of a record system.  A record system exists to perform a paradoxical function.  It exists to both:

  • enable external stakeholders to scrutinise an organisation AND
  • enable the organisation to defend itself against external scrutiny

This paradox is fractal – it is present at any level of aggregation:

  • a record system enables an organisation to scrutinise a team, and it enables the team to defend itself from scrutiny
  • it enables an individual to be scrutinised and to defend him/herself from scrutiny.

The only way that a records system can resolve this paradox is by being trusted by all parties – in other words for all parties to consider the system to be reliable  – the individuals carrying out a  piece of work, their immediate colleagues and line management, the rest of the organisation, and external stakeholders.

Section 8.2.2 of the ISO 15489 standard states that in order to be considered reliable a records system must:

  • routinely and comprehensively capture all records arising from the activities that it covers
  • act as the main source of reference for the activities it covers
  • link records to the activities from which they arose
  • protect records from amendment or deletion
  • preserve access to records over time

 

Why the five characteristics of reliability are essential, rather than merely desirable

These characteristics are not 'nice to have', they are 'must have'.

Think what would happen if a records management system did not meet even one of these characteristics:

  • If records were not consistently captured into the record system then external stakeholders could wonder whether there was a bias in record capture, for example whether content that could be incriminating/embarrassing was deliberately kept off the record system
  • If a record system does not comprehensively capture all records arising from an activity then this will leave gaps in the record and weaken the organisation’s ability to defend or learn from the way it carried out that activity.  It will also lead to external stakeholders looking to other sources of information outside of the records system – sources that may be more time consuming and more embarrassing for the organisation to search.
  • if a record system does not serve as the main source of reference for the activities within its scope then any gaps in the record will go unnoticed, and uncorrected. It is particularly important that the record system is used as a source of reference by colleagues carrying out the piece of work itself – because they are the only people in a position to know that there is content missing from the record.
  • if records are not linked to the activity from which they arose then the organisation will find it impossible to apply a precise retention period to those records. Retention periods are specific to a type of activity: managing staff, designing a building, managing a project, bringing a pharmaceutical product to market, adjudicating on a citizen benefit claim etc. The trigger point for that retention period to start is even more specific: it is specific to a particular instance of each activity – the date a particular member of staff left employment; the date a particular building ceased to exist; the date a particular project finished; the date a particular pharmaceutical product was withdrawn from the market; the date a particular citizen ceased receiving benefit (see the sketch after this list)
  • If records are not protected from amendment or deletion then an external stakeholder will have cause to doubt whether or not content detrimental to the organisation, or to a particular team or individual, has been deleted from the system prior to them viewing the record
  • If access to records is not preserved over time then an organisation cannot be sure that it will be able to defend itself from scrutiny/challenge if that scrutiny or challenge comes at a date in the future. By the same token external stakeholders cannot be sure that they will be able to scrutinise the organisation at a future date.
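To make the point about retention triggers concrete, here is a minimal sketch (in Python; the activity types, retention periods and dates are invented for illustration) of why a retention rule only becomes executable once a record is linked to a particular instance of an activity:

```python
from datetime import date

# Hypothetical retention rules. Each is specific to a *type* of activity,
# but each runs from a trigger date that is specific to one *instance* of
# that activity (a particular project, a particular member of staff...).
RETENTION_YEARS = {
    "project": 10,          # keep for 10 years after the project ends
    "staff_management": 6,  # keep for 6 years after the employee leaves
}

def destruction_due(activity_type: str, trigger: date) -> date:
    """Earliest date records of this activity instance may be destroyed.

    Without a link from each record to an activity instance there is no
    trigger date, and the rule cannot be applied at all.
    """
    return trigger.replace(year=trigger.year + RETENTION_YEARS[activity_type])

# A record filed to a (hypothetical) project that ended on 30 June 2010:
print(destruction_due("project", date(2010, 6, 30)))  # 2020-06-30
```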

 

Why have we lost sight of these reliability criteria?

At first sight it seems odd that we as a profession have lost sight of these criteria for judging a records system.

These criteria are not obscure.    They were a fundamental part of the drive by the records management and archives professions to manage the transition from the paper age to the networked digital age by expressing the attributes of good recordkeeping systems in an abstract, non-format specific way.

Although these criteria are trying to be as timeless as possible, they are also very much of their time. They were written at the start of the 1990s, just before the mass adoption of e-mail and networked computers with shared drives by organisations in what were then called developed economies. They were embedded in the Australian records management standard which later became the International Records Management Standard (ISO 15489, published in 2001).

The reason why we have lost sight of them is that we have not been able to implement systems that meet all five of these criteria, on the scale we want to work at (the corporate scale), since the mass adoption of e-mail. Nor is there a realistic likelihood that any of the five automated approaches discussed by NARA will meet all five of these criteria.

 

Evaluating existing record keeping systems against the five reliability criteria

When you compare existing systems within organisations to these reliability criteria you find that:

  • line of business systems (case file systems and sector specific systems such as insurance claim systems) can be set up to routinely meet all five reliability criteria.   They can use functional e-mail addresses, and/or web forms, to divert correspondence related to that function away from private e-mail accounts directly into the system, which means they can routinely and comprehensively capture correspondence arising from  the activities that they cover.   However each line of business system can only cover one area of the business.    Organisations carry out  so many different types of work that it is impossible to have a line of business system for each of them.
  • generic document management repositories such as shared drives, electronic records management systems, and SharePoint cannot routinely and comprehensively capture business correspondence sent and received through e-mail – they are dependent on individuals exercising their own judgement on which e-mails (if any) they upload or push to the system.    None of these three types of repository serve as the main source of reference to all or most of the activities that they cover.    Individuals tend to use their own e-mail accounts as the first place  to go to find records of their work.   Of those three repositories electronic records management systems and, to a lesser extent SharePoint, do a better job than  shared drives at protecting records and linking them to the business activities that they arose from.
  • e-mail archives routinely and comprehensively capture electronic correspondence. However they do not relate records to a business activity, which leads to problems with applying access and retention rules. They cannot be used as the main source of reference for activities by anyone except Legal Counsel, because private and sensitive e-mails are undifferentiated from other correspondence

 

Evaluating automated approaches against the five reliability criteria

The automated approaches described in the NARA report either don't meet all five reliability criteria or they don't scale across an organisation:

  • the in-place approach – currently lacks an answer to the question of how to routinely and comprehensively capture important e-mails and relate them to business activities. It is dependent on either individuals dragging e-mails into folders (which is not routine or comprehensive because many individuals never file e-mail into folders) or one of the automated filing approaches (see below)
  • a workflow approach – works well, meets all five criteria, but cannot scale across an organisation because of the time taken to analyse processes and define workflows
  • a defensible disposition approach – does not protect records. A disposition rule is applied to a part of a shared drive or a SharePoint site, but individuals can delete or amend content before the time at which the disposition rule is applied.
  • automated filing by a rules engine  – the time spent to write the rules means this approach does not scale to a whole organisation
  • automated filing by an auto-classification tool – a typical organisation carries out a great many different types of activity. For each type of activity it carries out a great many different instances of that activity. An auto-classification tool has to be trained in each container/category it is asked to file records into. The more containers/categories you have, the higher the cost of training. Organisations trying to implement auto-classification corporate wide have had to compromise and ask the tool to file into 'big buckets' rather than into a container for each instance of each activity (project/case/matter/relationship etc.). This means that on a corporate scale auto-classification does not currently meet the criterion of linking records to the business activities that they arose from. I explained above that although retention rules may be set on broad swathes of activities (e.g. records of all our projects are kept for ten years after the project ends), to apply that rule you need to have each record allocated to a particular project so that the ten years can be triggered by the end date of that particular project. The buckets also end up being too big to be navigable or usable by end users, meaning that the system ends up not being used as the main source of reference for the activities it relates to.

 

Conclusion

At this point in time we need to be honest and say that we have no approach to implementing systems on a corporate scale that will routinely and comprehensively capture business correspondence, protect it, link it to the business activity it arose from, maintain access to it over time, and serve as the main source of reference for the activities it covers. Nor do we have the prospect of such an approach any time soon.

Until we get such an approach organisations will suffer problems with their records management/information governance.

We have seen organisations establish electronic records management systems/SharePoint implementations and hope that access to information requests/e-discovery can be confined to those systems, only to find themselves searching e-mail accounts, shared drives and maybe even back up tapes. This is because their official records system does not routinely and comprehensively capture records. Their external stakeholders have responded in effect by treating all the organisation's applications as being part of their records system.

We need approaches to records management that result in systems that are both reliable and relied upon.   If end users do not rely upon a records system,  but instead refer mainly to content outside the scope of the record system, then they will neglect to point out omissions in the record system, and there will be a disconnect between the records available to the individuals carrying out the work, and the records available to those wishing to scrutinise or continue their work.

Records management and/or information governance

Last week Laurence Hart published a blog post in which he stated:

”I’ve been talking a lot about information governance of late. The reason I’ve been doing it is because if it simply becomes a term used in place of Records Management we will have wasted an opportunity. Information Governance is different. It needs to be different.

Records management failed.  We need a new approach. Information Governance has the potential to be that new approach, if we tackle it correctly.   If we get lazy, we will be fighting the same battles for another decade.”

Laurence’s post should prompt us to explore the distinction between the terms ‘records management’ and ‘information governance’.  They are both names we give to approaches to a particular problem that organisations face.  The best place to start comparing them is by defining exactly what that problem is:

  • every day there is a massive flow of written communications into, out of, and around every organisation.  Thirty years ago they moved around in envelopes.  Now most of them move around as e-mails or in attachments to e-mails.   But the flow has continued uninterrupted.
  • what we call ‘records’ are simply these written communications when they are at rest.
  • the organisation is accountable for each piece of work it undertakes. It needs to find a way of assigning each significant communication to the piece(s) of work that it arose from. Thirty years ago this would have been done by placing communications onto paper files that represented each piece of work.  Now there are lots of different ways this could be done.  We could ask users to assign them to electronic files, or set up workflows to move the communications to the right file, or define routing rules, or set up e-mail accounts dedicated to particular pieces or areas of work, or train an auto-classification engine
  • if the organisation does not assign each significant communication to the piece of work that it arose from, then it will not be able to assign an appropriate access rule or retention rule to those communications. Nor will it be able (at least with any degree of certainty) to present colleagues or stakeholders in those pieces of work with a complete collection of documentation of that piece of work
  • any method the organisation uses to manage its records needs to be built into the communications flow itself.  If it is not part of the flow it will not be able to cope with the volume of communications exchanged.  This was true in the paper world, and is even more true now
  • once assigned to a piece of work, a communication should be protected from amendment or deletion for the length of time that the organisation needs a record of that piece of work
  • not all communications are significant, many are trivial or ephemeral or irrelevant to the organisation’s responsibilities or purpose.  These communications do not need to be assigned to a piece of work, and do not need to be protected from alteration or deletion during a retention period.  They can be scheduled for deletion after a convenient interval
  • whatever method the organisation chooses to use to filter out insignificant communications needs to be trusted by both internal and external stakeholders

 

It doesn’t matter what we call the next generation of solutions to this problem:

  • we could call it ‘records management’ on the grounds that it is the same old problem that records management has been tackling for fifty years, sometimes successfully, sometimes unsuccessfully
  • we could call it ‘automated records management’ as NARA (the US National Archives) are calling their search for automated solutions to this problem across US federal government
  • we could it ‘information governance’ as it tends to be called amongst vendors, and in the private sector, if we want to put clear water between the largely discredited electronic records management system approach and this newer (but not yet fully articulated) set of approaches

Whatever name ends up sticking to the new approach, one thing we need to remember is that records management is a body of thought, professional practice and experience stretching back half a century.    There is far more to records management than simply the roll out of a particular breed of product that used to be called electronic document and records management systems.   Even as ‘information governance’ tries to distance itself from the electronic records management system approach,  it should endeavour to take on board the lessons the records management profession has learned over the past fifty years.

The corollary of that is that we records managers need to articulate what those lessons are:

  • What can we learn from the relative success of records management approaches in dealing with the communications flow in the paper age?
  • What can we learn from the failure of electronic document and records management systems (and SharePoint) to adequately deal with the digital communications flow?
  • What can we learn from those records management approaches that have worked in the digital age? (because there have indeed been some pockets of success)

I will have a stab at distilling these learnings in the next few blogposts. We need to use these learnings to come up with some critical success factors for the next generation of records management/automated records management/information governance approaches. And those criteria in turn can be used to come up with some alternative scenarios of how such an approach might work.

Auto-classification – will cloud vendors get there first?

It is easy to predict that data analytics, auto-classification and the cloud will have an increasing impact on records management.   The big question is whether these three trends will act separately or in combination.

My guess is that they will have their most powerful effect when they are used in combination – when a cloud provider uses analytics on the content they hold on behalf of a great many different customers to auto-classify content for each customer.

Let's look at each of these phenomena separately and then see what happens when they come into combination with each other.

Data analytics

Analytics is the detection by machines of patterns in content/data/metadata in order to derive insights.  These insights may be designed for action by either people or machines.

You can run analytics across any size of data set, but it yields more insight when run across large data sets.

When people talk about big data they are really talking about analytics.   Big data is what makes analytics effective.   Analytics is what makes big data worth the cost of keeping.

The use of analytics in records management now

The use of analytics in records management is still in its infancy.

There are many tools (Nuix, HP Control Point, Active Navigation etc.) that offer analytics dashboards that can display insights gleaned from crawls of shared drives, SharePoint, e-mail servers and other organisational systems. They allow an administrator to drill into the index to find duplicates, content under legal hold, ROT (redundant, outdated and trivial documentation), etc.

Records managers and information governance managers are, to the extent that they are using analytics at all, using these tools to deal with legacy data.    Content analytics tools  are put to use to reduce a shared drive in size, or prepare a shared drive for migration, or apply a legal hold across multiple repositories etc.   All worthy and good.   But it means that:

  • we are using  analytics on content with the least potential value (old stuff on shared drives) rather than on content with the most potential value (content created or received today)
  • we are using analytics to reduce the cost of storing unwanted records but not to increase access to, and usage of, valuable content
  • we have a very weak feedback loop for the accuracy of the analytics, because the decision on whether something is trivial/private/significant is being made at a point in time when it has little consequence for individuals in the organisation (and hence they will neither notice nor care if a mistake is made)

Auto-classification

Auto-classification is a sub-set of analytics.    Auto-classification uses algorithms and/or rules to assign a digital object (document/e-mail etc.) to a classification category (or a file/folder/tag) on the basis of its content, or its metadata, or the context of its usage.

Auto-classification is becoming a standard offering in the products of e-mail archive vendors like Recommind, content analytics vendors like Nuix and HP Autonomy, and enterprise content management vendors like IBM and Open Text.

There is a real opportunity for auto-classification, but to harness it we need to overcome two barriers:

  • the trust barrier – organisations are currently reluctant to use auto-classification to make decisions that would affect the access and retention on a piece of content.
  • the training barrier – it takes time to train an auto-classification engine to understand the categories you want it to classify content against

Overcoming the trust barrier

Imagine you are a middle manager.   Into your e-mail account every day come two hundred e-mails – some of them are innocuous but others have some sort of sensitivity attached to them.   You do not have a personal assistant.  Do you trust an auto-classification engine to run over your account once a day and assign categories to e-mails which will mean that some of them will become visible or discoverable by colleagues, and others will be sentenced to a relatively short retention period?

I have heard from two different vendors (HP and Nuix) that customers are still reluctant to trust algorithmic auto-classification to make decisions on content that change its access and retention rules.  They report that customers are happier to trust auto-classification where decisions are based on rules (for example ‘if the e-mail was sent from an address that ends in @companyname.com assign it to category y’)  than they are to trust decisions made on an algorithmic reading of content and metadata.   But organisations cannot possibly manually define enough rules for an auto-classification engine to make rule-based decisions on every e-mail received by every person in the organisation.
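The kind of rule the vendors describe is easy to picture in code. Here is a minimal sketch (in Python; the addresses and category names are hypothetical) which also shows the limitation: any e-mail that no rule anticipates falls through unclassified.

```python
# Hypothetical hand-written filing rules of the kind customers are said
# to trust: each rule is an explicit, inspectable test on the message.
RULES = [
    (lambda sender: sender.endswith("@companyname.com"), "category-y"),
    (lambda sender: sender.endswith("@supplier.example"), "supplier-correspondence"),
]

def classify(sender: str) -> str | None:
    """Return the category assigned by the first matching rule, or None."""
    for test, category in RULES:
        if test(sender):
            return category
    return None  # no rule matches: the e-mail goes unclassified

print(classify("jane.doe@companyname.com"))  # category-y
print(classify("someone@gmail.com"))         # None
```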

The whole point of auto-classification is to change the access and retention on content, and particularly on e-mail content. For example the purpose of applying auto-classification to e-mail is to:

  • prevent important e-mails staying locked in one or two people's in-boxes, inaccessible to others in the organisation
  • prevent important e-mails getting destroyed when an organisation deletes an individual’s e-mail account six months/two years/ six years after they leave employment.

The way to increase trust in auto-classification is to increase its:

  • transparency – make visible to individuals how any e-mail/document is being categorised,  who will be able to see it, and why the engine has assigned that classification
  • choice –   give individuals a way of influencing auto-classification decisions – give them warning of the categorisation before it is actioned, and let them reverse, prevent or change the classification
  • consequence – there needs to be some impact from auto-classification decisions in terms of access and retention – otherwise individuals will not act to prevent or correct categorisation mistakes
  • consistency – nothing breeds trust more surely than predictability and routine
  • use – make sure that the output of auto-classification is groupings of e-mails/documents that are referred to/subscribed to/surfaced in search results etc. This not only means that mistakes are spotted more quickly, it also means that the organisation gets collaboration benefits from the auto-classification as well as information governance benefits.

There is a bit of a chicken-and-egg situation here:

  • in order for the organisation to develop confidence in auto-classification it needs end users to be interacting constantly with the auto-classification results, so that there is a feedback loop and so the auto-classification engine can learn from end users' behaviour and reactions…
  • …but in order to get that level of interaction the auto-classification needs to be dealing with the most current and hence the most risky content.

This means that in order to overcome the trust barrier we also need to overcome the training barrier,  to get the auto-classification engine accurate enough for an organisation to start using it.

Overcoming the training barrier

An auto-classification engine needs to learn the meaning of the categories/tags/folders that the organisation wants it to assign content to. The standard way of doing that at the moment is to prepare a training set of documents for each category. This is time consuming, especially if your classification is very granular.
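For readers who have not seen it done, the standard training-set approach looks roughly like the following sketch, using scikit-learn's off-the-shelf text classification pipeline. The categories and documents are invented; the point is that every category needs its own set of labelled examples, which is why the cost scales with the granularity of the classification.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# A tiny, invented training set. In practice each category needs many
# labelled documents, and the effort multiplies with every extra category.
training_docs = [
    "minutes of the project board meeting",
    "project risk register update",
    "invoice for office stationery",
    "purchase order for new desks",
]
training_labels = ["project", "project", "procurement", "procurement"]

classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(training_docs, training_labels)

print(classifier.predict(["agenda for next week's project board"]))  # ['project']
```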

For auto-classification to be viable across large sectors of the economy it needs to work without an organisation having to find a training document set for every node/category of the classification(s) it uses.

There are two ways of replacing the need for training sets.

The first is to piggyback on the training done in other organisations in the same sector. So if one UK local authority/US state or county/Canadian province trains its auto-classification engine against its records classification then in theory this could be used by any other local authority/state/county/province. This may create an incentive for sectors to arrive at common classifications to be used across the sector.

The second is to use data analytics to bring into play contextual information that is not present in the documents themselves and their metadata.     Ideally an auto-classification engine would have access to analytics information about each individual in the organisation.   It would know:

  • which team they belong to
  • what their team is responsible for
  • what activities (projects/cases/relationships) the team is working on
  • where they habitually store their documents
  • who they habitually  correspond with

It would use this information to narrow down its choice of auto-classification category for each document/e-mail created or received by each individual.
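As a sketch of that narrowing step (all of the names and data structures here are hypothetical – an assumption about how such an engine might work, not a description of any existing product):

```python
# Hypothetical contextual profile of one individual, assembled by
# analytics from directory data, storage locations and correspondence.
profile = {
    "team": "housing-benefits",
    "activities": ["claim-4711", "claim-4712", "policy-review-2014"],
    "frequent_correspondents": {"j.smith@agency.example": "claim-4711"},
}

def candidate_categories(sender: str, profile: dict) -> list[str]:
    """Narrow the classification choice using context, before any content analysis.

    If the sender is strongly associated with one activity, rank it first;
    otherwise fall back to everything the individual's team is working on.
    """
    hinted = profile["frequent_correspondents"].get(sender)
    if hinted:
        return [hinted] + [a for a in profile["activities"] if a != hinted]
    return profile["activities"]

# The content-based classifier now only has to choose among a handful of
# the team's activities, rather than the whole corporate classification.
print(candidate_categories("j.smith@agency.example", profile))
```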

The opportunity here is to use data analytics to support and train an auto-classification engine, and hence eliminate the need for a training document set. I see no reason why that shouldn't work, provided that the data sets that the data analytics runs on are big enough and relevant enough.

It follows from this that the vendors whose auto-classification engines will work the best for your organisation are the vendors:

  • with access to data arising from the content of other organisations in the same sector as yours
  • with access to the broadest range of data from your organisation – including e-mail correspondence data,  social networking data, search log data, and content analytics data from document repositories such as SharePoint and shared drives

Which category of vendor will have access to all of this data?  Cloud vendors.

The cloud is a game changer

I met Cheryl McKinnon at the IRMS 2014 conference last week. She told me that there is a cloud based e-mail archive vendor called ZL who advise their clients not to delete even trivial e-mails, on the grounds that the data analytics runs better with a complete set of e-mails.

What analytics usage could you possibly want trivial e-mails for?   The example Cheryl gave was of a company wanting to be able to predict whether new sales staff will be high performing or not.   It might run analytics on the e-mail of sales staff to surface communication patterns that correlate with high performance.  It could then run the algorithm over the  e-mail correspondence of new staff to see whether they are exhibiting such a communication pattern.  And  trivial e-mails may be just as good an indicator of such patterns as important e-mails.

Data analytics is becoming all pervasive.  Its use will affect every walk of life.  This means that more and more data will be kept to feed the analytics.   The all-pervasive nature of data analytics means that  both cloud vendors (in this case ZL) and their clients have an interest in keeping data that would otherwise have been valueless.

Cloud vendors will acquire more and more data from inside more and more organisations.   This potentially gives them the ability to train and refine content analytics across a wide spread of organisations, and provide auto-classification as part of their cloud service back to their organisational clients.

We can predict that:

  • the relationship between an organisation and its cloud vendor will be completely different from the relationship between an organisation and its on-premise vendor
  • the nature of cloud vendors will be different from the nature of on-premise vendors – for example Microsoft the provider of Office 365 will behave in a completely different way from Microsoft the vendor of on-premise SharePoint and Exchange

Let's think about Microsoft's strategy. They have:

  • the leading on-premise e-mail storage software (MS Exchange)
  • the leading on-premise document storage software (MS SharePoint)
  • the leading on-premise productivity suite (MS Office).

In the on-premise world they kept these products separate. In their cloud offering they have combined them into one (Office 365), and are charging less for the combined package than you might have predicted they would charge for any of the three on its own. They have also announced plans for integrating enterprise social into Office 365 through 'codename Oslo'. This will use analytics data on who each individual interacts with to present personalised feeds of content (Microsoft call this the 'Office graph' in a nod to Facebook's 'social graph').

What do Microsoft’s actions tell us?  They tell us that their business model for Office 365 is different from their business model for their on-premise software:

  • In the on-premise world Microsoft  wanted to upsell – by getting existing customers to buy more and more different software packages from them.  Each of their software products had its own distinct brand.
  • In the cloud world Microsoft wants to give customers all of their core products, so that they get the most content and hence the most analytics data from each customer. They are even prepared to deprecate a brand name like 'SharePoint' in favour of a single 'Office 365' brand for their cloud package.

How long will it be before Microsoft uses the analytics data it will have gained from across their many customers, to start enhancing metadata, enhancing search, and auto-classifying content for each customer?

The questions this poses for NARA

The US National Archives (NARA) has recently put out for consultation their automated electronic records management report. The report is part of the mandate given to them by the Presidential Records Management Directive to find ways to help the US federal government automate records management.

NARA’s report gives a good description of autocategorisation, although it is based on the assumption that the autocategorisation engine needs training packages in order to work.   It acknowledges that:

‘the required investment [in autocategorisation] may not be within reach of the smallest agencies, though hosted or subscription services may bring them within reach for many’ (page 13)

NARA is acknowledging here that cloud vendors are more likely to bring auto-classification to many agencies than those agencies are to develop the capability themselves. This poses some very fundamental questions:

  • Would the federal government be happy to let a cloud vendor such as Microsoft  use data analytics to auto-classify  federal e-mails and documents? OR
  • Would they rather each individual federal agency developed its own capability? OR
  • Do they think federal agencies need to club together to create a pan-government capability?

The security and information governance  issues this question raises are massive.

  • From a security and an information governance point of view the option of each agency having an individual analytics capability is clearly the best, because the cloud option and the pan-administration option create too large a concentration of data and insight about the US federal administration.
  • But from a big data/data analytics point of view the  pan-administration option or the cloud option is better, because they give a bigger base of data on which to make better auto-classification decisions.

 

The Ontario gas plant records deletion saga – a records management case study

 

The records deletion controversy in Ontario is of relevance to archivists and records managers elsewhere in the world because of the stark contrast it poses between on the one hand a very strong and complete records management governance framework:

  •  Ontario has a relatively recent piece of Archives legislation (the Archives and Recordkeeping Act 2006)
  •  Ontario has a comprehensive set of records retention schedules, all signed by the Archivist of Ontario and backed up by the Archives and Recordkeeping Act
  • one of those retention rules states that ministerial correspondence (correspondence arising from the portfolio responsibilities of a minister) should be preserved permanently

…and on the other hand:

  • the lack of any  planning as to how this retention rule on ministerial correspondence could be applied in a situation where the correspondence accumulated in the individual e-mail accounts of political staff working in ministerial offices
  • the ability of political staff to delete e-mails (whether trivial or important) from their e-mail accounts should they wish to do so
  • the operation by the Ontario government of a routine policy of deleting entire e-mail accounts when staff leave

This tension between recordkeeping policy and e-mail practice is not unique to the government of Ontario, it is a universal problem, facing all administrations.

The US National Archives (NARA) took the step in August 2013 of issuing advice to US federal agencies that the e-mail accounts of important officials should be preserved permanently if the agency cannot find any other reliable way of capturing the significant correspondence of those individuals. This advice is contained in bulletin 2013-02, and has gone under the name of the 'Capstone' approach.

It will be interesting to see whether or not other National Archives around the world follow NARA’s lead and  intervene in the way that the e-mail accounts of important officials are managed.

The slidepack embedded in this post is a collection of all the episodes of the Ontario gas plant records deletion saga comic strip that I have published on this blog (together with a few extra slides that I have added in the middle and at the end).

The slidepack is published under a Creative Commons licence, so feel free to use it for non-commercial purposes. My intention is that it serves as a case study. Accompanying the slidepack are:

  • a records guru podcast that I recorded with Jon Garde in which we discuss the saga
  • a blogpost in which I give a recordkeeping perspective on the saga

 

The Ontario gas plant cancellation records deletion saga from a recordkeeping perspective

1.     Introduction

The Ontario gas plant cancellation records deletion saga has occupied a considerable amount of column inches and radio and TV time in the province itself, in Canada, and beyond since the spring of 2013.   Little or none of this debate has been informed by a recordkeeping perspective, despite the fact that the deletion controversy started with an allegation that Ontario’s Archives and Recordkeeping Act had been breached by Craig MacLennan, the former Chief of Staff to the Minister of Energy.

This post attempts to provide a record keeping perspective on the saga.

It argues that:

  • the deletions that have caused most of the controversy in the saga are less damaging in recordkeeping terms than a form of deletion that has so far caused little or no controversy – the IT policy of deleting staff email accounts when a member of staff leaves employment.
  • most of the debate in Ontario's Parliament has concerned the behaviours and motivations of individual political staff who work or worked in the Office of the Minister of Energy and the Office of the Premier. Of much more interest from a recordkeeping perspective is the question of why the recordkeeping systems in place were not robust enough to capture and protect an adequate record of the correspondence of the ministers involved in the gas plant cancellation decisions.

The post goes on to give recommendations as to how the Ontario Government (or any government) could prevent the recurrence  of a similar saga.

To accompany this post I have recorded a records guru podcast in which I discuss the saga with Jon Garde (listen to it here)

2.     The different types of deletion involved in the saga

The saga has involved three different types of deletion:

  • the confession by Craig MacLennan in April 2013 that he routinely and indiscriminately deleted e-mails from his e-mail account whilst serving as chief of staff to the Minister of Energy
  • the government’s  IT policy of deleting e-mail accounts of staff members when they leave Ontario’s public service
  • the alleged attempts by staff working in the Office of the Premier to wipe clean the hard drives of the computers used by colleagues who were leaving their posts during the transition from Premier McGuinty to Premier Wynne in the autumn of 2012

Which one of these types of deletion is the most important depends upon whether you are looking at this from a political perspective, a police perspective, an access to information perspective, or a recordkeeping perspective.

3.     The routine deletion of e-mails from his  e-mail account by Craig MacLennan

3.1        Craig MacLennan’s confession

In April 2013 Craig MacLennan, former chief of staff to the Minister of Energy, was asked why he had not provided any correspondence to the Estimates Committee of Ontario’s Parliament in response to their request in the spring of 2012 to see correspondence related to a decision to cancel and relocate two gas plants.   He replied that at the time of the request he did not have any such correspondence.   The reason for this was that he kept a clean in-box and routinely deleted e-mails from his in-box and sent items as he went along, in order to keep within limits he thought had been set by IT.

3.2     Did the deletion actually happen?

The first thing to be said about Craig MacLennan’s confession of e-mail deletion is that we do not know whether or not such deletion actually happened!

The policy of the government of Ontario is to delete e-mail accounts when a member of staff leaves.  Craig MacLennan left the public service in June 2012.

The e-mail accounts of most of Ontario’s public servants are split over two tiers of storage.   E-mails less than 30 days old are kept on first tier storage  in Microsoft Exchange.  When an e-mail is thirty days old it moves to the portion of the e-mail account kept on cheaper, slower hardware, within an e-mail archive (Enterprise Vault by Symantec).

The Ministry of Government Services (MGS) stated that the e-mail archive existed purely to save storage costs, not to protect or preserve e-mail.   Whether a particular e-mail was stored in Exchange or in Enterprise Vault should not have made any difference to how long the e-mail was kept or how safe it was from deletion.   An individual member of staff could delete e-mail from either portion of their e-mail account.

In the summer of 2013 MGS forensic staff discovered that for a period of time IT staff had, by an administrative oversight, omitted to delete the Enterprise Vault portion of the e-mail accounts of staff who had left.  During the period when the policy was not applied 30,000 people had left,  so the Ministry was left with 30,000 orphaned accounts (orphaned because the only way of navigating to these accounts was through Microsoft Exchange, but the Exchange portion of the accounts had been deleted, leaving the Enterprise Vault portions of the e-mail accounts invisible and inaccessible).

The forensic staff checked through these 30,000 orphaned accounts and found that one of them belonged to Craig MacLennan.   When they opened MacLennan’s account they discovered it contained 38,000 e-mails including 1,900 relating to the gas plant controversy.

If MacLennan routinely deleted his e-mails as he went along, as he claimed, and if it was possible for an individual to use their Outlook e-mail client to delete an e-mail stored on second tier Enterprise Vault storage, then why were there 38,000 e-mails left in the Enterprise Vault portion of MacLennan's account after he left?

There are two alternative possible explanations for this: either Craig MacLennan routinely deleted e-mails from his Outlook client, but did so in a way that did not cause those e-mails to be deleted from the Enterprise Vault e-mail archive; or Craig MacLennan did not routinely delete e-mails at all.

3.3 Possible explanation 1:  Craig MacLennan deleted the e-mails from his Outlook e-mail client but did so in a way that did not cause them to be deleted from the Enterprise Vault archive

In July 2012, a month after MacLennan left employment, the Ontario government upgraded their e-mail server from Microsoft Exchange 2003 to Exchange 2010.

  • Prior to the upgrade (and therefore during MacLennan’s spell of employment) it was possible for an individual to use their Outlook e-mail client to delete an e-mail over thirty days old from the Enterprise Vault  archive – but to do so they would have to use the Enterprise Vault tool bar that exists as a plug-in within their Outlook client.
  • Since the upgrade to Exchange 2010 (and therefore after MacLennan left employment) it has been easier for an individual to use their Outlook e-mail client to delete e-mails from the Enterprise Vault archive. There is still the option of using the Enterprise Vault toolbar within Outlook to delete. But there is also now the option of simply selecting the e-mail within Outlook and pressing the delete key.

(This information is gleaned from the table, and the footnote to the table, provided on page 11 of the Information and Privacy Commissioner's ADDENDUM to Deleting Accountability: Records Management Practices of Political Staff – A Special Investigation Report)

3.4    Possible explanation 2:  Craig MacLennan did not routinely delete e-mails at all

It is possible that MacLennan might have preferred to be thought guilty of an inadvertent, non-malicious breach of the Archives and Recordkeeping Act (which carries no penalties) than to be thought guilty of not producing records in response to a request from a Parliamentary Committee. If MacLennan knew that the Ministry of Government Services deleted e-mail accounts when staff leave, then he may have supposed that there would be no way of contradicting his claim to have routinely deleted his e-mail.

4.     The deletion of e-mail accounts when staff leave

4.1   The deletion of e-mail accounts when staff leave makes the question of whether or not MacLennan deleted e-mails from his account academic

Craig MacLennan’s reported deletions were:

  • of interest to opposition politicians hoping to prove that there was a conspiracy of political staff serving the Liberal administration to delete records of the gas plant cancellations AND
  • of interest to the Information and Privacy Commissioner from an access to information perspective because the routine deletion was the reason given by MacLennan for his non production of gas plant cancellation correspondence to Parliament.

But from a recordkeeping point of view the question of whether or not MacLennan deleted e-mails from his account is academic. Even if MacLennan had kept every single one of his e-mails it would not have helped the Archivist of Ontario, because the policy of Ontario's Ministry of Government Services was to delete staff e-mail accounts when the staff member leaves.

The discovery by forensic staff of the lion’s share of MacLennan’s account in an orphaned Enterprise Vault account does not mean that this correspondence was safe for the duration of the relevant retention rule.   The Ministry of Government Services had decided prior to the investigation to delete the orphaned accounts, in accordance with their policy, once an upgrade to the Enterprise Vault software had taken place.   They have since placed a stay of execution against e-mail accounts relevant to the gas plant saga.

4.2      The contradiction between Ontario’s retention rule on ministerial correspondence and its disposition policy on staff e-mail accounts

The retention schedule for ministerial public records, signed by the Archivist of Ontario, states that ministerial correspondence should be transferred to the Archives of Ontario for permanent preservation after five years (or upon a change of administration).

MacLennan stated that he never filed any e-mails anywhere. This means that the only place the ministerial correspondence of the Minister of Energy would have been captured was in the e-mail accounts of MacLennan and his colleagues (apart from copies scattered amongst the e-mail accounts of senders and recipients). This in turn means that the e-mail accounts of Craig MacLennan and his colleagues should be of interest to the Archivist of Ontario.

So what policy should Ontario apply to the e-mail accounts of political staff:

  • the retention rule applying to ministerial correspondence that must make up a significant proportion (though not the entirety) of those e-mail accounts?  OR
  • the IT disposition policy of the Ministry of Government Services that e-mail accounts should be deleted when a member of staff leaves?

The only way those two policies could logically co-exist would be if some sort of filing of e-mails, whether paper or electronic, was taking place. No such practice existed in the Office of the Minister of Energy.

4.3   The impact of a blanket policy of deleting e-mail accounts when staff leave

From a recordkeeping perspective the policy of deleting e-mail accounts of all staff when they leave employment, however significant their role in public life,  is the most damaging of the three types of deletion described in this post.   It is this policy that most undermines the accountability of political staff and Ministers for their actions, and most undermines the effort to retain a record of the activities of Ministers and their staff over time.

This deletion of e-mail accounts when staff leave was not questioned by the Information and Privacy Commissioner in her report Deleting Accountability.  Nor has it been condemned by the Parliamentary Committees.  The politicians in the Committees have not  criticised this policy because it is a non-partisan policy – the Ministry of Government Services would have applied this policy regardless of the political complexion of the administration.

Political staff work in a fast-paced, dynamic environment. Turnover of staff is relatively high. They neither expect nor receive security of tenure. They are well placed to secure external jobs because of the value of their connections inside Government. If a member of political staff thinks that the correspondence in their e-mail account would be damaging to themselves and/or the Minister, then they can have that record expunged simply by leaving their employment.

5.     Attempts by political staff in the Office of the Premier to wipe hard drives clean

The third deletion in the scandal came to light whilst the Information and Privacy Commissioner was conducting her investigation into Craig MacLennan’s reported routine deletion of his e-mail. The Cabinet Secretary told her that he had been approached by David Livingston, chief of staff to the Premier of Ontario, at the time of the change of Premier from Dalton McGuinty to Kathleen Wynne. Livingston wanted to know how to get administrator passwords in order to wipe the hard drives of the computers of departing political staff.

The Cabinet Secretary referred him on to the Chief Information Officer, who was not too concerned about the proposed deletion, on the grounds that it was good practice to wipe devices clean when they were handed on, provided that the Office had complied with its obligations under the Archives and Recordkeeping Act. He pointed out to Livingston that the Office already possessed the necessary administrator passwords.

Early in 2014 Ontario’s police obtained a search warrant for the off-site storage vendor where the computers in question were being stored. It is alleged that David Livingston, shortly after his approach to the Cabinet Secretary, had given the administrator passwords to the boyfriend of a member of the political staff, and had asked the boyfriend to wipe the hard drives. The boyfriend was not an Ontario public servant.

At the time of writing no charges have been laid, no allegations have been proven in court, and Livingston’s lawyer has denied any wrongdoing on his client’s part.

This deletion is important from a political perspective because it can be interpreted as showing that political staff were prepared to go to considerable lengths to delete records. It is important to the police because criminal charges may be pressed. But it is of little or no interest to an archivist or a records manager. It is hard to believe that political staff were routinely using the hard drives of their computers for record storage: a local hard drive is too vulnerable to device failure, and does not generally support any form of mobile or remote access. One presumes that the wiping of the drives was simply an attempt to defeat any forensic searches that might be made.

6.     What should the Government of Ontario do to stop this type of saga recurring?

Most of the public debate on this saga has concentrated on the motivations and behaviour of individual political staff such as Craig MacLennan and David Livingston. However, from a recordkeeping perspective the question of whether or not the actions of these individuals were appropriate is of little importance.

Administrations will come and go; individual political staff will come and go. Given the confrontational nature of the environment in which they work, we can assume that from time to time it will be in the interests of a member of political staff to remove correspondence from the record. Some people will succumb to that temptation, others will not.

From a recordkeeping point of view the most important question is this:

  • how can we best set up systems to routinely capture records of ministerial correspondence in ways that make it difficult for a member of political staff to remove correspondence from the record, or to prevent correspondence being captured onto the record in the first place?

Option 1 – make it as easy as possible for political staff to electronically file e-mails outside of their e-mail account 

One option would be to set up some sort of electronic document management system with e-mail integration, so that political staff could simply drag and drop e-mails into folders within their Outlook client. The folders could be either:

  • big buckets such as ‘ministerial correspondence’, ‘political correspondence’, ‘private and personal’ and ‘trivia and ephemera’
  • or a more granular filing structure covering the themes and matters that the staff are dealing with

Such a solution would be an improvement on what they have at the moment (where it appears there is no simple means for staff to file e-mail).   It would have the advantage that once an e-mail had been dragged to a folder linked to a document management system it could be protected from subsequent deletion.

The weakness of such a solution is that it is too dependent on the motivation and workload of the political staff themselves. It leaves it to political staff to decide what goes onto the record and what stays off the record. This may be acceptable in a high-trust environment. However, political staff operate in a low-trust environment: they are accountable to opposition politicians in the Parliament who do not and will never trust them. Leaving political staff to decide what does and does not go onto the record is problematic, and I would not recommend this option.

The Ontario government should aim to institute routines for capturing ministerial correspondence that will be trusted by opposition politicians even where they have no trust whatsoever in the individual political staff concerned.

Option 2 – automatically archive and preserve all e-mails sent and received by important political staff

The second option is to change the settings on the Enterprise Vault e-mail archive so that for designated members of political staff a copy of all e-mails sent and received is captured in the archive and protected from deletion.  This would mean disabling the ability of such individuals to use their Outlook client to delete an e-mail in the Enterprise Vault archive.
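As a rough sketch of the capture logic involved – the class and method names below are invented for illustration, and are not Enterprise Vault’s actual API – a journaling archive can be modelled as an append-only store that copies every message at the moment of transmission and refuses deletion requests coming from the account holder’s e-mail client:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Email:
    message_id: str
    sender: str
    recipients: tuple
    sent_at: datetime

class JournalArchive:
    """Append-only archive: every e-mail sent or received by a designated
    member of staff is copied in at the moment of transmission."""

    def __init__(self, designated_staff):
        self.designated_staff = set(designated_staff)
        self._store = {}  # message_id -> Email

    def capture(self, email: Email):
        # Capture happens on the mail server, before the message reaches
        # any inbox, so the account holder cannot prevent it.
        if {email.sender, *email.recipients} & self.designated_staff:
            self._store.setdefault(email.message_id, email)

    def client_delete(self, requested_by: str, message_id: str):
        # Pressing delete in the e-mail client empties the mailbox copy
        # only; requests to remove the archive copy are always refused.
        raise PermissionError(
            f"{requested_by} cannot delete {message_id}: archive is append-only")
```

The essential design choice is that capture and protection happen server-side, out of reach of the individual; nothing depends on the staff member remembering, or choosing, to file.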

The precedent for this option is the rulings of the US Securities and Exchange Commission that all electronic communications of broker-dealers must be archived and protected. Barclay T. Blair said in this post that these rulings (SEC rules 17a-3 and 17a-4) ‘single-handedly created the e-mail archiving industry’.

There are strong parallels between the situations of political staff and of broker-dealers. Both sets of people work in low-trust, high-scrutiny environments where there might be a powerful incentive to delete a communication from the record, or to ensure that a communication never goes on the record.

However, there is a key difference. Unlike the correspondence of broker-dealers, the ministerial correspondence of political staff is needed for long-term preservation in an historical archive. But Ontario’s archivist has no legal right to archive the political correspondence of ministers, only their ministerial correspondence:

  • political correspondence is defined as correspondence arising from the political career of the Minister (for example their relations with their constituents, with their political party, and their election campaigning). Ministers are free to dispose of such correspondence as they see fit.
  • ministerial correspondence comprises all correspondence arising from the Minister’s portfolio responsibilities and their role as a member of the cabinet.

If Ontario archived and protected every e-mail sent or received by important political staff then they would have to find a way to sift out the political correspondence at the point in time at which the records are due to be transferred to the Archives of Ontario.

The Ontario government should institute a routine, trusted and simple method for political staff to separate out political correspondence from ministerial correspondence.

Recommended option – protect and preserve the e-mail accounts of important political staff, but give them a means of flagging up political correspondence and private correspondence

To stop this type of saga recurring I would recommend that the Government of Ontario take the following measures:

  • Designate the roles of certain political staff as being of high importance. Ensure that all e-mails sent or received by such individuals are captured into an e-mail archive. Protect the e-mails from deletion or amendment. Disable the ability of individual staff to use their e-mail client to delete e-mails from the archive, or to amend those e-mails.
  • Find a means by which individuals whose accounts have been designated as being of high importance can flag certain e-mails as private or personal. Institute a random system of auditing to ensure that individuals are applying such a flag properly (see the sketch after this list).
  • Find a means by which political staff can flag certain e-mails as being political correspondence rather than ministerial correspondence. Institute a random system of auditing to ensure that individuals are applying such a flag properly.
  • Retain e-mail accounts designated as being of high importance permanently. Remove correspondence flagged as personal, and correspondence flagged as political rather than ministerial, at the point in time at which the account is transferred to the Archives of Ontario.
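By way of illustration only – the names below are invented rather than drawn from any Ontario system – the flagging, random auditing and transfer-time sifting could work roughly as follows:

```python
import random
from enum import Enum

class Flag(Enum):
    MINISTERIAL = "ministerial"  # the default: retained and transferred
    POLITICAL = "political"      # sifted out at the point of transfer
    PERSONAL = "personal"        # sifted out at the point of transfer

class ProtectedAccount:
    """An e-mail account whose contents staff may flag but never delete."""

    def __init__(self, owner, message_ids):
        self.owner = owner
        # Every message defaults to ministerial unless flagged otherwise.
        self.flags = {m: Flag.MINISTERIAL for m in message_ids}

    def set_flag(self, message_id, flag):
        # Flagging changes metadata only; the message itself is untouched,
        # so a mistaken or dishonest flag can always be reviewed and reversed.
        self.flags[message_id] = flag

    def audit_sample(self, sample_size=10):
        # A random sample of flagged items for a human auditor, to check
        # that the political/personal flags are being applied honestly.
        flagged = [m for m, f in self.flags.items() if f != Flag.MINISTERIAL]
        return random.sample(flagged, min(sample_size, len(flagged)))

    def transfer_to_archives(self):
        # Only ministerial correspondence crosses to the Archives of Ontario;
        # political and personal items are removed at this point, not before.
        return [m for m, f in self.flags.items() if f == Flag.MINISTERIAL]
```

The point of the random audit is to make the flags trustworthy without requiring anyone to read every e-mail: staff know that any flag they apply might be inspected.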

The core components of the new generation of records management/information governance tools

In my last post I drew a distinction between two generations of records management tools:

  • The first generation of tools are those that hit the market between 1997 and 2009 and we called them electronic document and records management (EDRM) systems
  • The second generation are those that hit the market after 2009 and we seem to be calling them information governance tools

In this post I will look again at this distinction – this time comparing the components and capabilities of the old EDRM systems with the components and capabilities of the newer information governance tools.

The core components of the first generation of records management tools (EDRM systems)

The first generation of tools consisted of six core components/capabilities:

  • an end-user interface: to allow end-users to directly upload documents to the system
  • an e-mail client integration (usually with Outlook): to allow end-users to drag and drop e-mails into folders within the system
  • document management features: version control, check-in and check-out, generic workflows and configurable workflow capabilities
  • a repository: to store any type of documentation that the organisation might produce
  • classification and retention rules: the capability to hold, link together and apply a records classification (business classification scheme) and a set of retention rules (see the sketch after this list)
  • records protection: the capability to protect records from amendment and deletion, and to maintain an audit trail of events in the life of each record
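To make the link between classification and retention concrete, here is a minimal sketch – with invented names, not any particular product’s data model – in which each node of the business classification scheme carries a retention rule, and a record inherits the rule of the node it is filed against:

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class RetentionRule:
    name: str
    retain_years: int  # how long to keep the record after the trigger event

    def disposal_due(self, trigger: date) -> date:
        # Approximation: a real schedule would anchor to calendar years.
        return trigger + timedelta(days=365 * self.retain_years)

@dataclass
class ClassificationNode:
    """One node of the business classification scheme (an activity/function)."""
    title: str
    rule: RetentionRule
    children: list = field(default_factory=list)

@dataclass
class Record:
    title: str
    filed_under: ClassificationNode
    closed_on: date

    @property
    def disposal_date(self) -> date:
        # The record inherits the retention rule of its classification node.
        return self.filed_under.rule.disposal_due(self.closed_on)

# Example: correspondence filed under an activity that is kept for five years
rule = RetentionRule("transfer after 5 years", retain_years=5)
node = ClassificationNode("Energy > Plant cancellations", rule)
record = Record("Letter re gas plant", node, closed_on=date(2011, 10, 7))
print(record.disposal_date)  # roughly five years after closure
```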

When implementing such EDRM systems, records managers drew a ‘line in the sand’. They aimed to implement a system that would manage records going forward in time. They did not attempt to deal with legacy content that had already accumulated on shared drives and in e-mail.

The weakness of EDRM systems was that end users did not move all, or even most, significant content into the records system. Shared drives and e-mails continued to grow, and continued to contain important content not captured into the records system.

Added to this, a range of disruptions happened:

  • Microsoft’s entry into the content management space with SharePoint 2007 took away the collaboration space from the EDRM systems. Unless they had complex requirements, organisations with SharePoint no longer needed the version control, check-in/check-out or workflow capabilities of the EDRM tools.
  • E-discovery/freedom of information/subject access enquiries caused more and more pain to organisations, and tended to focus on material in e-mail and shared drives rather than content in the EDRM system.
  • The move to smartphones and tablets made the user interface problematic – smartphones have screens that are too small for the full functionality of an EDRM end-user interface.
  • The move to the cloud made e-mail integration problematic – cloud e-mail services do not allow customisation of their user interface.

The seven core components of the new generation of records management/information governance tools

The second generation of records management tools, which we are calling information governance tools, consists of seven key capabilities:

  • Indexing engine: the ability to crawl and index content in many different applications and repositories (shared drives, SharePoint, e-mail servers, line-of-business systems etc.)
  • Connectors: a set of connectors to the most common applications and repositories in use in organisations today (SharePoint, Exchange, ECM/EDRM systems etc.). The connectors enable the records system to take action on content in a target repository – for example to delete it, move it or place a legal hold on it. They also enable the crawler to extract content to index (see the sketch after this list).
  • Metadata enhancement and auto-classification: the ability to add, through the connectors, extra metadata fields to content, and the ability to assign content to a classification either by setting rules based on parameters or by using auto-classification algorithms
  • Analytics dashboard: to surface patterns in content repositories, for example to identify duplication, redundancy, trivia and high-risk content
  • Classification and retention: the capability to hold and apply a records classification and a set of retention rules – this is the main point of continuity between the first and second generations of records management tools
  • In-place records management: the capability to protect records from amendment and deletion, maintain an audit trail of events in the life of each record, and assign retention and classification rules to a record, even where the record is held in a different application from the records system itself. From the end-user’s point of view this has the advantage that they can stay in the applications they are used to working in – they do not have to learn how to use the records system.
  • Repository: a repository to store any type of documentation that the organisation might produce. The in-place records management features reduce, but do not eliminate, the need for a records repository. Records repositories are necessary when an organisation wants to decommission an application but still wants to retain the content from that application. In cloud scenarios the repository comes in useful when the organisation wants content to be available via a cloud application but not stored by the cloud provider.
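A minimal sketch of what such a connector interface might involve – the names here are invented for illustration, not taken from any vendor’s product – is a small, uniform set of operations that every connector exposes to the governance tool, whatever the target repository:

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Iterator

class Connector(ABC):
    """The uniform interface the governance tool uses to reach one
    repository (a shared drive, a SharePoint site, an Exchange mailbox)."""

    @abstractmethod
    def crawl(self) -> Iterator[dict]:
        """Yield metadata for each item, for the indexing engine to consume."""

    @abstractmethod
    def apply_hold(self, item_id: str) -> None:
        """Place a legal hold on an item in the target repository."""

    @abstractmethod
    def delete(self, item_id: str) -> None:
        """Destroy an item once its retention period has expired."""

class SharedDriveConnector(Connector):
    """Example connector: a plain file share."""

    def __init__(self, root: str):
        self.root = Path(root)

    def crawl(self) -> Iterator[dict]:
        for path in self.root.rglob("*"):
            if path.is_file():
                stat = path.stat()
                yield {"id": str(path), "size": stat.st_size,
                       "modified": stat.st_mtime, "suffix": path.suffix}

    def apply_hold(self, item_id: str) -> None:
        # In practice: flip the file's ACL to read-only and record the hold.
        raise NotImplementedError("hold logic depends on the platform")

    def delete(self, item_id: str) -> None:
        # In practice: destroy the file and log the disposal in the audit trail.
        raise NotImplementedError("disposal logic depends on the platform")
```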

Notice what has been taken away and what has been added:

  • The components that an end-user interacted with – the end-user interface and the document management functionality – have either disappeared entirely or become an optional extra.
  • In their place come the connectors, indexing engine, analytics and in-place records management capability that a central administrator needs in order to understand and act on content held outside the records system itself.

 

The importance of the analytics dashboard

The key difference between the new generation of information governance tools and the old generation of EDRM systems is that the information governance tools pay as much (often more) attention to existing content as they do to shaping the way future content will accumulate.

The starkest illustration of the change is this:

  • Ten years ago, if you saw a system demonstration by a vendor at a records management event, they would start by showing you their end-user interface for an individual to upload a document.
  • In 2014 a vendor will start by showing you their analytics dashboard.

The analytics dashboard is the key to the new generation of records management/information governance tools.

Without the dashboard, having an indexing engine crawling across shared drives, e-mail and SharePoint would be useless to the records manager.

The dashboard enables the records manager to actively interrogate the index and home in on targets for action – information that should be deleted, moved, protected, classified or assigned to a retention rule.


A typical dashboard shows the records manager how much content is held, where it is held, what file types there are, which departments it belongs to, and what is redundant, outdated or trivial. The dashboard also enables the records manager to use these different dimensions in combination with each other – for example to home in on the content of a particular department in a particular time period.
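As a minimal sketch of that kind of interrogation – assuming index rows of the shape produced by a crawler like the one sketched earlier, with invented field names – combining dimensions is a matter of stacking filters over the index:

```python
from datetime import datetime

# Each row is one indexed item, as harvested by a connector's crawler.
index = [
    {"id": "G:/energy/plants/briefing.doc", "department": "Energy",
     "suffix": ".doc", "size": 2_400_000, "modified": datetime(2011, 6, 1)},
    # ... millions more rows in a real deployment
]

def slice_index(rows, department=None, since=None, until=None, suffix=None):
    """Combine dimensions to home in on a target population of content."""
    for row in rows:
        if department and row["department"] != department:
            continue
        if since and row["modified"] < since:
            continue
        if until and row["modified"] > until:
            continue
        if suffix and row["suffix"] != suffix:
            continue
        yield row

# e.g. all Energy department content from a particular period
hits = list(slice_index(index, department="Energy",
                        since=datetime(2010, 1, 1),
                        until=datetime(2012, 12, 31)))
print(len(hits), "items identified for review")
```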

These are powerful tools in the hands of a central administrator, and it is important that they include workflows and audit trails, so that:

  • the records manager can get the approval of content owners before making disposal decisions on content
  • the system can record that approval, and record the actioning of the decision

Note, however, that these tools are more effective at helping records managers make decisions on content that has built up in the shared drive and SharePoint environment than they are at dealing with content that has built up in e-mail accounts.

One of the challenges with EDRM systems was that it was very hard to measure benefits and demonstrate a tangible ROI. The business case for the new information governance tools often arises from savings produced by dealing with legacy data – something that the EDRM systems were not set up to do. The ROI might come from the following (a worked example follows these lists):

  • savings from storage optimisation (moving less active content to second or third tier storage)
  • savings from reduction of content that has to be reviewed for eDiscovery/access to information requests

Further benefits might include:

  • capability to move content from legacy applications
  • capability to process the shared drives of functions acquired or divested in mergers and acquisitions
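As a purely illustrative back-of-the-envelope calculation – every figure below is invented for the sake of the example, not drawn from any real case – the storage-optimisation element of such an ROI might be estimated like this:

```python
# Hypothetical inputs - substitute an organisation's own figures.
total_tb = 100                 # content crawled on shared drives
rot_fraction = 0.30            # redundant/outdated/trivial, per the dashboard
inactive_fraction = 0.40       # inactive content that must still be retained
tier1_cost = 800.0             # $ per TB per year, primary storage
tier3_cost = 150.0             # $ per TB per year, archive-tier storage

deletable_tb = total_tb * rot_fraction
demotable_tb = total_tb * inactive_fraction

annual_saving = (deletable_tb * tier1_cost                    # storage freed
                 + demotable_tb * (tier1_cost - tier3_cost))  # cheaper tier
print(f"Estimated annual storage saving: ${annual_saving:,.0f}")
# -> Estimated annual storage saving: $50,000
```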

At the ARMA Europe conference last month, Richard Hale from Active Navigation and Lee Meyrick from Nuix both gave presentations urging records professionals to be pragmatic and to concentrate on targeting particular improvements one at a time. The dashboard suits that approach – gone is the utopian wish to create a perfect records system; in its place is an incremental approach whereby a central administrator homes in on particular areas of content for protection, enhancement or migration.