How long should an e-mail account be kept after a member of staff leaves?

On 30 May 2013 two postings appeared that between them shed light on how organisations are currently managing the archived e-mail accounts of staff who have left:

    • The first was a post by Rebecca Florence to the IRMS Records-Management-UK listserv that kicked off a debate on e-mail account retention and deletion
    • The second was a blogpost by Emma Harris of State Records New South Wales reporting the findings of a survey they had conducted into how public offices in NSW are managing their e-mail

Rebecca Florence posted a description of the situation in her organisation:

The current arrangement is that for a period of time post-leaving, access to the mailbox and email archive (in our case we use the Symantec Enterprise Vault) can be passed to a designated member of staff.

After that period of time has elapsed the mailbox/archive is deleted by IT, with the contents being exported to a separate restricted access area. Access is granted to the exported contents on a case by case basis. Currently the exported content is held indefinitely.

I should add that as you would imagine there are policies and guidance in place which advises staff to save emails where necessary outside Outlook for longer term retention and also assigning responsibility post-leaving allows for a review of any remaining emails for ongoing business use. I’m sure as most of you will have experienced, there is disparity across departments in regards to how well this is managed.

Phil Bradshaw replied that keeping records indefinitely is not the same as keeping records permanently:

  • keeping records permanently means we have assessed the records and found them to have enduring long term value
  • keeping records indefinitely means we cannot find a basis to set a retention rule on them

Is it possible to deal with e-mail by reviewing e-mail accounts when members of staff leave?

Lawrence Serewicz responded to Rebecca’s post by pointing out the legal costs and risks of maintaining all e-mail accounts indefinitely:

  • e-mail accounts generally contain personal data, and the indefinite retention of entire e-mail accounts may breach several of the EU data protection principles
  • information held in an e-mail archive may be subject to discovery in the event of a legal case, and to disclosure in the event of an access to information request

Lawrence recommended that e-mail accounts be deleted three months after a member of staff leaves, but only after:

  • a pre-exit process, in which the line manager and the employee go through the e-mail account together and decide how to deal with the e-mails, OR
  • a post-exit process (in cases where the pre-exit process was not carried out), in which the specific service the employee worked for, Legal, HR and Internal Audit would all review the account.  The specific service would look for e-mails it needed to carry on with the employee’s work; Legal would look for e-mails needed for possible legal claims, contracts or agreements; HR would look for e-mails needed for possible grievance or disciplinary issues; Internal Audit would look for any illegality

The approaches described by Rebecca and Lawrence are similar in two respects:

  • both approaches reflect a belief that colleagues cannot be relied upon to comprehensively and routinely deal with individual e-mails as they go along by filing and deleting
  • both approaches rely on a big effort just before or after the member of staff leaves to deal with what is left in the e-mail account.  This is problematic.  All of our experience as records managers tells us that it is very hard to deal with backlogs.  E-mail communications are exchanged with such frequency that backlogs quickly scale up to a size that makes patient sifting and sorting impossible.  An e-mail account at the end of a person’s employment is in effect a filing backlog.

The only difference between the two approaches is that:

  • Rebecca’s organisation cannot guarantee that the line manager/designated person of the departed staff member will review the e-mail content thoroughly, and move important mails to a more appropriate, more accessible place.  As a result they keep all the e-mail accounts as a backup, just in case there is an overriding need (legal or investigative) to find an e-mail from an ex-member of staff.
  • Lawrence’s approach requires organisations to ‘feel the fear and do it anyway’.  There is still no guarantee that reviews have been carried out, or carried out properly, but this time the organisation presses the delete button after three months regardless.

Is it possible to deal with e-mail by asking staff to move important e-mails into an electronic or paper file as they go along?

Simon McCauley responded to Rebecca’s posting by saying that in his organisation staff are expected to save important e-mails into the electronic document and records management system (Livelink) as they go along.

Simon’s organisation are planning to implement a policy of moving e-mails from people’s e-mail accounts to an e-mail archive six months after the date of the e-mail, then deleting them from the archive after a further twelve months.
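To make the arithmetic of such a time-based policy concrete, here is a minimal sketch of how the two windows might be computed, assuming the six-month and twelve-month periods described above. The function names and the use of the dateutil library are my own illustrative choices, not anything Simon described:

```python
from datetime import date
from dateutil.relativedelta import relativedelta

# Hypothetical policy windows, taken from the description above:
# a message moves to the archive 6 months after its date, and is
# deleted from the archive 12 months after that.
MONTHS_TO_ARCHIVE = 6
MONTHS_IN_ARCHIVE = 12

def archive_date(email_date: date) -> date:
    """Date on which the message leaves the live mailbox for the archive."""
    return email_date + relativedelta(months=MONTHS_TO_ARCHIVE)

def deletion_date(email_date: date) -> date:
    """Date on which the message is destroyed in the archive."""
    return archive_date(email_date) + relativedelta(months=MONTHS_IN_ARCHIVE)

print(archive_date(date(2013, 5, 30)))   # 2013-11-30
print(deletion_date(date(2013, 5, 30)))  # 2014-11-30
```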

I assume that the thinking behind such a policy is that:

  • they have confidence in the capacity of their colleagues to file important e-mails as they go along
  • they know that colleagues are much less likely to file as they go along if they have the comfort of knowing that the e-mails are kept for them in their e-mail account anyway

The State Records Authority of New South Wales (NSW) has given similar advice to NSW public offices.  They summarise their policy as follows:

State Records advises NSW public offices to capture email messages that are sent or received in the course of official business into a corporate recordkeeping system. State Records suggests two principal methods for capturing messages:

– capturing messages into an EDRMS (electronic document and records management system)

– printing messages and capturing them on paper files

In her blogpost reporting the findings of their recent survey of e-mail management in NSW public offices, Emma Harris of State Records reported that:

– 81% of public offices agreed with the statement that in their offices ‘e-mail messages with corporate value are stored only in personal email accounts and are therefore at risk of loss or premature destruction’

– 33% of respondents advised that employees in their organisation neither capture messages to an EDRMS nor print and file them.

– few organisations have investigated alternative approaches to managing e-mails [as opposed to asking colleagues to move e-mails into an EDRMS or print to file].

The blogpost went on to report:

– half of the responding organisations have implemented an archiving solution, with two products (Symantec Enterprise Vault and Quest Archives Manager) being the most commonly implemented.

– A number of email archiving solutions have retention and disposal functionality (e.g. the ability to set retention periods and disposal actions on messages and to destroy messages when retention periods have expired). However the results of the survey suggest that organisations with email archiving solutions are not actively managing the retention and disposal of messages using this functionality.

The findings betray a lack of confidence on the part of the NSW public offices in their staff’s adherence to the policy of moving e-mails to electronic or paper files. This lack of confidence is presumably what lies behind the fact that NSW public offices are, like Rebecca’s organisation, keeping e-mail accounts indefinitely.

Can we still set a blanket retention rule on e-mail accounts if we know they contain important messages that we need as records?

There is a similarity between all four approaches – Lawrence’s, Rebecca’s, Simon’s and the New South Wales approach.  All four are based on moving e-mails out of e-mail accounts.

If, like Lawrence and Simon, we are confident that we can move important e-mails out of e-mail accounts, then setting a blanket retention period on those accounts is not a problem.  We set a blanket retention period covering all accounts, and we make it as short as we possibly can to concentrate people’s minds.

But what if, like Rebecca’s organisation, like New South Wales public offices, and like most of the organisations I have worked with and spoken to over the last decade, you are not confident that important e-mails are being moved out of e-mail accounts?  Then setting a retention period is a different type of exercise.  All of a sudden we have to recognise that the e-mail account is a record – a record of the work correspondence of that member of staff.

A blanket retention period, however short or however long, is not appropriate for organisations whose e-mail accounts contain important correspondence that is not available elsewhere.   This is because the roles people play in organisations vary greatly in their significance and impact – you are unlikely to need a record of the correspondence of an accounts clerk in your finance department for the same length of time as the correspondence of your chief executive (with all due respect to both parties).

We need to find a rationale on which to base a retention rule for e-mail accounts.  This is something we as a profession have not hitherto thought through, for the simple reason that we have been battling for over a decade to avoid having to treat e-mail accounts as records.  Even starting to think through the consequences of treating e-mail accounts as records feels like an admission of defeat.  In reality it is not.  Defeat would come only if we gave up trying to keep manageable records of people’s work correspondence.

Getting people to move individual e-mails one-by-one to electronic files is a tactic not an end in itself.   Most organisations have not been able to make that tactic work – at the very least we need an alternative.

Establishing a defensible rationale for retention rules on e-mail accounts that we treat as records

We can set a retention period for a record of a particular type of work by considering all the different reasons why we need a record of the work in question, and then keeping the record for the longest period that any of those needs is likely to stay valid.

The e-mail account of an ex-member of staff is simply a record of the correspondence exchanged by a particular individual in the course of their work, minus any e-mails that have been deleted or moved elsewhere.

There are multiple legitimate reasons why someone might need to look at the work correspondence of a colleague or predecessor who has left:

  • They might need to see what correspondence their colleague/predecessor had exchanged with a particular external stakeholder/partner/customer/supplier/citizen in order to inform their continuation of that relationship.
  • They might need to see what correspondence the colleague/predecessor had exchanged in the course of a piece of work because they need to continue with that piece of work, restart it, learn from it, evaluate it, copy from it, etc.
  • They might need to account for their colleague/predecessor’s work, in response to audit, investigation, criticism, access to information request or legal discovery
  • Depending on the nature of the role of that individual, they might need to transfer the correspondence to a historical archive on account of the enduring public interest in the work of that individual

In most parts of most organisations we cannot adequately meet those record keeping needs without retaining the e-mail account of the member of staff concerned.  The challenge of setting a retention value on e-mail accounts is that such accounts will typically contain correspondence arising from many different pieces of work, and those pieces of work may have very different retention values.

A nice, neat approach is simply to keep the e-mails of an individual for as long as you keep the records of the main type of work that they carried out (a sketch of this approach follows the list below).

  • If they were an accounts clerk in a finance department, and your organisation’s retention rule on accounting work is to delete the records after seven years, then apply that rule to their e-mail account also
  • If they were a senior civil servant working on policy issues and on new legislation, and your retention rule for work on the development of legislation and national policy states that records should be kept for 20 years and then reviewed for permanent preservation and transfer to a historical archive, then apply that rule to their e-mail account also
  • If they worked on staff recruitment, and the retention rule for recruitment work is to delete records three years after the recruitment exercise, then retain their e-mails for three years too.
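A minimal sketch of this role-based approach, assuming a small and purely hypothetical retention schedule (the job functions, periods and actions below are illustrative, not recommendations):

```python
# Purely hypothetical retention schedule, keyed by the main type of work
# a post holder carried out. Periods and actions are illustrative only.
RETENTION_BY_FUNCTION = {
    "accounts clerk":       {"years": 7,  "action": "delete"},
    "policy civil servant": {"years": 20, "action": "review for permanent preservation"},
    "recruitment officer":  {"years": 3,  "action": "delete"},
}

def rule_for_account(job_function: str) -> dict:
    """Look up the retention rule to apply to a leaver's e-mail account."""
    return RETENTION_BY_FUNCTION[job_function]

print(rule_for_account("policy civil servant"))
# {'years': 20, 'action': 'review for permanent preservation'}
```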

One choice to make is whether to have the retention rule:

  • applied to the entire e-mail account – so the retention rule is triggered from the moment of the individual’s departure from the organisation (this has the disadvantage that some staff may have had long and varied careers in the organisation)
  • applied to e-mails by date (month or year) – so the retention rule is triggered by the end of the month or year that the e-mail was sent/received in (a better option; both triggers are sketched below)
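Here is a minimal sketch of the difference between the two triggers, assuming a hypothetical seven-year rule; the date handling is deliberately simplified:

```python
from datetime import date

RETENTION_YEARS = 7  # hypothetical period; substitute the rule for the role

def disposal_date_whole_account(departure: date) -> date:
    """Option 1: the clock starts when the member of staff leaves.
    (Simplified: ignores leap-day edge cases.)"""
    return departure.replace(year=departure.year + RETENTION_YEARS)

def disposal_date_per_email(email_sent: date) -> date:
    """Option 2: the clock starts at the end of the year the e-mail was sent or received."""
    return date(email_sent.year + RETENTION_YEARS, 12, 31)

print(disposal_date_whole_account(date(2013, 5, 30)))  # 2020-05-30
print(disposal_date_per_email(date(2005, 3, 1)))       # 2012-12-31
```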

The problem of personal data of a sensitive nature in e-mail accounts

So far so good – we have a defensible logic on which to base our retention rules for e-mail accounts, to meet the full range of records management needs.  But there is a problem: the widespread presence of personal data of a sensitive nature in e-mail accounts.  By ‘a sensitive nature’ I mean:

  • information about the e-mail account holder that they would not want even their closest colleagues or their successor to access; and
  • information about a third party that the e-mail account holder corresponded with, or had discussed in e-mails, where that person could be disadvantaged if the information were to be made available even just to the account holder’s successor and closest colleagues

Even if an individual never used their work e-mail account for non-work correspondence with friends and family, their account is still likely to contain personal information of a sensitive nature, exchanged with colleagues.  Think of an e-mail exchange between a line manager and a member of their team who had to take time off work for personal or family reasons.

The fact that most e-mail accounts have not had such e-mails filtered out means that most organisations in my experience (centred around the UK and Europe) cannot currently allow colleagues routine access to the e-mail accounts of their predecessor, or their former colleagues.

Most organisations struggle to set access rules on e-mail accounts

Most electronic document management systems work on the principle that access permissions can be set for objects or aggregations of objects (file/folder/site/library/document etc.).  A person or group of people is either permitted or forbidden to access that object/aggregation.  There are no grey areas in between.  If I am authorised to see a document then the system merely asks me to authenticate myself (so the system knows it is indeed me who is asking).  It does not ask me why I want to see it.
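As a minimal sketch of that binary model (the class and the names are hypothetical, not taken from any particular product):

```python
# Hypothetical illustration: permissions attach to an object or aggregation,
# a user is either in the permitted set or not, and the system never asks
# *why* access is wanted.
class Folder:
    def __init__(self, name: str, permitted_users: set):
        self.name = name
        self.permitted_users = permitted_users

    def can_access(self, authenticated_user: str) -> bool:
        # Authentication establishes who is asking; authorisation is then a
        # simple yes/no membership test. There is no 'purpose' parameter.
        return authenticated_user in self.permitted_users

finance_folder = Folder("Finance 2013", permitted_users={"alice", "bob"})
print(finance_folder.can_access("alice"))    # True
print(finance_folder.can_access("mallory"))  # False
```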

Rebecca’s organisation allows access to archived e-mail on a ‘case-by-case’ basis.  In other words they are unable to tell their e-mail archiving tool who is authorised to access each e-mail account.

With e-mail archives the information contained in the archive is so sensitive that organisations are imposing an extra control – people are having to say why they need to access the e-mail account, and that request is either permitted or denied, not by the e-mail archive itself, but by people in the department responsible for overseeing the archive.

I worked with one organisation where any application to see the e-mail accounts of former staff had to be approved by their human resources (HR) department, who would only allow consultation in exceptional circumstances where there was no other way of getting the information.  One individual told me that they had wanted to access the correspondence that a former colleague had exchanged with a supplier about a particular contract, but HR had refused.

That HR department had no option but to be restrictive.  Imagine this scenario: I work with a colleague, and develop malicious intent, or an unhealthy curiosity, towards them.  They leave.  I think of a project that they worked on and say to the IT department that I need to look through their e-mail account to find records relating to that project.  What else might I look for/find?  That is why governance of e-mail archives is vital, including keeping non-deletable records of who searched for what terms, under what authority, and what e-mails they opened and looked at.  This must include any searches made by any staff, whether end users or IT system administrators.
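A minimal sketch of what such a non-deletable audit trail might record, with illustrative field names of my own (no particular archiving product is implied):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class ArchiveAccessEvent:
    """One search of the e-mail archive: who, what, under what authority."""
    user: str                  # end user or IT system administrator
    search_terms: str
    authority: str             # e.g. reference of the approved access request
    opened_message_ids: tuple  # messages actually opened and looked at
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class AuditLog:
    """Append-only: events can be added and read, never edited or deleted."""
    def __init__(self):
        self._events = []

    def record(self, event: ArchiveAccessEvent) -> None:
        self._events.append(event)

    def events(self) -> tuple:
        return tuple(self._events)

log = AuditLog()
log.record(ArchiveAccessEvent("j.smith", "contract supplier X",
                              "HR approval ref 2013/042", ("msg-123", "msg-456")))
```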

Is there any point in setting a retention rule that covers all the record keeping needs arising from an e-mail account if we cannot allow colleagues to access the e-mail accounts for those purposes?

The retention rule that we arrived at above was based on the full range of recordkeeping needs that we have in relation to the correspondence of an individual who is a close colleague or predecessor.  We now find that we cannot allow access to the e-mail accounts, even to close colleagues, for most of these purposes, because of the presence of personal information of a sensitive nature that is unmarked, unflagged, and undifferentiated from the rest of the mails in the e-mail account.

If we can only access e-mail accounts in response to overriding imperatives such as access to information requests, e-discovery requests and the need to defend or prosecute any legal case we might be involved in, then should that be the only consideration we take into account in setting our retention rule? Should we only retain e-mail accounts for the period in which it is useful for us to have them in case of legal dispute?

If we only take into account the overriding imperatives of legal disputes and access to information requests then the logic for setting a retention rule becomes much more arbitrary:

  • if we adjudge the cost/risk of the e-mail accounts being subject to access to information/e-discovery requests to be greater than the benefit of being able to use the e-mail accounts to support any case we would need to make in court, then we would impose a short retention period – perhaps the three months that Lawrence suggested
  • if we adjudge the benefit of being able to use the e-mail accounts of former members of staff to support any legal case we might want or need to make to be greater than the cost/risk of servicing access to information and e-discovery requests, then we are likely to set a retention rule equivalent to a standard limitation period of seven years, as Simon suggested (though you need to be careful with limitation periods – in some cases the clock of a limitation period may not start ticking until well after a member of staff leaves, for example if the person was working on designing a bridge, or a drug, or with children)

The problem with this very pragmatic approach is that we will continue to fail to meet the day-to-day record keeping needs of our colleagues when they start a new job, and when they need to look back at the work of former colleagues.  And we will not be able to make the record of the work correspondence of people playing important roles in society available to future generations of policy makers, researchers and historians.

In his excellent Digital Preservation Coalition Technology Watch Report on e-mail, Christopher Prom reported:

Winton Solberg, an eminent historian of American higher education, remarked … ‘historical research will be absolutely impossible in the future unless your profession finds a way to save email’ (Technology Watch Report 11-01: Preserving Email [PDF 916KB] by Christopher J Prom, 2011, page 5)

I will go one further and say that if we can solve the challenge of how we provide an individual with routine access to the e-mail account of their predecessor, then we will be able to solve the challenge of how we provide access to that e-mail account to historians or other researchers further down the line.  The two challenges are inextricably linked.

Many of our organisations have e-mail archiving tools, but these archives function as a murky subconscious of the organisation: full of toxic secrets, inaccessible to the organisation in its normal day-to-day functioning, and posing a huge, ongoing information governance risk.

What we need is an approach to e-mail that results in staff leaving behind an e-mail account that their colleagues and successor can routinely access and use, without unduly harming either the account holder or people mentioned in their correspondence; and that we as an organisation can apply defensible access rules and retention rules to.

It is beyond the ability of a single organisation to develop such an approach (because it involves changes to available tools, changes to the way we think of an e-mail account, and changes to how we ask our colleagues to treat e-mail).  But it is well within the capability of the records management/archives professions to articulate such an approach, and then incentivise and cajole vendors (particularly the ecosystem around the big on-premise and cloud e-mail products/services) to create offerings that match it.

As a starting point I would like to see us as records managers and archivists getting this issue on the agenda of our organisations and of society more widely.

Two quick suggestions to get the ball rolling:

  • For records managers – if you are concerned that important e-mails are not being moved out of e-mail accounts, consider broaching the emotive subject of e-mail accounts when building or revising your organisation’s records retention schedule.  Include in the retention schedule a list of those post holders in your organisation whose e-mail account contents you require to be retained for a minimum of 20 years
  • For archivists working for the national archives of our nations – if you are concerned that important e-mails in government departments/ministries in your country are not being moved out of e-mail accounts, then when you draw up or revise your selection policies, include a list of posts in the various government bodies from which you require e-mail account contents to be appraised for permanent preservation in your archives

The state of records management in 2013: the challenges

Trends in records management

You can tell what issues are current in records management by the questions organisations pose to records management consultants.

Here are six situations that I was asked for advice on in 2012 (whether in actual consultancy projects, invitations to tender, or informal questions over the phone!):
  • An organisation delivers projects in challenging environments around the world – areas of military conflict, political strife, economic turmoil and low internet bandwidth. They had developed a corporate records retention schedule.  Project records were spread across several different business applications. They had an application to capture the formal project documents (project initiation documents, project close-out reports etc.). They had an application to capture the contractual documents arising from any procurement done within projects.  Day-to-day e-mail correspondence arising from projects resided within individual e-mail accounts. Day-to-day documentation arising from projects resided in shared drives located in regional offices all around the world.  None of these applications had any place to hold retention rules. How could/should they apply the retention schedule that they had just received corporate sign-off for?
  • An organisation implemented an electronic document and records management system (EDRM), with a records classification (fileplan) and retention rules. Acceptance around the organisation was varied – some areas made little or no use of it, but for others it was business-critical.  The IT department were pushing heavily to introduce SharePoint as a collaboration and document management tool. What should their records managers do? Should they reject SharePoint entirely? Should they implement SharePoint but integrate it with their existing EDRM? Or should they find a way to use SharePoint as their document and records management system, replacing the EDRM?
  • For over a decade an organisation had done its records management within the Lotus Notes environment.  They had customised the e-mail client so that it integrated with a records repository.  They reached a point where they needed to upgrade the environment, but upgrading the customised e-mail client proved difficult, so they switched to a standard Lotus mail client, leaving them without e-mail integration to their records repository (also built in Lotus Notes).  One issue was deciding on priorities – should they stay in the Lotus Notes environment and rebuild the link between e-mail and records repository? Or should they move to the Microsoft environment (Exchange and SharePoint) and tackle the records problems then?
  • An organisation implemented an EDRM integrated with the Microsoft Outlook e-mail client. The integration allowed end-users to use the key functionality of the EDRM from within Outlook. Without leaving their e-mail client they could create records folders within the EDRM, drag and drop e-mails into the EDRM, and search and consult records in the EDRM. How would it affect the EDRM if the organisation moved to cloud-based e-mail with a standard e-mail client?
  • In an organisation people had got into the habit of e-mailing colleagues links to documents in the EDRM, rather than sending the documents as attachments.  However this pattern had now changed: colleagues were reverting to e-mailing documents as attachments because attachments were quicker to access from their Blackberry/iPhone.
  • An organisation wanted to implement SharePoint together with a records management plug-in. It wanted to choose a plug-in that was seeking certification as compliant with the MoReq2010 records management specification.  It wondered how its implementation of such a plug-in would affect the set-up of the collaboration features of SharePoint, such as the set-up of team sites and the use of SharePoint content types.
These challenges are all different in their specifics, but fundamentally they reflect the situation that records managers find themselves in at this point in history. Records managers are attempting to create the capability to apply over-arching and timeless records management frameworks (organisation-wide records classifications/fileplans and retention rules) in a fragmented environment of multiple applications (e-mail clients, collaboration tools, line-of-business applications), amid the shifting sands of change in every aspect of organisational technology (the increased adoption of portable devices, the move to the cloud, the ongoing software update cycle).

The tension between the need for stable records management and the need to keep up with the onward march of technology

There is a fundamental tension within organisations between an organisation’s need to manage its IT and its need to manage its records.

The IT imperative is to be agile.  The organisation needs to keep upgrading applications, productivity tools and devices to keep up with the latest offerings from vendors, and the latest developments in technology and working practices.  The IT nightmare is to be stuck on old versions of things. Stuck on Windows XP, stuck on Office 2003, SharePoint 2007, Internet Explorer 6. Stuck with applications that only run on desktop PCs  when colleagues are trying to work from tablets and smartphones. Stuck with applications that only run on on-premise servers when the organisation wishes to move to the cloud.

On the assumption that you are happy with the providers/vendors of your existing business applications, the best way to avoid being stuck is to keep customisations and integrations to a minimum. Customisations and integrations complicate and slow down upgrades, and would complicate any move to the cloud or adoption of new devices.

The records management imperative is to apply frameworks and rules: apply a records classification, apply a set of retention rules, lock down records so that they cannot be deleted except in accordance with retention rules, and ensure that systems can export records and all of their metadata in a format that can be imported into successor applications.

The clash between these imperatives comes because each organisation deploys many different applications.  An individual in the course of a working day might use an e-mail application, a shared drive, a SharePoint team site and a line-of-business application.  Off the shelf, none of these things has the functionality to apply those records management frameworks and rules.  So we need to do something to those applications – we either need to customise them so they can manage their own records; buy plug-ins for them so they can manage their own records; connect them to a governance tool that can manage their records from the outside; or connect them to a records repository that can take their content from them (either immediately or at a later point in time).

In other words, records management requires each application to be customised/integrated/connected and/or extensively configured, and it is this need that creates the tension between the records management imperative and the IT imperative.

There would be no tension between IT and records management if stand-alone records repositories worked (but they don’t)

There would not be any tension between the IT imperative and the records management imperative if records management could be done by a single stand-alone application.

  • Let’s imagine you installed an electronic records management repository, with a corporate records classification and retention rules, capable of protecting records from amendment and premature destruction.  It might be an electronic document and records management system (EDRMS) that complies with one of the specifications issued by nations/trading blocs, such as the US DoD 5015.2 specification.  Or it might be a SharePoint records centre.
  • Let’s imagine that everyone in the organisation was perfectly happy, whenever they created or received an important e-mail or document, to stop what they were doing, go into the EDRMS/SharePoint records centre, and upload a copy of that document/communication.
If this were the case then all the other applications in the organisation would not matter to the records manager.  The applications used to communicate, collaborate and get work done could be installed, upgraded, replaced, moved to the cloud etc. as and when required, without the records manager needing to ask for any customisation/integration/connection of those tools.

However we know from experience that this is not the case.  When an individual creates or receives a document/e-mail, they know that it is stored safely enough in an application (whether that application is their e-mail account, their shared drive, a SharePoint team site, a line-of-business application or anything else).  They have no incentive (and possibly no time) to move it anywhere else.

The force of gravity applies. Extracting the documents/e-mails needed as records from these applications is going to require an investment of energy, either by the end-user on a day-to-day basis, or by the organisation in a one-off investment in customisation/integration/configuration/connection.

The key feature of a records management system is not the functionality of the records repository itself.  It is the link between the repository and the applications people use to do their work.  For example, the weakness of the records management model in the last two versions of SharePoint had nothing to do with the quality or otherwise of the records repository provided within SharePoint (the SharePoint records centre).  It had everything to do with the difficulties organisations have had in defining and configuring the rules by which documents were routed from each particular document library in each SharePoint team site to an appropriate place in the SharePoint records centre, and the lack of out-of-the-box integration between SharePoint as a whole and e-mail.

Records management risks becoming a victim of the rapid pace of change and a consequent shortening of planning horizons

When you start talking about customising an application, integrating it with other applications or extensively configuring it, then you are talking about a considerable amount of elapsed time: the time to make the customisations/integrations/configurations, to put them into production, and for the changes to be used enough to yield benefits.

A records manager might say something like:

  • ‘we need to customise the e-mail client so people can declare an e-mail directly into the records system’ OR
  • ‘before we roll out SharePoint team sites let’s define all the content types we need, and plan out what sites, document libraries and routing rules we need to configure, to ensure that content goes to the right place in the SharePoint records centre where we have our records classification and retention rules’ OR
  • ‘let’s find a way of integrating this hugely important line-of-business system into our records repository’
An IT manager might answer:

  • ‘Yes, but that will take a year to implement, and in two years’ time we will be moving e-mail out to the cloud/moving to the next version of SharePoint/using smartphones and tablets for everything.  The customisation/configuration/integration you propose wouldn’t work with cloud e-mail/isn’t needed in the next version of SharePoint/won’t be usable on tablets.’

Of the organisations that I advised in 2012:

  • some had once had a workable records management relationship between their records repository and key applications such as e-mail, but technological changes had disrupted the set-up.
  • some had got a workable records management system at the time of my contact with them, but changes in technology threatened to disrupt that
  • some had not been able to set up any sort of records management system and either the pace of technological change or the large number of different applications involved was preventing them from getting started.
The challenge for records managers, then, is not only ‘how do we set up a records system that works, and that ensures that content needed as records can be protected, organised and subject to retention rules’. The challenge is also ‘how do we set up the records system in such a way that as the business wishes to adopt applications/technologies it can, and these applications/technologies can become part of the records system rather than sitting apart from it (and hence bypassing and undermining it)’.

Resolving the tension between records management’s need for stability over time and an organisation’s need to keep pace with technology

Stability and agility are like yin and yang – they look like complete opposites, but in reality they are dependent on each other, and each needs a bit of the other.

  • The more frequently an organisation changes applications (the more agile it is), the more it needs a stable records repository to maintain content from the many and various legacy applications that it has ceased using.
  • The more applications an organisation has, the more there is a need for some sort of central governance to give coherence across those many different applications

There are essentially two different approaches we could take to managing records, now we know that a stand-alone repository has limited use, that every organisation deploys multiple applications, and that the cloud era is going to make it more important to ensure that content and its metadata are not locked into any one particular software-as-a-service application.

We could either:

  • deploy a records governance tool that leaves content in the different applications that an organisation deploys, but intervenes in those applications to apply the relevant records classification and retention rules to content within them, and to protect records from amendment or deletion
  • deploy a repository that can accept content from the different applications that the organisation deploys

Having either or both of these solutions in place would enable us to propose a solution to the tension between agility and stability.  The records management function can say to the business: ‘the organisation can adopt any application it wants/needs, and can have applications on-premise or in the cloud, so long as these applications either:

  • allow our records governance tool to intervene to govern content held in the application, OR
  • can contribute content and its metadata to the organisation’s records repository’. (Both contracts are sketched in code below.)
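As a minimal sketch of what those two contracts might look like (the interface and method names are my own assumptions, not drawn from MoReq2010 or from any product):

```python
from abc import ABC, abstractmethod

class GovernableApplication(ABC):
    """Option 1: content stays in the business application, which lets a
    records governance tool apply classification and retention in place."""

    @abstractmethod
    def apply_classification(self, content_id: str, class_code: str) -> None: ...

    @abstractmethod
    def apply_retention_rule(self, content_id: str, rule_id: str) -> None: ...

    @abstractmethod
    def protect(self, content_id: str) -> None:
        """Prevent amendment or deletion of a record."""

class RepositoryContributor(ABC):
    """Option 2: the application can hand content, with its metadata,
    over to the organisation's records repository."""

    @abstractmethod
    def export_record(self, content_id: str) -> dict:
        """Return the content plus its metadata in a form the repository understands."""
```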

You may be sitting in an organisation that does not currently have anything usable either as a records governance tool or as a records repository.  If you have dozens of different applications you may be wondering how you are going to connect them all to any future governance tool or records repository.  You may be struggling to picture what such a governance tool/such a repository looks like, and what the connections would look like between your business applications and the governance tool and/or the records repository.

The main challenge facing the records management world is to make this vision not just desirable but also feasible.

Why a link between MoReq2010 and the OAIS model would benefit both records managers and archivists

The dream of a single record keeping profession

It is roughly twenty years since Frank Upward began popularising the records continuum as a paradigm shift away from the previously prevalent records lifecycle model. It was the early 1990s, the digital revolution was about to hit organisations and Upward did not believe that a body of professional thought based on the lifecycle paradigm would cope with it.

Upward had both philosophical and practical concerns about the lifecycle model.

His philosophical concerns stemmed from the fact that the records lifecycle model depicted records as moving in a straight line through time: from creation of the record, through an active phase (where it is being added to and used); a non-active phase (where it is kept for administrative and/or other reasons); until final disposition (destruction, or transfer to an archive for long-term preservation). Upward pointed out that whereas Isaac Newton believed time moved in a straight line like an arrow, Einstein had proved that time and space were inseparable and that both were warped by the speed of light.

Upward compared records to light. Light carries information about an event through time and space. So do records. Upward based his records continuum diagram on a space-time diagram. The space-time diagram depicts the way that light travels in every direction away from an event through space and through time. No one can know about an event unless and until the light has reached them. The records continuum diagram showed that records need to be managed along several different dimensions in order to function as evidence of that event across time and space, and in order to reach different audiences interested in that event for different reasons, at different times and in different places.

Sketch of Frank Upward next to a drawing of the continuum model

Upward’s practical concern related to the fact that the lifecycle model had been used to underpin a distinction between the role of the records manager and that of the archivist. The records manager looked after records whilst they were needed administratively by the creating organisation; the archivist looked after records once they were no longer needed administratively but still retained value to wider society.

The interest of wider society in the records of any particular event does not suddenly materialise 20 or 30 years after the event. The interest of society is present before the event even happens. A records system of some sort needs to be in place before the event happens in order for the participants/observers of the event to be able to capture a record of it. That system needs to take into account the interest of wider society in the event in order for the records to have a fighting chance of reaching interested parties from wider society if and when they have the right to access them. This concern is particularly pertinent to digital records. Whereas paper records could be left in benign neglect, digital records are at risk of loss if they, and the applications that they are held within, are not actively maintained.

Upward didn’t use the word archivist or the word records manager. To him we are all record keepers. What we do is record keeping, and we belong to the record keeping profession.

One of the big impacts of the digital revolution, and of the paradigm shift from the lifecycle model, was the shift of attention away from the form and content of records themselves, and towards the configuration of records systems. In this post I will compare the way records managers have gone about the business of specifying records systems with the way archivists have gone about defining digital archives.

The continuing divide between records managers and archivists

Upward’s plea for a united recordkeeping profession has gone largely unheeded in the English speaking world. Twenty years into the digital age we still see a profound cleavage between not just the roles of archivists and records managers inside organisations, but also their ambitions and strategies with regard to electronic records.

The DLM Forum is a European group that brings together archivists (mainly from the various national archives around Europe) and records managers. When you are listening to a talk at a DLM event you can always tell the records managers and the archivists apart:
  • The records managers refer to MoReq2010 (developed by the DLM Forum itself) – a specification of the functionality that an application needs in order to manage the records it holds
  • The archivists talk about OAIS (the open archival information system model) – a standard for ensuring that a digital repository can ingest, preserve, and provide renditions of electronic records that have been transferred to the archive
This reflects a difference in the initiatives that the two branches of the profession have adopted towards electronic records.

Records management initiatives

The records management profession has attempted to design records management functionality into the systems used by the end-users who create and capture records. Over the period 2000 to around 2008 this strategy was mainly centred on specifying and implementing large-scale, corporate-wide electronic document and records management systems (EDRMS). Unfortunately a relatively small percentage of organisations succeeded in deploying such systems, and even those that did still found that many records were kept in other business applications and never found their way into the EDRMS.

We are now in the early days of the development of alternative records management models. MoReq2010 is the most recent attempt to influence the market and specify the functionality of electronic records management systems. MoReq2010 is framed to support several different models. It continues to support the EDRM model, but it also supports the following alternative models:
  • building records management functionality into business applications (and storing records in those applications) (in-place records management)
  • storing records in the business applications into which they were first captured, but managing and governing them from a central place (federated records management)
  • integrating the many and various business applications in the organisation into a central repository which stores, manages and governs records.
The key thing that all three of these newer approaches have in common is that they each involve records passing from one system to another during their lifetime. For example, in the in-place model, even if an organisation succeeded in installing records management functionality into every one of its applications (a distant hope!), it would still need to make provision for what would happen to the records held by an application that it wished to replace. For that reason MoReq2010 pays particular attention to ensuring that applications can export records with their metadata in a way that other applications can understand and use.

Archival initiatives

The strategy of archivists has been to design digital archiving systems which can capture records from whichever system(s) a record creating organisation deploys. In theory it would not matter what applications an organisation (government department/agency etc.) used to conduct their business, and it would not matter whether or not that application had records management functionality. The archives would still be able to accession records from them provided that they succeed in:
  • building a digital archive that can receive accessions of electronic records and their associated metadata
  • defining a standard export schema/metadata schema dictating exactly what metadata needs to be provided, and in what form, about records transferred to the archives
  • enforcing that export schema/metadata schema so that all new transfers of electronic records come in a standard form that is relatively straightforward for the archive to accession into their digital repository
Unfortunately only a small number of national archives have succeeded in making electronic accessions of records into anything remotely resembling a routine. Some archives have succeeded in building digital archive repositories: The UK National Archives has one, as do Denmark, the US, Japan and others. But the process of accepting transfers of electronic records into these archives is problematic. Every different vendor sets up their document management systems to keep metadata about the content their applications hold in a different way. The first time the archives accepts a deposit of records from each different system there is a lot of work to do translating the metadata output from that system into a format acceptable to the digital archive repository. The resources required to do this work have to be provided either by the archives or by the contributing body.

Jon Garde summed the situation up when he said, in a talk to the May 2012 DLM Forum members’ meeting, that ‘most records never leave their system of origin’. This comment serves as a sorry testament to the success of the initiatives undertaken by both sides of our profession hitherto.

Jon Garde

Lack of a join between records management and archival initiatives

It is rare to see examples of a joined-up strategy between records management and archival initiatives. In the United Kingdom the National Archives (then the Public Record Office) started out in the early years of the digital age by taking a great interest in the way government departments managed their electronic records, in the hope that this would make it easier for the Archives to accept electronic records from those departments. The records management arm of TNA defined the UK’s electronic records management system specification. Between 2002 and around 2008 they supported government departments in implementing the EDRM systems that complied with those specifications. But the archivists at TNA derived little benefit from this.

The TNA issued both versions of its electronic records management specification without a metadata standard and without an XML export schema. This meant that each compliant EDRM system from each different vendor kept metadata about records in a different way, and hence the transfer of records from those EDRM systems to the National Archives would need to be thought through afresh for each product. By the time the National Archives did get round to issuing a metadata standard they had already made the decision to stop testing and certifying systems (in favour of the Europe-wide MoReq standard). The absence of a testing regime meant that vendors had no incentive to implement the standard in their products. But even if vendors had implemented the metadata standard, TNA would have benefited little from it on the archive side, because TNA decided not to use that metadata standard for its own digital archive repository.

The OAIS model

The OAIS model is a conceptual model of what attributes a digital archive system should possess. It makes clear one of the key differences between a traditional archive of hard copy/analogue objects, and the digital archive.

A hard copy archive, in the main, produces for the requestor the very same object that it has been storing in the archive, which in turn is the very same object that was transferred to the archive by the depositing organisation.

In a digital archive this does not hold true. The object originally transferred to the archive may need to be changed or migrated to new software or hardware, so the digital object actually stored in the archive will differ from the digital object originally submitted to it. At the point in time when a requestor asks to see the record, the digital archive will usually make a presentation copy for them to view, rather than providing them with the object that it stores in the repository. The object that it provides to the requestor may differ in some respects from the object stored in the archive, for example if the archive wishes to present a version of the object that is better adapted to the browser/software/hardware available to the requestor.

The OAIS model came up with a vocabulary to describe these three separate versions of the record (sketched as simple data structures after the list below):
  • The object originally transferred to the archive is a submission information package (SIP)
  • The object stored in the archive is an archival information package (AIP)
  • The object provided to the requestor is a dissemination information package (DIP)
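A minimal sketch of the three package types as simple data structures; the field names are my own illustrations, since OAIS is a conceptual model rather than a data format:

```python
from dataclasses import dataclass

@dataclass
class SubmissionInformationPackage:      # SIP: what the depositor transfers
    payload: bytes
    depositor_metadata: dict

@dataclass
class ArchivalInformationPackage:        # AIP: what the archive actually stores,
    payload: bytes                       # possibly migrated to new formats over time
    preservation_metadata: dict

@dataclass
class DisseminationInformationPackage:   # DIP: the presentation copy given to the requestor
    rendition: bytes
    descriptive_metadata: dict
```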
OAIS has no certification regime, so there is no way for proprietary products, open source products or actual implementations to be certified as compliant with the model. At various times the archives/digital preservation community has debated whether or not it should have a certification regime (see this report of an OAIS workshop run by the Digital Preservation Coalition). Some archivists have felt that it is an advantage that OAIS does not have a certification regime, because it allows vendors and organisations the flexibility to implement the model in different ways. Others have felt that the lack of a certification regime hinders interoperability between archives.

An example of the OAIS model working well – the Danish National Archives

I had a tour of the Danish National Archives on 31 May 2012 (the first day of the members’ meeting of the DLM Forum). The Danish National Archives has a very well-functioning process based on the OAIS model. They have laid down a clear standard for the format in which Danish government bodies transfer records plus their metadata (submission information packages) to the Archives. Government bodies send records on optical disk or hard drives. The Archives gives each accession a unique reference, then tests the accession to ensure it conforms to the standard. The testing is performed on a stand-alone testing computer. Each accession is called ‘a database’, because the accession always comes in the form of a relational database.  Such relational databases typically hold metadata together with the documents/content that the metadata refers to.

I asked whether a government department could deposit a shared drive with the archive. They replied that the department would have to import the shared drive into a relational database first in order to format the metadata needed for the accession. This brought home to me the fact that when an archive imposes a standard import model it does not reduce the cost of transferring records from many and various different systems used by organisations to one digital archive. It merely places a greater proportion of the cost of the migration on the shoulders of the transferring bodies.

It is not necessarily easy for other national archives to replicate the success of the Danish National Archives. An archivist from the Republic of Ireland accompanied me on my tour of the Danish National Archives. He is in charge of electronic records at the archives of the Republic of Ireland. The Irish archives have not been able to get a standard format agreed for government departments to send accessions of electronic records to them. From time to time government bodies send accessions of electronic records, principally when a government body is wound down. The archives can do nothing more than store the accessions on servers and make duplicate copies. They have no digital archive repository to import them into. Even if they did have an archive repository, the fact that the accessions are in such different formats means that the process of ingesting them into such a repository would be extremely time-consuming and lossy. The chances of the archives persuading the rest of the Irish government to accept a standard format and process for transferring electronic records are slim, because in times of austerity it would be seen as an extra administrative burden.
(For more details on the approach of the Danish National Archives, watch this 25-minute presentation by Jan Dalsten Sorensen.)

An example of the records management approach working well – the European Commission

The European Commission has taken a records management approach to managing records from their creation until their disposal or permanent preservation.

They started off with a fairly standard electronic document and records management system (EDRMS) implementation with a corporate file plan, and later with linked retention rules. But then they expanded on this model.  They are currently in the process of integrating their line-of-business document management systems one by one into the EDRM repository. The ultimate aim is that a member of staff could choose to upload a record into any one of the Commission’s document management tools and still have the record captured in a file governed by the Commission’s filing plan and retention rules. They are also developing a preservation module for the historical archives. This module will enable records to pass from the control of the Directorates-General (DG) of the Commission that created them into the control of the Historical Archives, without leaving the EDRM repository itself.

The model is not perfect (like every other organisation they find it difficult to persuade colleagues to contribute e-mail to the EDRMS), and it is not finished (not all the different document management systems have been integrated yet, and not all the functionality needed to manage the process of passing records to the control of the Historical Archives has been added yet), but it is a very well-thought-through and solid approach that has successfully scaled up to cover nearly 30,000 people.

As with the Danish National Archives, it would not be easy for other organisations to replicate the success of the European Commission’s approach. The Commission’s success has come as a result of a records management programme that was started in 2002; it has taken a considerable amount of time (ten years) and a considerable amount of political will to draft the policies, build the filing plan, draft the retention schedule, establish the EDRM, and commence the integration of other document management systems into the EDRM. The integration of each document management system into the EDRM is a new project each time, requiring developers to work on the document management system in question so that it can use the EDRM system’s object model to deposit records into the repository.

In these turbulent times of economic austerity it is hard to envisage many organisations embarking on a records management programme that would take 6 to 8 years to deliver benefits.

How do we make it more feasible to manage records over their whole lifecycle?

The fact that these two excellent examples, from the Danish National Archives and the European Commission, are so difficult to replicate is a concern for both the records management and archives professions.

In an ideal world every records management service would operate a records repository, and every archive would run a digital archive. In an ideal world records managers would not need to get developers to do any coding to enable business applications to export their records into the records repository – the applications would be configured so that they could export records and all accompanying metadata in a way that the repository understood.
In an ideal world an archive running a digital archive would not have to ask each contributing body to tailor and adjust the exports of its applications. In an ideal world those bodies could run a standard export from any of their applications that the archive could import, understand and use.

The key enabler for both of these things is a widely accepted standard on the way metadata on things like users, groups, permissions, roles, identifiers, retention rules, containers and classifications is kept within applications, coupled with a standard export schema for the export of such metadata. If such a standard schema existed then a records repository owner or digital archive owner could specify to the owners of applications that needed to contribute records to the repository/digital archive that they either:
  • implement applications that keep records and associated metadata in that standard format, OR
  • implement applications that can export metadata in the standard export format, even if the metadata within the application had been kept in a different way, OR
  • develop the capability to transform exports from any of their applications into the standard export schema. This last point should be helped by the fact that any widely accepted export schema would lead to the growth of an ecosystem of suppliers with expertise in converting exports of records and metadata into that format.  Indeed such a format could become a 'lingua franca' between different applications (a sketch of what such an export might look like follows below).
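To give a flavour of what such a standard export might look like, here is a minimal sketch in Python. The element names and values are invented for illustration – this is emphatically not the MoReq2010 export schema – but it shows the idea of users, roles, retention rules, classifications and records travelling together in one predictable structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical export format, invented for illustration only
export = ET.Element("export", {"system-id": "hr-case-system-01"})

users = ET.SubElement(export, "users")
ET.SubElement(users, "user", {"id": "u-100", "name": "A. Clerk"})

roles = ET.SubElement(export, "roles")
ET.SubElement(roles, "role", {"id": "r-1", "name": "contributor"})

rules = ET.SubElement(export, "retention-rules")
ET.SubElement(rules, "rule", {"id": "ret-7y", "action": "destroy", "after": "P7Y"})

records = ET.SubElement(export, "records")
record = ET.SubElement(records, "record", {
    "id": "rec-2012-0001",
    "classification": "HR/Recruitment",
    "retention-rule": "ret-7y",
    "created-by": "u-100",
})
ET.SubElement(record, "title").text = "Interview panel notes"

# The receiving repository or digital archive would parse this one schema,
# whichever application produced it
print(ET.tostring(export, encoding="unicode"))
```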

The opportunity for a link between MoReq2010 and the OAIS model

The only candidate for such a standard export format at the moment is the MoReq2010 export format, published by the DLM Forum. The DLM Forum comprises both archivists and records managers, but most of the archivists have hitherto taken relatively little interest in MoReq2010.  On June 1 this year (the day after our visit to the Danish National Archives) I gave a presentation to the DLM members forum meeting suggesting that the archival community should develop an extension module for MoReq2010, such that any system compliant with that module would also have the functionality necessary to operate in accordance with the OAIS model.

This would have a number of beneficial effects. For the first time in the digital age we would have a co-ordinated specification of the functionality required to manage records at all stages of their lifecycle including managing archival records.

It would also be a huge boost for MoReq2010. The first two products to be tested against MoReq2010 will be SharePoint plug-ins – one produced by GimmalSoft, one by Automated Intelligence. Let us assume that both products pass and are certified as compliant. Both products will be able to manage records within the SharePoint implementation that they are linked to. Both will be able to export records in a MoReq2010 compliant format. But there still won't exist in the world a system capable of routinely importing the records that they produce. This is because the import features of the MoReq2010 specification are not part of the compulsory core modules of MoReq2010 – instead they are shortly to be published as a voluntary extension module.

Let us imagine that a National Archive somewhere in the world deploys a digital archive that complies with the OAIS model, and that can import records exported from any MoReq2010 compliant system.  All of a sudden there is a real incentive for that archive to influence the organisations that supply records to it to deploy MoReq2010 compliant applications (or applications that can export in the MoReq2010 export schema, or MoReq2010 compliant records repositories).  It works the other way around too.  Let us imagine there is a country somewhere whose various government departments deploy MoReq2010 compliant applications.  All of a sudden there is an incentive for their National Archives to deploy a digital archive that is compliant with the import module of MoReq2010 and can therefore routinely import records and metadata exported from those MoReq2010 compliant applications.

Debate at the DLM members forum on an OAIS compliant extension module for MoReq2010

The suggestion of an OAIS compliant extension module for MoReq2010 sparked off an interesting debate at the May DLM Forum members meeting. Tim Callister from The National Archives (TNA) in the UK and Lucia Stefan both criticised the OAIS model. They said it was designed for the needs of a very specialised sector (the space industry, with its unique formats and data types) and was not tailored for the needs of national archives, who are largely tasked with importing documents in a small range of very well understood file formats (.doc, .pdf etc.). Jan Dalsten Sorensen from the Danish National Archives defended OAIS, saying that it had given archivists a common language and common set of concepts with which to design and discuss digital archives.

I said that any digital archives extension module for MoReq2010 should be compatible with OAIS – if only because otherwise it would lose those archives (like the Danish National Archives) who had invested in that model. It would also lose the connection with all the thinking and writing about digital archives that has utilised the concepts of the OAIS model.

After the debate I spoke to an archivist from the Estonian national archive. He said that his archive didn't want lots of metadata with the records that they accession. I said that was because the more metadata fields they specified in their transfer format, the greater the amount of work that either they or the contributing government department would have to do to get the metadata into the format needed for accessions. If their contributing government departments had systems that could export MoReq2010 compliant metadata, and if the digital archive could import from the MoReq2010 export schema, then they wouldn't need to pick and choose the metadata – they could take the lot.

Information assurance and encryption

Alison Gibney spoke to the June meeting of the IRMS London Group about the relationship between the disciplines of information assurance and records management.

Alison differentiated  information assurance from information security.  Information security covers any type of information an organisation wants to protect, whereas information assurance is focused on protecting personal data.    There is no US equivalent term, partly because the US legislation on personal data is less strict than that of the UK.

Alison said that in the UK public sector over the last five years records management has gone down a peg or two (thanks to budget cuts), whilst 'information assurance' has gone up a few pegs.  The rise of information assurance is thanks to the various high profile central government data leaks in 2007 and 2008; the Hannigan report which compelled UK government bodies to adopt a strong information assurance regime in response to those leaks; and the fines meted out by the Information Commissioner for non-compliance with the Data Protection Act.

Alison showed a list of the fines meted out by the Information Commissioner over the past few years.  She pointed out that a significant proportion of the fines had gone to local authorities.  This was not necessarily because local authorities are worse at managing personal data than, say, a private sector retail company.  It is more likely to be because local authorities have literally hundreds of different functions that necessitate the collection, storing and sharing of personal data, whereas a retail operation may only have three or four such functions.  It is very hard for a local authority to ensure that all these many different functions are fully compliant with data protection legislation.

Alison also pointed out that most of the fines could be ascribed to two generic types of breaches:

  • communications being sent to the wrong person
  • loss or theft of removable media

These two types of generic breaches both occurred across a range of formats, digital and hard copy.  Communication misdirections included misdirected e-mails, letters and faxes.  Losses and thefts of removable media included losses of laptops, key drives and hard copy files.

When to encrypt and when not to encrypt

Alison said that encryption offered a solution to the problem of protecting personal data in certain circumstances.
Alison recommended encrypting:

  • personal data in transit (for example data being e-mailed to a third party) because of the risk of interception or misdirection
  • personal data on removable media (optical disks, laptops, mobile phones etc) because of the risks of loss or  theft

Alison did not recommend encrypting 'data at rest' in the organisation's databases/document management systems.  But she raised a question mark over data in the cloud.  Technically it is data at rest.  But it is data held by another organisation, possibly within a different legal jurisdiction, and the organisation may wish to encrypt it because of that.

It is worth taking a closer look at the issues around the encryption of the different types of data that Alison mentioned.

Encrypting data in transit – e-mail

There are various ways of encrypting e-mail.  A typical work e-mail travels from a device, through an e-mail server within the organisation, to an e-mail server in the recipient's organisation, and on to the device of the recipient.  There are security vulnerabilities at any of those points.  Devices bring a particular vulnerability, especially since the rise in usage of smartphones and of trends such as 'bring your own device'.

The most secure option is for the message to be encrypted all along the chain.  However, the more of the chain along which the e-mail is encrypted, the more complex, expensive and intrusive the encryption software and the procedures for applying it become.  Chapter 4 of this pdf hosted by Symantec gives a neat summary of the different options for e-mail encryption.

If you decide to encrypt messages from the moment they leave the sender’s device  all the way to the recipient’s device (endpoint to endpoint) then both the sender and the recipient will need encryption software installed on their device.  This is intrusive for both parties.  It may be reasonable to expect organisations that you regularly exchange sensitive data with to have such software installed.  It would not  be reasonable for a local authority corresponding with a citizen for the first time to expect the citizen to install such software.

A less intrusive option  is ‘gateway to gateway’ encryption.  The  message goes in plaintext from the sender’s device to a gateway server inside their organisation. The gateway server encrypts it.  When it reaches the recipient organisation it is decrypted by a gateway server which sends it on in plaintext to the recipient.  Note that this model requires the recipient organisation to have the same encryption software installed on their gateway server as is used by the sending organisation.
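The gateway-to-gateway principle can be sketched in a few lines of Python using the cryptography library's Fernet recipe. This is purely an illustration of where the encryption and decryption happen – real gateway products handle key exchange, certificates and message formats in their own ways.

```python
from cryptography.fernet import Fernet

# Key shared between the two organisations' gateway servers (how the key is
# agreed and protected is exactly what the commercial products differ on)
gateway_key = Fernet.generate_key()

# Sending side: the message reaches the gateway in plaintext from the sender's device...
message = b"Subject: case 4411\n\nPlease find the assessment report attached."

# ...and leaves the organisation encrypted
ciphertext = Fernet(gateway_key).encrypt(message)

# Receiving side: the recipient organisation's gateway decrypts the message
# and forwards it in plaintext to the recipient's device
assert Fernet(gateway_key).decrypt(ciphertext) == message
```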

A lighter approach to encryption is the gateway-to-web approach.  A standard plaintext e-mail is sent to the recipient giving them a link to a web address where they can retrieve the message, with the web session protected by some kind of transport layer encryption such as SSL (Secure Sockets Layer – as used to protect credit card transactions on the web).  The web site will ask the user for authentication.  Assuming that the authentication details have been sent to the user by a channel other than e-mail, this will provide some protection against an e-mail being sent to the wrong recipient.  In her talk Alison had informed us that 17 London boroughs had chosen encryption software from one particular vendor (Egress) which uses this gateway-to-web model.  Although this is a lower level of security than endpoint to endpoint encryption, it has the crucial advantage that the recipient does not need to install encryption software.

Encrypting data on removable media

Encrypting personal data on removable devices is the most straightforward of the cases that Alison mentioned.  The individual who owns the device encrypts it with their (public) encryption key.  Only they have the (private) key necessary to decrypt it.  If the device gets lost or stolen the data is safe so long as the person who gets the device does not also obtain the decryption key.
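In practice, device encryption products typically encrypt the bulk of the data with a randomly generated symmetric key and then protect ('wrap') that key with the owner's credential or public key. A minimal sketch of that pattern, using the Python cryptography library and with the key handling greatly simplified for illustration:

```python
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# The device owner's key pair (in reality held in a key store, smart card or TPM)
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# A random symmetric key encrypts the bulk data on the device...
data_key = Fernet.generate_key()
ciphertext = Fernet(data_key).encrypt(b"personal data held on a laptop or USB stick")

# ...and the data key itself is wrapped with the owner's public key
wrapped_key = public_key.encrypt(data_key, oaep)

# If the device is lost, the data is only readable by whoever holds the private key
recovered_key = private_key.decrypt(wrapped_key, oaep)
assert Fernet(recovered_key).decrypt(ciphertext).startswith(b"personal data")
```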

Encryption and data-at-rest

Bruce Schneier points out in his post Data at Rest vs Data in Motion (http://www.schneier.com/blog/archives/2010/06/data_at_rest_vs.html) that cryptography was developed to protect data-in-motion (military communications), not data-at-rest.  If an organisation encrypts the data it holds on its on-premise systems then it faces the challenge of maintaining through time the encryption keys necessary to decrypt the data. Schneier puts it succinctly: 'Any encryption keys must exist as long as the encrypted data exists. And storing those keys becomes as important as storing the unencrypted data was'.  If you are encrypting the data to guard against an unauthorised person overcoming the system's security model and gaining access to the data, then logically you cannot store the decryption keys within the system itself (if you did you would not have guarded against that risk!).

Encryption and data in the cloud

Most organisations have regarded data at rest within their on-premise information systems as being significantly less at risk than data on removable devices and data in transit, and therefore do not encrypt it.

But what about data in the cloud? There are certain features of cloud storage that may lead your organisation to wish to encrypt its data.  You may be concerned that a change of ownership or a change of management at your cloud provider might adversely impact security.  You might be worried that the government of a territory in which the information is held may attempt to view the data.  You may be concerned about the employees of the cloud provider seeing the data.  Fundamentally speaking, however, data in the cloud is data-at-rest.  It is data that is being stored, not communicated.  If you encrypt your cloud data you have the same problem as you would have if you encrypted data on your on-premise systems – how do you ensure that you maintain the encryption keys over time, and where do you store them?  If your reason for encrypting is a lack of trust in the cloud provider then storing the decryption keys with the cloud provider would defeat the object.

The challenge of encryption key management

Key management is crucial to any encryption model. In theory you want the organisation to give every individual a private/public encryption key pair. That means that not only can individuals encrypt information (with their public key) that they wish to keep secret, you can also provide a digital signature capability, because they can encrypt with their private key a signature that anyone can read with that individual's public key.  This offers proof that the individual alone must have signed it, because it could only have been encrypted with the individual's private key.  Thus the signature is in theory non-repudiable (the individual could not deny that it was they who sent it).

There are a couple of interesting issues around key management.  Do you make it so that only the individual knows their private key?  In which case the digital signature is genuinely non-repudiable.  But if the individual loses their private key then they lose access to their signature and to any information that has been encrypted with their public key.  The alternative is that the organisation provides a way for the individual to recover their private key, for example through an administrator.  But this means that the digital signature is in theory now repudiable, because the administrator could have used the key to sign a message.
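As an aside on the mechanics: modern signing libraries hash the message and apply a dedicated signature scheme rather than literally 'encrypting with the private key', but the property is the one described above – anyone holding the public key can verify the signature, and only the holder of the private key could have produced it. A minimal sketch with the Python cryptography library:

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# The individual's key pair; non-repudiation depends on no-one else ever holding the private key
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH)
document = b"I approve the disposal of the 2005 case files."

# Signing is performed with the private key
signature = private_key.sign(document, pss, hashes.SHA256())

# Verification needs only the public key; it raises InvalidSignature if either
# the document or the signature has been altered
public_key.verify(signature, document, pss, hashes.SHA256())
```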

I am currently reading a wonderful book that explains why, even two decades into the digital age, there is still no widely accepted digital signature capability – the vast majority of organisations do not use digital signatures (and hence still have a need to keep some paper records of traditional blue ink signatures). The book is called Burdens of Proof and is written by Jean-François Blanchette.

Forthcoming extension and plug-in modules to MoReq2010

A unique feature of MoReq2010 when compared to previous electronic records management specifications is the provision for the DLM Forum to publish optional extension modules and plug-ins to extend the compulsory core modules of the specification.

This will enable the specification to embrace needs specific to some but not all organisations/sectors, without imposing costs on vendors who intend to develop products that do not service those organisations/sectors.

It will also enable the specification to develop over time to respond to new needs created by the ever-evolving world of applications used in business.

Yesterday afternoon at the DLM Forum members meeting in Copenhagen I heard Jon Garde announce a list of MoReq2010 extension modules and plug-in modules that would be developed over the coming 12 months.

None of these extension/plug-in modules will be compulsory.  Vendors will continue to be able to achieve MoReq2010 compliance with a product that does not meet any of them.  However vendors will be able to ask for their product to be tested against them.

Extension modules

Extension modules provide an extension to the core modules of the specification, but do not replace the core modules.  Below I have listed the extension modules that are due to be published in the next 12 months.

Transformed records extension module

This module will define a capability for a MoReq2010 compliant system to manage records that have been annotated and/or redacted.  The special issue with these records is that the system needs to manage the relationship between the unannotated/unredacted version of the document and the annotated/redacted version(s).

File aggregations extension module

The core modules of MoReq2010 replaced the concept of a 'file' (which had been present in all previous ERM specifications) with a much more flexible concept of an 'aggregation'.  The concept of an aggregation refers simply to the containers that users use to organise their records. That could be anything from a folder structure to a SharePoint document library to an e-mail inbox.  The concept of an aggregation was made as flexible as possible in order to offer the vendors of products as diverse as e-mail clients, collaborative systems and wikis the possibility of securing MoReq2010 compliance.

However there may be some organisations who simply want a system that works like a traditional EDRMS, with a hierarchical classification, and with users restricted to creating files that can have sub-files and/or volumes and into which they can place documents.

The file aggregations module will define the capability for a system to hold aggregations called 'files' that can contain sub-files but cannot sprawl into multi-level folder structures.  This will also give backward compatibility with the predecessor MoReq2 specification.
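As a purely conceptual sketch of the difference (the class and field names below are invented, not drawn from the specification): an aggregation is any container and may nest freely, whereas a 'file' in the sense of this module holds records and sub-files, with the sub-files prevented from sprawling into deeper structures.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Aggregation:
    """Generic container: anything users group records in, nesting freely."""
    title: str
    records: list[str] = field(default_factory=list)
    children: list[Aggregation] = field(default_factory=list)

@dataclass
class File:
    """Traditional file: holds records and sub-files, but sub-files may not nest further."""
    title: str
    records: list[str] = field(default_factory=list)
    sub_files: list[File] = field(default_factory=list)
    _is_sub_file: bool = False

    def add_sub_file(self, sub: File) -> None:
        if self._is_sub_file:
            raise ValueError("a sub-file cannot itself contain sub-files")
        sub._is_sub_file = True
        self.sub_files.append(sub)

case_file = File("Planning application 2012/0417")
correspondence = File("Correspondence")
case_file.add_sub_file(correspondence)           # allowed: one level of sub-files
# correspondence.add_sub_file(File("Drafts"))    # would raise: no multi-level sprawl
```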

Import services extension module

The import services module will define  the capability for a MoReq2010 compliant system to import data exported from any other MoReq2010 compliant system.  Of all the extension modules Jon mentioned, this is the most important to my mind.

The whole point of MoReq2010 is the idea that most records need to be migrated from one system to another at some point in their lifecycle, and hence every MoReq2010 compliant system must be able to export its records in the format specified by the export services module of MoReq2010, which is a core and compulsory module.

I predict that MoReq2010 will come to life if and when a vendor brings to market a product that complies with the optional import services extension module that is in the course of development.  Any organisation that deploys such a system as a records repository has a huge incentive to make its other applications MoReq2010 compliant.  It would know that the minute it wanted to replace such an application it could export all the content and import it into its records repository without a complex custom migration process.

Security categorisation services extension module

This will define a capability for a MoReq2010 compliant records system to implement security classifications such as 'secret', 'top secret' etc.

Physical management services extension module

This will define a capability for a MoReq2010 compliant records system to manage physical objects (such as hard copy records)

E-mail client integration extension module

This will define a capability for a MoReq2010 compliant records system to integrate with an e-mail client such as Microsoft Outlook in order to capture e-mail as records into the system.

Plug-in modules

Plug-in modules for e-mail and for Microsoft Office documents

The function of plug-in modules within the MoReq2010 specification is to provide alternative ways of implementing a particular type of functionality.

For instance a MoReq2010 system keeps content (for example electronic documents) in the form of components that are managed as records.  The core modules of the specification contain a rather generic 'electronic component'. Over the next 12 months two new plug-in modules will be written: one for e-mail and one for Microsoft Office documents.  These plug-in modules will define the requirements necessary to capture and manage metadata specific to e-mail and Microsoft Office documents.

These formats have been chosen simply because they are in wide usage.  It is still possible to bring other formats into a MoReq2010 compliant system – they can be brought in using the more generic electronic component.  More plug-in modules will be written in future years.
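As a rough illustration of what an e-mail plug-in adds over the generic electronic component, the sketch below pulls e-mail-specific metadata out of a message using Python's standard email module. The metadata field names are invented – the plug-in module itself has not yet been published.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

raw = """From: alice@example.org
To: bob@example.org
Date: Mon, 02 Jul 2012 09:15:00 +0100
Subject: Draft retention schedule

Please see the attached draft.
"""

msg = message_from_string(raw)

# Metadata a generic electronic component would not know to look for;
# the field names here are hypothetical
component_metadata = {
    "title": msg["Subject"],
    "email:from": msg["From"],
    "email:to": [addr for _, addr in getaddresses(msg.get_all("To", []))],
    "email:sent": parsedate_to_datetime(msg["Date"]).isoformat(),
}
print(component_metadata)
```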

Jon anticipates that there will be a new version of MoReq2010 published annually to include all the new extension and plug-in modules.

The nature of electronic records – podcast with Ben Plouviez

Between 2004 and 2006 Ben Plouviez (@benplouviez) oversaw the roll out of an EDRM (electronic documents and records management) system across what was then the Scottish Executive (but is now the Scottish Government).

Six years later and the system contains 14 million documents and is used by around 4,000 staff.

In this podcast Ben reflects even-handedly on both the benefits that having an organisation wide records repository has brought to the Scottish Government, and on the promises that the system has not fulfilled.

The roll out of the EDRM was driven partly by the Scottish Executive's desire to break down silos between the various different parts of the administration. They made the decision that wherever possible files would be open and accessible to the whole of the Scottish Government. There have been times when colleagues have found documentation that they would never have known existed were it not for the EDRMS.

The EDRM’s Scottish Government wide business classification scheme has not been an unqualified success, but nor could it be called a failure.  It is not terribly popular with users, who rarely use it to navigate to material that they wish to find.  However on the plus side the scheme has provided a stable and enduring  structure for the system.

Ben has found that the electronic files on the EDRM system do not tell a narrative in anything like as clear or as usable a way as a typical paper file used to do.  Ben questioned whether it was feasible for records managers to expect their organisations to keep a full electronic file of every piece of work they carry out.  Ben said that the concept of the file is predicated on the concept of the document, and we are now seeing alternatives to the document in the form of blogs, wikis, discussion forums, etc.  None of these new formats fit naturally into the file.  I found it significant that the MoReq2010 specification uses the word 'aggregation' instead of the word 'file'.  This implies that in the electronic world there are many different ways in which business communications can be collected (e-mails in in-boxes, tweets in tweet streams, etc.).

There have been some unexpected benefits to having an organisation-wide records repository.  For example the Scottish Government have taken information from the system's audit logs about who has read what on the EDRM and translated it into RDF triples (the non-proprietary format that underpins linked data and the semantic web).  They have then provided an interface to enable colleagues to query this data to find out what their colleagues have read on the system. This enables the serendipitous finding of documents of current interest, and provides a more human way of browsing and interrogating the system than that provided by either the business classification or the search facility.  The Scottish Government have also used the same technique in relation to e-mail logs.  They have taken the records of who sent an e-mail to whom and when, converted them to RDF, and provided a query and visualisation interface.  This means colleagues can find out who has been corresponding with particular colleagues or stakeholders.  Note that the content of the e-mail is not accessible, and that only e-mails with at least one person in cc have been included, to ensure that private correspondence between two people is excluded.
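A rough sketch of the technique, using the Python rdflib library with an invented namespace and invented log entries rather than the Scottish Government's actual vocabulary and tooling:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

EX = Namespace("http://example.org/edrm/")
g = Graph()

# Two invented audit log entries: (user, document, timestamp of the read)
audit_log = [
    ("jsmith", "doc-2012-0014", "2012-06-18T10:42:00"),
    ("afraser", "doc-2012-0014", "2012-06-18T11:05:00"),
]

for user, doc, when in audit_log:
    event = EX[f"read/{user}/{doc}/{when}"]
    g.add((event, EX.reader, EX[f"user/{user}"]))
    g.add((event, EX.document, EX[f"document/{doc}"]))
    g.add((event, EX.readAt, Literal(when, datatype=XSD.dateTime)))

# Colleagues can then ask questions of the triples, e.g. 'what has jsmith read?'
query = """
SELECT ?doc WHERE {
    ?event <http://example.org/edrm/reader> <http://example.org/edrm/user/jsmith> ;
           <http://example.org/edrm/document> ?doc .
}
"""
for row in g.query(query):
    print(row.doc)
```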

Ben talked about the plans for the future of electronic records management in the Scottish Government, including their intention to replace their existing EDRM within the next three or four years.  He speculated on whether it would be possible for one product/system to fulfil both their collaboration and records management needs, or whether the Scottish Government would have to implement several different tools to deliver that vision.

This podcast is published as ECM Talk episode 014 – you can download it from here

G-cloud update

This Thursday I went to the tea cloud camp meeting on cloud computing held at the National Audit Office in London for an update on progress with the UK Government’s G-Cloud

Launch of Cloud Store

We heard that the UK Government’s CloudStore  could go online as early as this weekend.

CloudStore will be in effect a catalogue of suppliers who have been accredited by the UK government to provide the government with cloud services.  The accredited companies are grouped under four headings – Infrastructure as a service, Platform as a service, Software as a service (including applications such as EDRM, CRM and collaboration) and Services (including systems integrators).

This is a technology shift towards cloud solutions, but more importantly it is a procurement revolution. It is a move towards transparent pricing, with suppliers stating their prices up front on the CloudStore, and towards pay-as-you-go, easy-to-enter and easy-to-leave contracts.

One IT manager told us that in her career as a civil servant she had managed so many contracts with poor suppliers (she used a stronger term than poor!).  Even though the contractors were not performing, her department had no choice but to keep them because it had no plan B.  The penalty clauses for leaving the contract early were so great as to make it uneconomical to change, and the length of the procurement process meant that there were no alternatives lined up ready and waiting to step in and fill the gap left by the ousted supplier. For her, CloudStore means always having a plan B.  If she has a poor supplier in future she can look at the CloudStore, find an alternative, terminate the contract with the poor supplier and start one with an alternative provider.

She said that one of the key things about CloudStore is that we will increasingly see IT applications bought as commodities rather than as bespoke solutions.

The benefits should work both ways. The public sector get a better price and the suppliers will benefit from the lower cost of winning business.  They will be able to strike a deal with new public sector customers much more quickly.  Another potential benefit for suppliers is that CloudStore will be viewable online by anyone.  I would not be surprised if people in other sectors and other countries looked at the UK Government’s cloud store to get an idea of what suppliers have been accredited by the UK Government, what services they offer and what prices they offer.  There is also the capability for public sector bodies to write Amazon style reviews of the service they have received.

One of the speakers mentioned how pleased she had been with the response from suppliers. Hundreds of applications were received when the CloudStore OJEU notice was issued late last year, and companies that did not apply first time around will be given further opportunities in future to apply to get onto the store.

G-cloud pilot – a County Council puts its e-mail into the cloud

We heard from a county council who were putting their e-mail into the cloud as a pilot G-Cloud project. They had received six bids – three from vendors offering public cloud services, three from private clouds.  They narrowed it down to three bids – Microsoft's Office 365, Google Apps, and IBM (who offered Lotus Notes from a private cloud).  Each bid provided the functionality they wanted, so they went on price alone (which tells you that e-mail, calendaring and basic collaboration are now a commodity).  Google Apps won.

The Council picked an initial group of around 150 volunteers to trial Google Apps.  In order to avoid a self-selecting sample of technology enthusiasts they asked volunteers to give a reason why they wanted to join the trial, and picked people with a range of different motivations. The volunteers were not given face-to-face training, but were each set up on Yammer so that they could act as a support community for each other. Only four calls have been made to the service desk since the trial started.

One of the first things they found was how quick it was to bring people onto the service. They bought some servers to use to migrate users from their existing system (Lotus Notes e-mail hosted in-house) to Google Apps in the cloud.  The servers will not be needed once the migrations have all taken place.  They had 15 users up and running on the service within a week of signing the deal.

They have resisted the temptation to bring the whole organisation over to Google Apps in one big bang. Running two systems alongside each other brings with it inconveniences around calendaring – some staff are using Lotus Notes calendars and some are using Google Apps, so it is difficult for them to share appointments etc.  Their initial volunteer group of 150 people had to be expanded to 250 simply because some of the volunteers had colleagues that they needed to share a calendaring system with.

The Council are going to look in the spring at integrating Google Apps with their EDRMS so that it becomes easier for colleagues to save e-mails needed as records.  They may also start working with Google Sites at some point (which would bring the implementation into the filesharing/collaboration space).

G-cloud and security

The Council said one of the benefits of G-cloud for them was that they did not have to think through on their own and from scratch  the questions of security in the cloud and personal data in the cloud.   A lot of the thinking had been done centrally, on a public sector wide basis   (with the caveat that individual public sector bodies still have to assess the risks arising from their own information systems and make decisions appropriate to that level of risk).

CESG (the Government’s National Technical Authority for Information Assurance) is carrying out information assurance checks on every service that applies to join the CloudStore framework, as part of the accreditation process.

CESG has come up with a classification of business impact levels (here is the pdf) to enable public sector bodies to assess the impact of any particular type of information being compromised.  Business impact level 2 corresponds broadly to the government security classification of 'protect'.  This is information that the government does not want to see in the public domain, but if it got into the wrong hands the damage would be more inconvenient than disastrous.  Business impact level 3 corresponds broadly to the government security classification 'Restricted' – this is information where there could be serious consequences (to individuals, organisations, commercial interests or the nation as a whole) if the information got into the wrong hands.

For example personal data whose compromise is  unlikely to put an individual in danger is likely to be regarded as impact level 2, whereas personal data whose compromise could put an individual in danger is likely to be marked as Impact level 3 or above.  Impact level 2 covers vast swathes of government work.

Both Google Apps and Microsoft’s Office 365 have been accredited up to Impact level 2.  We were told that some of the vendors had started to show an interest in being able to offer a service accredited for impact level 3 information, but for at least the short term the CloudStore would not be catering for impact level 3 information.

One IT manager told us that the point of the cloud services is that they cater for the majority of government's needs, not for all of them. She said it may be that public bodies simply make separate provision for restricted documentation and e-mail – even if it means having separate booths dotted around the office with computers staff could use for 'restricted' communications.

G-cloud, data protection, and the issue of storing data outside of the EU

One of the big concerns with cloud adoption has been the 8th data protection principle (present in the data protection legislation of every EU member state) which states that personal data should not be transferred outside the European Economic Area unless that country or territory ‘ensures an adequate level of protection for the rights and freedoms of data subjects in relation to the processing of personal data’.

There is also  a wider concern that where information is stored out of the country, and particularly when it is stored outside the EU, then it comes under a legal framework that the UK cannot control (for example many countries have legislation giving their governments powers of inspection, on security grounds, of information held in their territory).

The speakers at the meeting referred  to Cabinet Office Guidance on Government ICT offshoring. The guidance states that no information with a national security implication should be stored outside the country (whatever the impact level).

Personal data is a slightly different matter – the Cabinet Office guidance does not forbid personal data being stored outside the EU, provided measures are in place to ensure that the contractor treats the data in an 'adequate' manner ('adequate' meaning compliant with EU data protection principles and practice), and provided the security in the system is appropriate to the impact level of the information.  The guidelines give three ways of ensuring that a contractor operating from overseas has an 'adequate' data protection regime – safe harbor, model clauses and binding corporate rules.

The safe harbor scheme was set up jointly by the EU and the US.  Individual US companies that sign up for the safe harbour scheme are considered ‘adequate’ by the EU and therefore the UK public sector is not contravening this principle by storing such data with these companies.   The safe harbor arrangement has been criticised by some commentators.  Chris Connolly said ‘The Safe Harbor is best described as an uneasy compromise between the comprehensive legislative approach adopted by European nations and the self–regulatory approach preferred by the US’.     However this article from The Register last month predicts that the safe harbour arrangement will survive the proposed forthcoming overhaul of EU data protection legislation.

The second of the measures is model contract clauses with companies, to ensure that the company operates 'adequate' protections in relation to the data it stores under the contract.  The European Commission has drawn up some such clauses and so has the UK Government.

Binding corporate rules are where the Government accepts that the internal policies of a company operating both within and outside the EU are strong enough to ensure that an ‘adequate’ data protection regime is operated across the whole company (and not just inside the EU).  The guidance states that such corporate rules are an alternative to model contract clauses and must be approved by a relevant data privacy supervisory authority ( the Information Commissioner in the UK, or an equivalent in another member state).

Why is content migration so difficult?

Migrating content from one application to another is a problem that even now, two decades into the digital age, we have no solution for.  Migrating content is often so labour intensive and complex as to be not cost effective.  Any content migration involves compromises and omissions that result in a significant loss of quality in the metadata that is held about the content being migrated.
Solving the content migration problem is about to become more urgent with the growing popularity of the software-as-a-service (SaaS) variety of cloud computing.  In this model the provider not only provides the software application, they also host your content.  Imagine what would happen if your organisation decided it wanted to change from one SaaS provider to another.  It wants to change from Salesforce to a different SaaS CRM.  Or it wants to go from SharePoint Online to Box or Huddle or another collaboration/filesharing offering (or vice versa).  What do you do?  How do you migrate content from SharePoint Online to Box? They have little in common in terms of how they are architected and what entities they consist of. What is the Box equivalent of a SharePoint content type?
The vendor lock-in problem is very real.  If you can't migrate the content you are left paying two sets of SaaS subscriptions and managing two SaaS contracts.  And if you were leaving because of a breakdown of trust with your original SaaS provider, how happy would you be leaving your content locked up with them on their servers?

Content migration is a problem that affects all organisations, and which affects archivists as much as records managers

The difficulty organisations experience in migrating content from one application to another matters in many situations. It matters when an organisation wants to replace an application with a better application from a different supplier. It matters in a merger/acquisition scenario, when an organisation wants to move the acquired company onto the same applications that the rest of the group are using.
It matters to archivists, because any transfer of electronic records from an organisation to an archive is, to all intents and purposes, a content migration.  I heard the digital preservation consultant Philip Lord say at a conference that the big difference for archives of the electronic world over the paper world is that:

  • in the paper world it was possible for an archive to set up a routine process for transferring hard-copy records that it would expect all bodies contributing records to the archive to adhere to
  • in the electronic world, every time an archive wishes to accept a transfer of records from a new information system it needs to work out a bespoke process for importing that content and its metadata from that particular system into its electronic archive.

Different applications keep their metadata in profoundly different ways

Migration from one application to another is extremely time consuming because you are:

  • mapping from one set of entity types to another. Entities are the types of objects the application can hold (users/groups/documents/records/files/libraries/sites/retention rules etc)
  • mapping from one set of descriptive metadata fields to another
  • mapping from one set of functions to another. Functions are the actions that users can be permitted to perform on entities in the system (for example: create an entity/amend it/rename it/move it/copy it/delete it/attach a retention rule to it/grant or deny access permissions on it)
  • mapping from one set of roles to another. Roles are simply collections of functions, grouped together to make it easier to administrate them. For example in SharePoint the role of 'member' of a site collects together the functions a user needs to be able to access a site and view and download content, and to contribute new content to the site, but denies them the functions they would need to administer or change the site itself. A sketch of these four mappings follows below.
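To make those four mappings concrete, here is a hypothetical and deliberately partial sketch. The entity types, fields, functions and roles are all invented, and the gaps are the point: anything without a counterpart in the target system is lost or fudged during migration.

```python
# Hypothetical, partial mappings from 'application A' to 'application B'

entity_type_map = {
    "A:document": "B:item",
    "A:folder": "B:library",
    "A:retention_rule": None,   # B has no equivalent entity type
}

field_map = {
    "A:title": "B:name",
    "A:case_reference": None,   # no counterpart descriptive field in B
}

function_map = {
    "A:delete": "B:remove",
    "A:check_out": None,        # B has no check-out function, so parts of A's event
                                # history and access control lists cannot be expressed
}

role_map = {
    "A:contributor": "B:member",
    "A:records_officer": None,  # B's role set has nothing comparable
}
```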

Let us imagine we want to migrate content from application A to application B. Application A has an export function and can export all of its content and metadata into an xml schema. That is good. We go to import the content and metadata into application B. This is where we hit problems.

Application B looks at the audit logs of application A. They contain a listing of events (actions performed by users on entities within the system, at a particular point in time). Each event listed gives you the identity of the user that performed the function, the name of the function they performed, the identity of the entity they performed it on and the date or time at which the event occurred. Application B won’t understand these event listings. It is unlikely to understand the identifiers application A uses to refer to the entities and users. It is unlikely to understand the functions performed because application A has a different set of functions to application B.

Application B looks at the access control lists of application A. Each entity in application A has an access control list that tells you which users or groups can perform what role in relation to that entity. Application B does not understand those roles, nor does it understand the functions that the roles are made up of. Therefore application B cannot understand the access control lists.

The end result is that application B cannot understand the history of the entities it is importing from application A, and it cannot understand who should be able to access/contribute to/change them.  It is also going to find it difficult to import things like retention rules, descriptive metadata fields, controlled vocabularies.

Migration reduces the quality of metadata

The process of migration is 'lossy'.  In the world of recorded music it is said that when you move music from one format to another (LP to tape, tape to mp3 etc.) you cannot gain quality, you can only lose it.  When you migrate content from one system to another you cannot gain information about that content, you can only lose it.  There will be whole swathes of metadata in system A that it will not be cost effective for you to map to counterpart metadata in system B.  You end up migrating content without that metadata, and your knowledge about the content that you hold is poorer as a result.

The fact that content migration is so labour intensive and lossy means that many organisations opt to leave content in the original application and start from scratch in the new application.  This is a nice easy option, but there are downsides.  It means that the organisation has to maintain the original application for as long as it needs to keep the content that is locked within it.  This means paying the resultant cost of licence fees and support arrangements.  It also means a break in the memory of the organisation.  Users of the new system wishing to look back over previous years will have to go to the old system to view the content.  That is OK for a short period, during which time most colleagues will remember the old system and how to use it.  But as time goes by a larger and larger percentage of colleagues will have no knowledge or memory of the older system and how to use it.

The organisation may mitigate the impact of that by connecting the search capability of system B to the repository of system A.  The results of this are hit and miss.  The search functionality of system B will have been calibrated to the architecture of system B, it will not be calibrated to the architecture of system A.  Yes, it will return results but it will not be able to rank them very well (and you are still having to maintain system A in order that system B can run the search on it).

What can electronic records management specifications do to improve this situation?

The problem of content migration is not specific to records systems, it is a universal problem that affects any organisation wishing to move content from any kind of application to another application.

But it is a problem that is central to the concerns of records managers and archivists, because as a profession(s) we are concerned with the ability to manage records over time, and difficulties in migrating content hamper our ability to manage content over time.  We know that applications have a shelf life – after a period of years a new application comes along that can do the same job better and/or cheaper, and therefore we want to move to the new tool.  The problem is that retention periods for business records are usually longer than the shelf life of applications.    Therefore it is probably from the records management or archives world that a solution will come to this problem, if it comes at all.

The first generation of electronic records management system specifications (everything from the US DoD 5015.2, which first came out in 1998, to MoReq2, which came out in 2008) did not attempt to tackle the problem. They told vendors what types of metadata to put into their products – but they did not tell vendors how to implement that metadata.  For example these specifications would specify that records had to have a unique system identifier, but it was up to the vendor what format that identifier took. They had to have a permissions model, but what functions and roles they set up was up to the vendor, and so on.

This lack of prescription had the benefit of sparing vendors of existing products the necessity of re-architecting the way they assign identifiers/implement a permissions model/ keep event histories etc. Had existing vendors been forced to re-architect in such a way it would have proved a major disincentive for them to produce products that complied with the specification. But the disadvantage was that the electronic document and records management systems (EDRMS) that these specifications gave rise to each had their own permissions models and metadata structures. When an organisation wanted to change from one specification compliant EDRMS to another, they had the same content migration problems as you would when migrating content between instances of any other type of information system. An archive (for instance a national archive) wishing to accept records from different EDRM systems would need to come up with a bespoke migration procedure for each product.

MoReq2010’s attempt to facilitate content migration between MoReq2010 compliant systems

MoReq2010 marks something of a break with past electronic records management specifications.  One of its stated aims is to ensure that any compliant system can export its entities together with their event histories, their access control lists and their contextual metadata, in a way that any system that has the capability of importing MoReq2010 content can understand and use.

In order to do this it has had to be far more prescriptive than previous electronic records management specifications in terms of how products keep metadata.

For example

  • It tells any compliant system to give each implementation of that system a unique identifier. This means that any entity created within that implementation will be able to carry with it to subsequent systems information about the system it originated in
  • It tells every implementation of every compliant system to give each entity it creates the MoReq2010 identifier for that entity type, so that any subsequent MoReq2010 compliant system that the entity is migrated to understands what type of thing that entity is (is it a record? or an aggregation of records? or a classification class or a retention schedule? or a user? or a group? or a role?)
  • It tells every implementation of every compliant system to give every entity created within it a globally unique identifier in a MoReq2010 specified format. Each entity can carry this identifier with it to any subsequent MoReq2010 compliant system, no matter how many times it is migrated
  • It tells every implementation of every compliant system to give each entity an event history that not only records the functions performed on that entity whilst it is in the system, but which can also be carried forward and added to by each subsequent system (a sketch of what this looks like for a single record follows this list).
  • It tells each compliant system to create an access control list for each entity in the system, that governs who can do what in relation to that entity whilst it is in the system, and which can be understood, used, and added to by any subsequent compliant system that the entity is migrated to.
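A sketch of what this looks like for a single record. The field names are invented for illustration; only the use of globally unique identifiers and of an event history and access control list that travel with the entity reflect the specification's approach.

```python
import uuid

SYSTEM_ID = uuid.uuid4()            # identifies this implementation of the system
RECORD_TYPE_ID = uuid.uuid4()       # stands in for MoReq2010's identifier for the 'record' entity type

record = {
    "entity_id": uuid.uuid4(),      # globally unique; stays with the record through every migration
    "entity_type": RECORD_TYPE_ID,  # tells any subsequent system what kind of entity this is
    "originating_system": SYSTEM_ID,
    "event_history": [
        {"function": "create", "user": uuid.uuid4(), "at": "2012-06-01T09:00:00Z"},
    ],
    "access_control_list": [
        {"group": uuid.uuid4(), "role": "inspector"},
    ],
}

# A subsequent compliant system appends to the same event history rather than starting afresh
record["event_history"].append(
    {"function": "import", "user": uuid.uuid4(), "at": "2015-03-10T14:30:00Z"}
)
```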

To achieve the last two of these ambitions MoReq2010 had to get into the nitty gritty of how a system implements its permissions model.

MoReq2010 and permissions models

I recorded two podcasts with Jon Garde about the permissions model in MoReq2010:

  • episode 7 of Musing Over MoReq2010 is about the 'user and group service' section of the MoReq2010 specification
  • episode 8 (shortly to be published here) is about the 'model role service' section – the part of the MoReq2010 specification that deals with functions (the actions users can perform within the system) and roles (collections of functions).

In the latter podcast Jon said that the model role service was the part of MoReq2010 that caused him the most sleepless nights when he wrote it.  The problem was that every product on the market already has a permissions model, with its own way of describing the functions that it allows its users to perform on entities within the system.

The dilemma for Jon in writing MoReq2010 was as follows:

  • If the specification prescribed a way for each system to implement its permissions model then existing systems would have to be rewritten and this would act as a major disincentive for vendors to revise their products to comply with MoReq2010
  • If the specification did not prescribe a way for each system to describe the functions that users could perform within it, then subsequent systems would not be able to understand the event histories of exported entities (because they would not understand which actions had been performed on the entity concerned) or their access control lists (because they would not understand what particular users/groups of users were entitled to do to that entity)

The solution that Jon adopted was half way between these two options.  In the model role service MoReq2010 outlines its own permissions model, with definitions of a complete set of functions that a record system can allow users to perform on entities.

MoReq2010 does not insist that to be compliant a system must implement every one (or even any one) of the functions that are outlined within the model role service.  It allows products to carry on using their own permissions model.  However MoReq2010 does insist that a system must be able to export its content and metadata with the functions and roles expressed as the functions and roles outlined in the MoReq2010 specification.  In other words a product would need to map its existing permissions model (functions and roles) to MoReq2010 functions and roles.  This would mean that two MoReq2010 compliant systems with entirely different permissions models could both export their content with all of the functions in the access control lists and the event histories expressed as MoReq2010 functions.
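A hypothetical sketch of that export-time mapping. The product's internal function names and the MoReq2010-style function names shown here are both invented; the point is that the product keeps its own permissions model internally but expresses roles as sets of standard functions when it exports.

```python
# Invented internal functions mapped to invented 'moreq:' function identifiers
PRODUCT_TO_MOREQ_FUNCTIONS = {
    "vault.view": "moreq:inspect_entity",
    "vault.upload": "moreq:create_record",
    "vault.shred": "moreq:destroy_record",
}

# The product's own roles, defined in terms of its own functions
PRODUCT_ROLES = {
    "contributor": ["vault.view", "vault.upload"],
    "records_officer": ["vault.view", "vault.upload", "vault.shred"],
}

def export_role(role_name: str) -> dict:
    """Express a product-specific role as standard functions for the export file."""
    return {
        "role": role_name,
        "functions": [PRODUCT_TO_MOREQ_FUNCTIONS[f] for f in PRODUCT_ROLES[role_name]],
    }

print(export_role("records_officer"))
```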

Mapping the functions and roles in their product's permissions model to MoReq2010's permissions model is a significant body of work for vendors of existing systems, and they will obviously make a commercial judgement as to whether the benefit to them of achieving MoReq2010 compliance outweighs the cost of making those mappings and of implementing the other changes, such as the identifier formats, that MoReq2010 demands.

Because MoReq2010 is so prescriptive as to how systems keep metadata it could well be that it is easier for new entrants to the market to write new products from scratch to comply with the specification than it is for existing vendors to re-architect their products to comply. If I was a vendor writing a new document or records management system from scratch I would certainly think about simply implementing the MoReq2010 permissions model outlined in the model role service.

Why is import more complex than export?

The core modules of MoReq2010 include an export module.  Every compliant system must be able to export entities and their event histories, access control lists and contextual metadata in a MoReq2010 compliant way.   There is no import module in the core modules of MoReq2010.  Vendors can win MoReq2010 compliance for their products without their products being able to import content and its metadata from other MoReq2010 compliant systems.

The import module of MoReq2010 is being written as I write, and is scheduled for release sometime in 2012.  It will not be compulsory.  The reason why the import module is not a compulsory module of the specification is that not all records systems will need to import from other MoReq2010 compliant records systems.  For example by definition the first generation of compliant systems will not have to import from other compliant systems (because they have no predecessor compliant systems to import from!).

It will be more complex for a system to comply with the import requirements of MoReq2010 (when the module is published) than it is with the export requirements.

For example:

  • an existing product that seeks compliance with the core modules of MoReq2010 (but not the additional and optional import module) will have to map its functions (actions/permissions) and roles to the functions and roles outlined in MoReq2010.  It does not have to worry about all the functions listed in MoReq2010 – only the functions that it needs to map its own functions to
  • a product that seeks additionally to comply with the import module of MoReq2010 will need to be able to implement all of the functions listed in MoReq2010 – because it needs to be able to import content from any MoReq2010 compliant system, and a MoReq2010 compliant system may choose to use any of the functions listed in MoReq2010.

I put it to Jon in our podcast on the model role service that we would know that MoReq2010 had 'arrived' if and when someone brings to market a product that complies with the import module and is capable of importing content from MoReq2010 compliant systems.  Once you have products capable of importing from MoReq2010 compliant systems there is all of a sudden a purpose to implementing MoReq2010 compliant systems – the theoretical possibility of being able to pass content on to another system that understands the content as well, or nearly as well, as the originating system is turned into a practical reality.  Once you have a product that is capable of importing from MoReq2010 compliant systems, it is in the interests of anyone implementing that product to influence whoever runs the applications that they wish to import from to make those applications MoReq2010 compliant. Imagine a national archives running an electronic archive with a MoReq2010 import capability.  It would be in the interests of that national archives to persuade the various parts of government who contribute records to them to implement MoReq2010 compliant systems.

Jon’s response on the podcast was to lay down a challenge to the archives world to develop a MoReq2010 compliant electronic archive system, with a MoReq2010 compliant import capability.

What are the chances of MoReq2010 catching on?

MoReq2010 is doubly ambitious.  In this post I have looked at its ambition to ensure that content can take its identifiers, event history, access control list and contextual metadata with it through its life as it migrates from one system to another.  Its other great ambition is to reach a situation where any application in use in a business is routinely expected to have record keeping functionality.   The two ambitions are related to each other.

  • MoReq2010 makes it feasible for the vendor of a line of business system to add records management functionality to their product and get it certified as a compliant records system. The specification has done this by eliminating from the core modules any requirements that would not be necessary for every system, however small and however specialised, to perform. A compliant system does not have to be able to do all the things an organisation-wide electronic records management system would have to do.  It only needs to be able to manage and export its own records. Note that MoReq2010 makes it possible for vendors of line of business systems to seek compliance, but the specification alone cannot incentivise them to do this – incentivisation would have to come from the market or from organisations that could influence the market
  • Because MoReq2010 allows for records to be kept in multiple line of business and other systems within an organisation, the issue of migration becomes very important.  When a line of business application is replaced the organisation will need to migrate content either to the application’s replacement, to an organisational records repository, or to a third party archive. Hence the ambition that any compliant records system can export content and metadata in a way that another compliant system can understand.

Being ambitious carries with it a risk.  MoReq2010 does call for existing vendors to re-architect their systems, and vendors do not like re-architecting their systems.  If too few vendors produce products that comply with the specification then MoReq2010 will go the way of its predecessor, MoReq2, which died because only one vendor felt it was commercially worthwhile to produce a product that complied with it.

In the situation that electronic records management finds itself in, being ambitious is less risky than trying to incrementally tweak previous specifications.   MoReq2 failed because by the time it was published in 2008 the bottom had fallen out of the market for the EDRM systems that it and previous electronic records management system specifications underpinned.  SharePoint had come along and pushed it over like a house of cards.

EDRM fell without so much as a whimper because no-one was prepared to defend it.  Archivists were not prepared to defend it because they had not benefited from it – it was as hard for them to accept electronic transfers from EDRM systems as from any other type of application.  Practitioners were not prepared to defend it because it had proved difficult and expensive to implement monolithic EDRM systems across whole enterprises.  The ECM vendors who had acquired EDRM products were not prepared to defend it because EDRM represented only a relatively small portion of their portfolio, and they had no stomach for a fight with Microsoft.

MoReq2010 has a chance of success.  It is not guaranteed to succeed, but it has a chance.  The reason it has a chance is that it is addressing the right two questions: how do we get records management functionality adopted by all business applications? and how do we ensure that content can be migrated easily, and without significant loss of metadata, from one application to another?

These questions will have to be nailed. If MoReq2010 succeeds in nailing them so much the better.  If it doesn’t, if the market isn’t ready for it, then whatever specifications come after it will have to nail them.  There is no going back to the EDRM ‘one records system-per-organisation’ model.

Talking records – podcast discussion with Christian Walker

In this podcast Christian Walker and I discuss whether records management is compatible with enterprise 2.0.  We talk about the problems of capturing records into a records system such as an EDRMS.  We ponder whether anyone could or should integrate their EDRM with a web service such as Twitter or Facebook.

I express mixed feelings about the concept of asking users to declare things as a record. (Chris wrote a blogpost ‘records matter, declaration doesn’t’ last year, with a more recent follow up).

We discuss whether text analytics could be used to automatically select which e-mails should be saved as a record. We conclude that you probably could in isolated areas of your business that you had studied in depth and trained the analytics engine on, but that it would be difficult if not impossible to scale it up across all the activities of an organisation.
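For anyone wondering what that narrowly-scoped approach might look like in practice, here is a minimal, purely illustrative sketch. It assumes the scikit-learn library; the sample e-mails and the ‘record’/‘not record’ labels are invented, and a real deployment would need far more training data drawn from the specific business area being studied.

    # A minimal, illustrative sketch: train a classifier on e-mails from one
    # well-understood business area to suggest which are worth keeping as
    # records. Training texts and labels are invented.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    training_emails = [
        "Signed contract variation for the depot refurbishment attached",
        "Minutes of the planning committee meeting of 12 January",
        "Anyone fancy lunch at the usual place on Friday?",
        "Reminder: the car park is closed for resurfacing next week",
    ]
    labels = ["record", "record", "not record", "not record"]

    classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
    classifier.fit(training_emails, labels)

    print(classifier.predict(["Final contract attached for signature"]))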

We discuss the challenges of using the word ‘record’, given that when anyone uses it you don’t know whether they mean one document or a collection (large or small) of documents. Chris wonders whether it is viable to carry on using the word ‘records’, but neither he nor I could come up with an alternative.

We end up talking about the proposed (but postponed) SOPA (Stop Online Piracy Act). Chris opposes the idea that a platform such as a filesharing site should be closed down if some of its users contribute content that infringes intellectual property rights.  He says it is up to content owners to protect their content online. He calls the media backers of SOPA ‘dinosaurs’.  I recall that we records managers get called dinosaurs too and I try to draw parallels.  The new media companies of Silicon Valley (Facebook, Google, Twitter) are interested in the platform rather than the content.  The old media companies (Disney, News International) and records managers are interested in the content rather than the platform. The thing that records managers and old media have in common is that sometimes we seem to be swimming against the tide of the times.

Chris blogs at http://christianpwalker.wordpress.com/   and tweets as @chris_p_walker

Click here to play the podcast:
http://traffic.libsyn.com/talkingrecords/RecordsManagement_and_E2.0.mp3

MoReq2010 update

The DLM Forum held their triennial conference in Brussels last month.  The conference brought together archivists and records managers from across Europe.  The DLM Forum had earlier in the year published the MoReq2010 electronic records management system specification, and there was much talk of the specification at the conference.

The DLM Forum released the first sets of test scripts to vendors immediately prior to the conference. This is a significant point in the life of an electronic records management system specification. It means that vendors get to see exactly how their products will be judged by test centres.  This gives them a solid basis for deciding whether or not it will be worth their while modifying their products to comply with the specification.  It means they can get down in earnest to the work of preparing products to comply with the specification.

I have two predictions to make about MoReq2010: first, that it will be a slow burner; second, that it will end up being the most influential of the world’s electronic records management specifications.

The first reason it will be a slow burner is that it will take some time before the exact nature of MoReq2010 compliant products becomes apparent.   It is likely that MoReq2010 will lead to a very heterogeneous set of products, ranging from products that simply manage records held in one type of application (products that simply manage records held in SharePoint, products that simply manage records held in an e-mail system) to products that can manage records held in any application that the organisation uses.  This is in contrast to the previous generation of electronic records management specifications (from DoD 5015.2 to MoReq2) that led to a very homogeneous set of products – namely those products dubbed ‘electronic records management systems’ (EDRMS).

It was interesting to hear Jon Garde say at the conference that he hoped the acronym that comes to be applied to systems that comply with MoReq2010 would be ‘MCRS’ (MoReq2010 compliant record system) rather than ‘EDRMS’.

The second reason why MoReq2010 will be a slow burner is that it is the first specification which has been designed to be added to as it goes along.  One of the key learnings from the last seven years has been the realisation that the digital world is constantly creating new formats. Although we as records managers are primarily interested in the content and context of records rather than their format, we have to acknowledge that different formats have different management requirements.  The old paradigm of documents being aggregated into files is hard enough to apply to e-mail, let alone blogposts, status updates, discussion board posts and wiki pages.  E-mails tend naturally to aggregate themselves into e-mail accounts, blogposts aggregate themselves into blogs, status updates aggregate themselves into streams, wiki pages aggregate themselves into wikis.

MoReq2010 takes a more generic approach. In the core requirements (the requirements that any MCRS has to adhere to) it talks in a rather abstract fashion of the system being able to manage ‘records’ that are grouped into ‘aggregations’ and which receive their retention rule from a records classification.   This raises the questions: what formats of records can any specific MCRS manage? How will it aggregate them?
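To make that abstraction a little more tangible, here is a minimal sketch with invented class and field names: records sit inside aggregations, but the retention rule travels with the records classification rather than with the aggregation.

    # A minimal, hypothetical sketch of the abstract entity model: records
    # grouped into aggregations, with the retention rule supplied by the
    # records classification. Class and field names are invented.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class RecordsClass:
        title: str
        retention_rule: str            # e.g. "destroy 7 years after last use"

    @dataclass
    class Record:
        title: str
        classification: RecordsClass

    @dataclass
    class Aggregation:
        title: str
        records: List[Record] = field(default_factory=list)

    correspondence = RecordsClass("Planning correspondence", "destroy 7 years after last use")
    file_2011 = Aggregation("Planning enquiries 2011")
    file_2011.records.append(Record("Reply to enquiry 42", correspondence))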

The core requirements of MoReq2010 will be supplemented by extension modules.  At the DLM Forum conference it was announced that extension modules were being written for specific types of record format (for example e-mails), and for particular types of aggregation (for example the traditional ‘file’).  Buyers will be able to see what record formats and what types of aggregation a particular MCRS is able to manage by looking at which extension modules it complies with.

In theory extension modules could be written for any and every format that comes along and that has specific management requirements.  In practice this is likely to depend on the capacity of the DLM Forum to produce such extension modules.  Whereas the core requirements of MoReq2010 were substantially the work of one man (Jon Garde, with help from colleagues such as Richard Blake), it is hoped that a much wider base of people will contribute to the writing of the extension modules. At the conference Jon appealed to those assembled to step forward and volunteer to help in the writing of these modules.

Jon Garde predicted an explosion of MoReq2010 activity over the next 12 months, as both MoReq2010 compliant products and MoReq2010 extension modules start to appear.

As I write this post the most influential electronic records management specification in the world is the US DoD 5015.2.   That specification is looking increasingly jaded and outdated.  It was last revised in 2007, and the latest version does not reflect the changed nature of the digital landscape in organisations since the rise of both social computing and SharePoint.   Over the medium term MoReq2010 will overtake DoD 5015.2 in importance, provided only that vendors find it feasible and profitable to develop products that comply with it.