Why a link between MoReq2010 and the OAIS model would benefit both records managers and archivists

The dream of a single record keeping profession

It is roughly twenty years since Frank Upward began popularising the records continuum as a paradigm shift away from the previously prevalent records lifecycle model. It was the early 1990s; the digital revolution was about to hit organisations, and Upward did not believe that a body of professional thought based on the lifecycle paradigm would cope with it.

Upward had both philosophical and practical concerns about the lifecycle model.

His philosophical concerns stemmed from the fact that the records lifecycle model depicted records as moving in a straight line through time: from creation of the record, through an active phase (where it is being added to and used) and a non-active phase (where it is kept for administrative and/or other reasons), to final disposition (destruction, or transfer to an archive for long-term preservation). Upward pointed out that whereas Isaac Newton believed time moved in a straight line like an arrow, Einstein had shown that time and space were inseparable and that both were warped by the speed of light.

Upward compared records to light. Light carries information about an event through time and space. So do records. Upward based his records continuum diagram on a space-time diagram. The space-time diagram depicts the way that light travels in every direction away from an event, through space and through time. No one can know about an event unless and until the light has reached them. The records continuum diagram showed that records need to be managed along several different dimensions in order to function as evidence of that event across time and space, and in order to reach different audiences interested in that event for different reasons, at different times and in different places.

[Image: Sketch of Frank Upward next to a drawing of the continuum model]

Upward’s practical concern related to the fact that the lifecycle model had been used to underpin a distinction between the role of the records manager and that of the archivist. The records manager looked after records while they were needed administratively by the creating organisation; the archivist looked after records once they were no longer needed administratively but still retained value to wider society.

The interest of wider society in the records of any particular event does not suddenly materialise 20 or 30 years after the event. The interest of society is present before the event even happens. A records system of some sort needs to be in place before the event happens in order for the participants in, and observers of, the event to be able to capture a record of it. That system needs to take into account the interest of wider society in the event in order for the records to have a fighting chance of reaching interested parties from wider society if and when they have the right to access them. This concern is particularly pertinent to digital records. Whereas paper records could be left in benign neglect, digital records are at risk of loss if they, and the applications they are held within, are not actively maintained.

Upward didn’t use the word archivist or the word records manager. To him we are all record keepers. What we do is record keeping, and we belong to the record keeping profession.

One of the big impacts of the digital revolution, and of the paradigm shift from the lifecycle model, was the shift of attention away from the form and content of records themselves, and towards the configuration of records systems. In this post I will compare the way records managers have gone about the business of specifying records systems with the way archivists have gone about defining digital archives.

The continuing divide between records managers and archivists

Upward’s plea for a united recordkeeping profession has gone largely unheeded in the English-speaking world. Twenty years into the digital age we still see a profound divide, not just between the roles of archivists and records managers inside organisations, but also between their ambitions and strategies with regard to electronic records.

The DLM Forum is a European group that brings together archivists (mainly from the various national archives around Europe) and records managers. When you are listening to a talk at a DLM event you can always tell the records managers and the archivists apart:
  • The records managers refer to MoReq2010 (developed by the DLM Forum itself), a specification of the functionality that an application needs in order to manage the records it holds
  • The archivists talk about OAIS (Open Archival Information System), a standard for ensuring that a digital repository can ingest, preserve, and provide renditions of electronic records that have been transferred to the archive
This reflects a difference in the initiatives that the two branches of the profession have adopted towards electronic records.

Records management initiatives

The records management profession has attempted to design records management functionality into the systems used by the end-users who create and capture records. Over the period from 2000 to around 2008 this strategy mainly centred on specifying and implementing large-scale, corporate-wide electronic document and records management systems (EDRMS). Unfortunately a relatively small percentage of organisations succeeded in deploying such systems, and even those that did still found that many records were kept in other business applications and never found their way into the EDRMS.

We are now in the early days of the development of alternative records management models. MoReq2010 is the most recent attempt to influence the market and specify the functionality of electronic records management systems. MoReq2010 is framed to support several different models. It continues to support the EDRM model, but it also supports the following alternative models:
  • building records management functionality into business applications (and storing records in those applications) (in-place records management)
  • storing records in the business applications into which they were first captured, but managing and governing them from a central place (federated records management)
  • integrating the many and various business applications in the organisation into a central repository which stores, manages and governs records.
The key thing that all three of these newer approaches have in common is that they each involve records passing from one system to another during their lifetime. For example, in the in-place model, even if an organisation succeeded in installing records management functionality into every one of its applications (a distant hope!), it would still need to make provision for what would happen to the records held by an application that it wished to replace. For that reason MoReq2010 pays particular attention to ensuring that applications can export records with their metadata in a way that other applications can understand and use, as sketched below.
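
To make that concrete, here is a rough sketch in Python of the kind of export capability that all three models depend on: any system holding records must be able to hand them over, together with the metadata another system would need to carry on managing them. The names below are invented for illustration; nothing here is taken from the MoReq2010 specification itself.

    # A minimal sketch of an export capability. All names are invented for
    # illustration; they are not taken from the MoReq2010 specification.
    from collections.abc import Iterable
    from dataclasses import dataclass, field

    @dataclass
    class RecordExport:
        """One exported record plus the metadata a receiving system would need."""
        system_identifier: str                 # stable ID assigned by the exporting system
        title: str
        classification: str                    # place in the business classification scheme
        retention_rule: str                    # the disposal/retention rule applied
        content_path: str                      # where the record's content bytes are written
        event_history: list[str] = field(default_factory=list)  # audit trail entries

    class ExportableRecordsSystem:
        """What the in-place, federated and central-repository models all assume."""
        def export_records(self, classification: str) -> Iterable[RecordExport]:
            """Yield every record under the given classification, with its metadata."""
            raise NotImplementedError

    def migrate(source: ExportableRecordsSystem, classification: str) -> list[RecordExport]:
        """Collect an export so a successor system (or an archive) can ingest it."""
        return list(source.export_records(classification))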

Archival initiatives

The strategy of archivists has been to design digital archiving systems which can capture records from whichever system(s) a record creating organisation deploys. In theory it would not matter what applications an organisation (government department/agency etc.) used to conduct their business, and it would not matter whether or not that application had records management functionality. The archives would still be able to accession records from them provided that they succeed in:
  • building a digital archive that can receive accessions of electronic records and their associated metadata
  • defining a standard export schema/metadata schema dictating exactly what metadata needs to be provided, and in what form, about records transferred to the archives
  • enforcing that export schema/metadata schema so that all new transfers of electronic records come in a standard form that is relatively straightforward for the archive to accession into their digital repository
Unfortunately only a small number of national archives have succeeded in making electronic accessions of records into anything remotely resembling a routine. Some archives have succeeded in building digital archive repositories: the UK National Archives has one, as do Denmark, the US, Japan and others. But the process of accepting transfers of electronic records into these archives is problematic. Every vendor sets up its document management systems to keep metadata about the content their applications hold in a different way. The first time the archives accepts a deposit of records from each different system there is a lot of work to do translating the metadata output from that system into a format acceptable to the digital archive repository. The resources required to do this work have to be provided either by the archives or by the contributing body.

Jon Garde summed the situation up when he said, in a talk to the May 2012 DLM Forum members’ meeting, that ‘most records never leave their system of origin’. This comment serves as a sorry testament to the success of the initiatives undertaken by both sides of our profession hitherto.

[Image: Jon Garde]

Lack of a join between records management and archival initiatives

It is rare to see examples of a joined-up strategy between records management and archival initiatives. In the United Kingdom the National Archives (TNA, then the Public Record Office) started out in the early years of the digital age by taking a great interest in the way government departments managed their electronic records, in the hope that this would make it easier for the Archives to accept electronic records from those departments. The records management arm of TNA defined the UK’s electronic records management system specifications. Between 2002 and around 2008 they supported government departments in implementing EDRM systems that complied with those specifications. But the archivists at TNA derived little benefit from this.

TNA issued both versions of its electronic records management specification without a metadata standard and without an XML export schema. This meant that each compliant EDRM system from each different vendor kept metadata about records in a different way, and hence the transfer of records from those EDRM systems to the National Archives would need to be thought through afresh for each product. By the time the National Archives did get round to issuing a metadata standard they had already made the decision to stop testing and certifying systems (in favour of the Europe-wide MoReq standard). The absence of a testing regime meant that vendors had no incentive to implement the metadata standard in their products. But even if vendors had implemented the metadata standard, TNA would have benefited little from it on the archive side, because TNA decided not to use that metadata standard for its own digital archive repository.

The OAIS model

The OAIS model is a conceptual model of the attributes a digital archive system should possess. It makes clear one of the key differences between a traditional archive of hard copy/analogue objects and a digital archive.

A hard copy archive, in the main, provides the requestor with the very same object that it has been storing in the archive, which in turn is the very same object that was transferred to the archive by the depositing organisation.

In a digital archive this does not hold true. The object originally transferred to the archive may need to be changed or migrated to new software or hardware, so the digital object actually stored in the archive will differ from the digital object originally submitted to it. At the point in time when a requestor asks to see the record, the digital archive will usually make a presentation copy for them to view, rather than providing them with the object that it stores in the repository. The object that it provides to the requestor may differ in some respects from the object stored in the archive, for example if the archive wishes to present a version of the object that is better adapted to the browser/software/hardware available to the requestor.

The OAIS model came up with a vocabulary to describe these three separate versions of the record (a rough data-model sketch follows the list):
  • The object originally transferred to the archive is a submission information package (SIP)
  • The object stored in the archive is an archival information package (AIP)
  • The object provided to the requestor is a dissemination information package (DIP)
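
As a rough, non-normative illustration (the field names are my own shorthand, not taken from the OAIS standard itself), the three package types can be pictured as three related data structures:

    # An illustrative, non-normative sketch of the three OAIS package types.
    from dataclasses import dataclass

    @dataclass
    class SubmissionInformationPackage:       # SIP: what the depositor transfers
        original_files: list[str]
        depositor_metadata: dict

    @dataclass
    class ArchivalInformationPackage:         # AIP: what the archive actually stores
        preserved_files: list[str]            # possibly migrated to newer formats over time
        preservation_metadata: dict           # provenance, fixity, format history
        derived_from_sip: str                 # reference back to the original submission

    @dataclass
    class DisseminationInformationPackage:    # DIP: what the requestor receives
        presentation_files: list[str]         # rendered for the requestor's environment
        descriptive_metadata: dict
        derived_from_aip: str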
OAIS has no certification regime, so there is no way for proprietary products, open source products or actual implementations to be certified as compliant with the model. At various times the archives/digital preservation community has debated whether or not it should have a certification regime (see this report of an OAIS workshop run by the Digital Preservation Coalition). Some archivists have felt that it is an advantage that OAIS does not have a certification regime, because it allows vendors and organisations the flexibility to implement the model in different ways. Others have felt that the lack of a certification regime hinders interoperability between archives.

An example of the OAIS model working well – the Danish National Archives

I had a tour of the Danish National Archives on 31 May 2012 (the first day of the members’ meeting of the DLM Forum). The Danish National Archives has a very well functioning process based on the OAIS model. They have laid down a clear standard for the format in which Danish government bodies transfer records plus their metadata (submission information packages) to the Archives. Government bodies send records on optical disk or hard drives. The Archives gives each accession a unique reference, then tests the accession to ensure it conforms to the standard. The testing is performed on a stand-alone testing computer. Each accession is called ‘a database’, because the accession always comes in the form of a relational database. Such relational databases typically hold metadata together with the documents/content that the metadata refers to.
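
The testing stage is essentially a conformance check on each submitted database. As a rough illustration only (the table name, required fields and rules below are invented for the example; they are not the Danish National Archives’ actual submission standard), such a check might look like this:

    # A rough illustration of an accession conformance test. The required
    # fields and rules are invented; they are not the Danish standard.
    import sqlite3

    REQUIRED_COLUMNS = {"record_id", "title", "date_created", "document_file"}

    def check_accession(db_path: str) -> list[str]:
        """Return a list of problems found in the submitted database."""
        problems = []
        conn = sqlite3.connect(db_path)
        try:
            # column name is the second field returned by PRAGMA table_info
            cols = {row[1] for row in conn.execute("PRAGMA table_info(records)")}
            missing = REQUIRED_COLUMNS - cols
            if missing:
                problems.append(f"missing columns: {sorted(missing)}")
            else:
                empty = conn.execute(
                    "SELECT COUNT(*) FROM records WHERE title IS NULL OR title = ''"
                ).fetchone()[0]
                if empty:
                    problems.append(f"{empty} records have no title")
        finally:
            conn.close()
        return problems

    # A submission would only be accepted into the repository if check_accession
    # returns an empty list; otherwise it goes back to the transferring body.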

I asked whether a government department could deposit a shared drive with the archive. They replied that the department would have to import the shared drive into a relational database first in order to format the metadata needed for the accession. This brought home to me the fact that when an archive imposes a standard import model it does not reduce the cost of transferring records from many and various different systems used by organisations to one digital archive. It merely places a greater proportion of the cost of the migration on the shoulders of the transferring bodies.

It is not necessarily easy for other national archives to replicate the success of the Danish National Archives. An archivist from the Republic of Ireland, who is in charge of electronic records at the Irish national archives, accompanied me on my tour of the Danish National Archives. The Irish archives have not been able to get a standard format agreed for government departments to send accessions of electronic records to them. From time to time government bodies send accessions of electronic records, principally when a government body is wound down. The archives can do nothing more than store the accessions on servers and make duplicate copies. They have no digital archive repository to import them into. Even if they did have an archive repository, the fact that the accessions are in such different formats means that the process of ingesting them into such a repository would be extremely time-consuming and lossy. The chances of the archives persuading the rest of the Irish government to accept a standard format and process for transferring electronic records are slim, because in times of austerity it would be seen as an extra administrative burden.
(For more details on the approach of the Danish National Archives, watch this 25-minute presentation by Jan Dalsten Sorensen.)

An example of the records management approach working well – the European Commission

The European Commission has taken a records management approach to managing records from their creation until their disposal or permanent preservation.

They started off with a fairly standard electronic document and records management system (EDRMS) implementation with a corporate file plan, and later with linked retention rules. But then they expanded on this model. They are currently in the process of integrating, one by one, their line-of-business document management systems into the EDRM repository. The ultimate aim is that a member of staff could choose to upload a record into any one of the Commission’s document management tools and still have the record captured in a file governed by the Commission’s filing plan and retention rules. They are also developing a preservation module for the historical archives. This module will enable records to pass from the control of the Directorates-General (DGs) of the Commission that created them into the control of the Historical Archives without leaving the EDRM repository itself.

The model is not perfect (like every other organisation they find it difficult to persuade colleagues to contribute e-mail to the EDRMS), and it is not finished (not all the different document management systems have been integrated yet, and not all the functionality needed to manage the process of passing records to the control of the Historical Archives has been added yet), but it is a very well-thought-through and solid approach that has successfully scaled up to cover nearly 30,000 people.

As with the Danish National Archives, it would not be easy for other organisations to replicate the success of the European Commission’s approach. The Commission’s success has come as a result of a records management programme that was started in 2002; it has taken a considerable amount of time (ten years) and a considerable amount of political will to draft the policies, build the filing plan, draft the retention schedule, establish the EDRM, and commence the integration of other document management systems into the EDRM. The integration of each document management system into the EDRM is a new project each time, requiring developers to work on the document management system in question so that it can use the EDRM system’s object model to deposit records into the repository.

In these turbulent times of economic austerity it is hard to envisage many organisations embarking on a records management programme that would take 6 to 8 years to deliver benefits.

How do we make it more feasible to manage records over their whole lifecycle?

The fact that these two excellent examples, from the Danish National Archives and the European Commission, are so difficult to replicate is a concern for both the records management and archives professions.

In an ideal world every records management service would operate a records repository, and every archive would run a digital archive. In an ideal world the records managers would not need to get developers to do any coding to enable business applications to export their records into the records repository – the applications would be configured so that they could export records and all accompanying metadata in a way that the repository understood.
In an ideal world an Archive running a digital archive would not have to ask its contributing bodies to tailor and adjust the exports of their applications. In an ideal world those bodies could run a standard export from any of their applications that the Archive could import, understand and use.

The key enabler for both of these things is a widely accepted standard on the way metadata about things like users, groups, permissions, roles, identifiers, retention rules, containers and classifications is kept within applications, coupled with a standard export schema for the export of such metadata. If such a standard schema existed then a records repository owner or digital archive owner could specify to the owners of applications that needed to contribute records to the repository/digital archive that they either:
  • implement applications that keep records and associated metadata in that standard format, OR
  • implement applications that can export metadata in the standard export format, even if the metadata within the application had been kept in a different way, OR
  • develop the capability to transform exports from any of their applications into the standard export schema. This last point should be helped by the fact that any widely accepted export schema would lead to the growth of an ecosystem of suppliers with expertise in converting exports of records and metadata into that format. Indeed such a format could become a ‘lingua franca’ between different applications (a rough sketch of this kind of transformation follows below).
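
As a rough illustration of that third option, transforming an application’s native metadata into a common export format is essentially a mapping exercise. The element and field names below are invented for the example; they are not the actual MoReq2010 export schema.

    # A rough sketch of mapping an application's native metadata onto a
    # common export schema. The element names are invented for illustration;
    # they are not the actual MoReq2010 export schema.
    import xml.etree.ElementTree as ET

    def to_standard_export(app_record: dict) -> ET.Element:
        """Map one application-specific metadata dict onto a standard XML element."""
        record = ET.Element("Record", attrib={"systemIdentifier": app_record["id"]})
        ET.SubElement(record, "Title").text = app_record.get("subject", "")
        ET.SubElement(record, "Classification").text = app_record.get("folder", "")
        ET.SubElement(record, "RetentionRule").text = app_record.get("policy", "")
        return record

    # Example: a line-of-business system that keeps metadata under its own field names
    native = {"id": "INV-2012-0042", "subject": "Invoice 42",
              "folder": "Finance/Invoices", "policy": "Destroy after 7 years"}
    print(ET.tostring(to_standard_export(native), encoding="unicode"))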

The opportunity for a link between MoReq2010 and the OAIS model

The only candidate for such a standard export format at the moment is the MoReq2010 export format, published by the DLM Forum. The DLM Forum comprises both archivists and records managers, but most of the archivists have hitherto taken relatively little interest in MoReq2010. On June 1 this year (the day after our visit to the Danish National Archives) I gave a presentation to the DLM Forum members’ meeting suggesting that the archival community should develop an extension module for MoReq2010, such that any system compliant with that module would also have the functionality necessary to operate in accordance with the OAIS model.

This would have a number of beneficial effects. For the first time in the digital age we would have a co-ordinated specification of the functionality required to manage records at all stages of their lifecycle, including the management of archival records.

It would also be a huge boost for MoReq2010. The first two products to be tested against MoReq2010 will be SharePoint plug-ins – one produced by GimmalSoft, one by Automated Intelligence. Let us assume that both products pass and are certified as compliant. Both products will be able to manage records within the SharePoint implementation that they are linked to. Both will be able to export records in a MoReq2010 compliant format. But there still won’t exist in the world a system capable of routinely importing the records that they produce. This is because the import features of the MoReq2010 specification are not part of the compulsory core modules of MoReq2010 – instead they are shortly to be published as a voluntary extension module.

Let us imagine that a National Archive somewhere in the world deploys a digital archive that complies with the OAIS model and that can import records exported from any MoReq2010 compliant system. All of a sudden there is a real incentive for that archive to influence the organisations that supply records to it to deploy MoReq2010 compliant applications (or applications that can export in the MoReq2010 export schema, or MoReq2010 compliant records repositories). It works the other way around too. Let us imagine there is a country somewhere whose various government departments deploy MoReq2010 compliant applications. All of a sudden there is an incentive for their National Archives to deploy a digital archive that is compliant with the import module of MoReq2010 and can therefore routinely import records and metadata exported from those MoReq2010 compliant applications.

Debate at the DLM members forum on an OAIS compliant extension module for MoReq2010

The suggestion of an OAIS compliant extension module for MoReq2010 sparked off an interesting debate at the May DLM Forum members’ meeting. Tim Callister from The National Archives (TNA) in the UK and Lucia Stefan both criticised the OAIS model. They said it was designed for the needs of a very specialised sector (the space industry, with its unique formats and data types) and was not tailored to the needs of national archives, which are largely tasked with importing documents in a small range of very well understood file formats (.doc, .pdf etc.). Jan Dalsten Sorensen from the Danish National Archives defended OAIS, saying that it had given archivists a common language and a common set of concepts with which to design and discuss digital archives.

I said that any digital archives extension module for MoReq2010 should be compatible with OAIS – if only because otherwise it would lose those archives (like the Danish National Archives) who had invested in that model. It would also lose the connection with all the thinking and writing about digital archives that has utilised the concepts of the OAIS model.

After the debate I spoke to an archivist from the Estonian national archives. He said that his archive didn’t want lots of metadata with the records that they accession. I said that was because the more metadata fields they specified in their transfer format, the greater the amount of work that either they or the contributing government department would have to do to get the metadata into the format needed for accessions. If their contributing government departments had systems that could export MoReq2010 compliant metadata, and if the digital archive could import from the MoReq2010 export schema, then they wouldn’t need to pick and choose the metadata – they could take the lot.

Information assurance and encryption

Alison Gibney spoke to the June meeting of the IRMS London Group about the relationship between the disciplines of information assurance and records management.

Alison differentiated  information assurance from information security.  Information security covers any type of information an organisation wants to protect, whereas information assurance is focused on protecting personal data.    There is no US equivalent term, partly because the US legislation on personal data is less strict than that of the UK.

Alison said that in the UK public sector over the last five years records management has gone down a peg or two (thanks to budget cuts), whilst ‘information assurance’ has gone up a few pegs. The rise of information assurance is thanks to the various high-profile central government data leaks in 2007 and 2008; the Hannigan report, which compelled UK government bodies to adopt a strong information assurance regime in response to those leaks; and the fines meted out by the Information Commissioner for non-compliance with the Data Protection Act.

Alison showed a list of the fines meted out by the Information Commissioner over the past few years. She pointed out that a significant proportion of the fines had gone to local authorities. This was not necessarily because local authorities are worse at managing personal data than, say, a private sector retail company. It is more likely to be because a local authority has literally hundreds of different functions that necessitate the collection, storing and sharing of personal data, whereas a retail operation may only have three or four such functions. It is very hard for a local authority to ensure that all these many different functions are fully compliant with data protection legislation.

Alison also pointed out that most of the fines could be ascribed to two generic types of breaches:

  • communications being sent to the wrong person
  • loss or theft of removable media

These two types of generic breaches both occurred across a range of formats, digital and hard copy. Communication misdirections included misdirected e-mail, letters and faxes. Losses and thefts of removable media included losses of laptops, key drives and hard copy files.

When to encrypt and when not to encrypt

Alison said that encryption offered a solution to the problem of protecting personal data in certain circumstances.
Alison recommended encrypting:

  • personal data in transit (for example data being e-mailed to a third party) because of the risk of interception or misdirection
  • personal data on removable media (optical disks, laptops, mobile phones etc) because of the risks of loss or  theft

Alison did not recommend encrypting ‘data at rest’ in the organisation’s databases/document management systems. But she raised a question mark over data in the cloud. Technically it is data at rest. But it is data held by another organisation, possibly within a different legal jurisdiction, and the organisation may wish to encrypt it because of that.

It is worth taking a closer look at the issues around the encryption of the different types of data that Alison mentioned.

Encrypting data in transit – e-mail

There are various ways of encrypting e-mail. A typical work e-mail travels from a device, through an e-mail server within the organisation, to an e-mail server in the recipient’s organisation, and then to the device of the recipient. There are security vulnerabilities at each of those points. Devices introduce a particular vulnerability, especially since the rise in usage of smartphones and of trends such as ‘bring your own device’.

The most secure option is for the message to be encrypted all along the chain. However, the more of the chain the e-mail is encrypted along, the more complex, expensive and intrusive the encryption software and the procedures for applying it become. Chapter 4 of this pdf hosted by Symantec gives a neat summary of the different options for e-mail encryption.

If you decide to encrypt messages from the moment they leave the sender’s device all the way to the recipient’s device (endpoint to endpoint) then both the sender and the recipient will need encryption software installed on their device. This is intrusive for both parties. It may be reasonable to expect organisations that you regularly exchange sensitive data with to have such software installed. It would not be reasonable for a local authority corresponding with a citizen for the first time to expect the citizen to install such software.

A less intrusive option is ‘gateway to gateway’ encryption. The message goes in plaintext from the sender’s device to a gateway server inside their organisation. The gateway server encrypts it. When it reaches the recipient organisation it is decrypted by a gateway server which sends it on in plaintext to the recipient. Note that this model requires the recipient organisation to have the same encryption software installed on their gateway server as is used by the sending organisation.

A lighter approach to encryption is the gateway-to-web approach, where a standard plaintext e-mail is sent to the recipient giving them a link to a web address where they can go to retrieve the message, which is protected by some kind of transport layer encryption such as SSL (Secure Sockets Layer – as used to protect credit card transactions on the web). The web site will ask the user for authentication. Assuming that the authentication details have been sent to the user by a channel other than e-mail, this will provide some protection against an e-mail being sent to the wrong recipient. In her talk Alison had informed us that 17 London boroughs had chosen encryption software from one particular vendor (Egress) that uses this gateway-to-web model. Although this is a lower level of security than endpoint-to-endpoint encryption, it has the crucial advantage that the recipient does not need to install encryption software.
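
To see why endpoint-to-endpoint encryption is the most intrusive option, here is a minimal sketch using the third-party Python cryptography package (real deployments would use S/MIME or PGP tooling rather than hand-rolled calls like these): the recipient must already hold a key pair, and the sender must have compatible software, before anything can be exchanged.

    # A minimal, illustrative sketch of endpoint-to-endpoint encryption of a
    # message body. Requires the third-party 'cryptography' package.
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    # The recipient generates a key pair and publishes only the public half.
    recipient_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    recipient_public = recipient_private.public_key()

    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    # The sender encrypts a short message body with the recipient's public key.
    # (RSA-OAEP only handles short payloads; real mail encryption wraps a
    # symmetric key this way and encrypts the body with that key.)
    ciphertext = recipient_public.encrypt(b"Reference: case 4421", oaep)

    # Only the recipient's private key can recover the plaintext.
    assert recipient_private.decrypt(ciphertext, oaep) == b"Reference: case 4421"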

Encrypting data on removable media

Encrypting personal data on removable devices is the most straightforward of the cases that Alison mentioned. The individual who owns the device encrypts it with their (public) encryption key. Only they have the (private) encryption key necessary to decrypt it. If the device gets lost or stolen the data is safe so long as the person who obtains it does not obtain the decryption key.
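
A rough sketch of that arrangement, again using the Python cryptography package (disk-encryption products handle all of this for you – this is purely illustrative): the data is encrypted with a random symmetric key, and that key is itself encrypted with the owner’s public key, so only the holder of the private key can ever recover it.

    # An illustrative hybrid-encryption sketch for data copied to removable
    # media. Requires the third-party 'cryptography' package.
    from cryptography.fernet import Fernet
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    owner_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    owner_public = owner_private.public_key()
    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    # Encrypt the file contents before copying them to the removable drive,
    # then encrypt ("wrap") the file key with the owner's public key.
    file_key = Fernet.generate_key()
    encrypted_file = Fernet(file_key).encrypt(b"personal data destined for a USB stick")
    wrapped_key = owner_public.encrypt(file_key, oaep)   # stored alongside the file

    # If the drive is lost, the data is unreadable without the owner's private key.
    recovered_key = owner_private.decrypt(wrapped_key, oaep)
    assert Fernet(recovered_key).decrypt(encrypted_file).startswith(b"personal data")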

Encryption and data-at-rest

Bruce Schneier points out in his post Data at Rest vs Data in Motion (http://www.schneier.com/blog/archives/2010/06/data_at_rest_vs.html) that cryptography was developed to protect data-in-motion (military communications), not data-at-rest. If an organisation encrypts the data it holds on its on-premise systems then it faces the challenge of maintaining through time the encryption keys necessary to decrypt the data. Schneier puts it succinctly: ‘Any encryption keys must exist as long as the encrypted data exists. And storing those keys becomes as important as storing the unencrypted data was’. If you are encrypting the data to guard against an unauthorised person overcoming the system’s security model and gaining access to the data, then logically you cannot store the decryption keys within the system itself (if you did, you would not have guarded against that risk!).

Encryption and data in the cloud

Most organisations have regarded data at rest within their on-premise information systems as being significantly less at risk than data on removable devices and data in transit, and therefore do not encrypt it.

But what about data in the cloud? There are certain features of cloud storage that may lead your organisation to wish to encrypt its data. You may be concerned that a change of ownership or a change of management at your cloud provider might adversely impact security. You might be worried that the government of a territory in which the information is held may attempt to view the data. You may be concerned about the employees of the cloud provider seeing the data. But fundamentally speaking, data in the cloud is data-at-rest: it is data that is being stored, not communicated. If you encrypt your cloud data you have the same problem as you would have if you encrypted data on your on-premise systems – how do you ensure that you maintain the encryption keys over time, and where do you store them? If your reason for encrypting is a lack of trust in the cloud provider then storing the decryption keys with the cloud provider would defeat the object.

The challenge of encryption key management

Key management is crucial to any encryption model. In theory you want the organisation to give every individual a private/public encryption key pair. That means not only that individuals can encrypt information they wish to keep secret (with their public key), but also that you can provide a digital signature capability, because an individual can encrypt with their private key a signature that anyone can read with that individual’s public key. This offers proof that the individual alone must have signed it, because it could only have been encrypted with the individual’s private key. Thus the signature is in theory non-repudiable (the individual could not deny that it was they who sent it).

There are a couple of interesting issues around key management. Do you make it so that only the individual knows their private key? In that case the digital signature is genuinely non-repudiable, but if the individual loses their private key then they lose access to their signature and to information that they have encrypted. The alternative is that the organisation provides a way for the individual to recover their private key, for example through an administrator. But this means that the digital signature is in theory now repudiable, because the administrator could have used the key to sign a message.
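
A brief sketch of the signing model discussed above (modern libraries expose this as sign/verify operations rather than literal encryption with the private key; again this uses the Python cryptography package and is purely illustrative):

    # An illustrative sign/verify sketch. Requires the 'cryptography' package.
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    signer_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    signer_public = signer_private.public_key()
    pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH)

    message = b"I authorise payment of invoice 42"
    signature = signer_private.sign(message, pss, hashes.SHA256())

    # Anyone holding the public key can verify the genuine message...
    signer_public.verify(signature, message, pss, hashes.SHA256())

    # ...and verification fails (raises InvalidSignature) if the message is altered.
    try:
        signer_public.verify(signature, b"I authorise payment of invoice 420", pss, hashes.SHA256())
    except InvalidSignature:
        print("tampered message rejected")

If an administrator holds a recovery copy of the private key, nothing in the mathematics changes – but the signature no longer proves that the individual alone could have produced it.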

I am currently reading a wonderful book that explains why, even two decades into the digital age, there is still no widely accepted digital signature capability – the vast majority of organisations do not use digital signatures (and hence still need to keep some paper records of traditional blue-ink signatures). The book is called Burdens of Proof and is written by Jean-François Blanchette.