G-cloud update

This Thursday I went to the tea cloud camp meeting on cloud computing held at the National Audit Office in London for an update on progress with the UK Government’s G-Cloud.

Launch of Cloud Store

We heard that the UK Government’s CloudStore could go online as early as this weekend.

CloudStore will be in effect a catalogue of suppliers who have been accredited by the UK government to provide the government with cloud services. The accredited companies are grouped under four headings – Infrastructure as a service, Platform as a service, Software as a service (including applications such as EDRM, CRM and collaboration) and Services (including systems integrators).

This is a technology shift towards cloud solutions, but more importantly this is a procurement revolution. It is a move towards transparent pricing, with suppliers stating their prices up front on the CloudStore, and towards pay-as-you-go, easy-to-enter and easy-to-leave contracts.

One IT manager told us that in her career as a civil servant she had managed many contracts with poor suppliers (she used a stronger term than poor!). Even though the contractors were not performing, her department had no choice but to keep them because they had no plan B. The penalty clauses for leaving the contract early were so great as to make it uneconomical to change, and the length of the procurement process meant that there were no alternatives lined up ready and waiting to step in and fill the gap left by the ousted supplier. For her, CloudStore means always having a plan B. If she has a poor supplier in future she looks at the CloudStore, finds an alternative, terminates the contract with the poor supplier and starts one with an alternative provider.

She said that one of the key things about CloudStore was that we will increasingly see IT applications bought as commodities rather than as bespoke solutions.

The benefits should work both ways. The public sector gets a better price and the suppliers will benefit from the lower cost of winning business. They will be able to strike a deal with new public sector customers much more quickly. Another potential benefit for suppliers is that CloudStore will be viewable online by anyone. I would not be surprised if people in other sectors and other countries looked at the UK Government’s CloudStore to get an idea of which suppliers have been accredited by the UK Government, what services they offer and what prices they charge. There is also the capability for public sector bodies to write Amazon-style reviews of the service they have received.

One of the speakers mentioned how pleased she had been with the response from suppliers. Hundreds of applications were received when the CloudStore OJEU notice was issued late last year, and companies that did not apply first time around will be given further opportunities in future to apply to get onto the store.

G-cloud pilot – a County Council puts its e-mail into the cloud

We heard from a county council who were putting their e-mail into the cloud, as a pilot G-Cloud project. They had received six bids – three from vendors offering public cloud services, three from private clouds. They narrowed it down to three bids – Microsoft’s Office 365, Google Apps, and IBM (who offered Lotus Notes from a private cloud). Each bid provided the functionality they wanted so they went on price alone (which tells you that e-mail, calendaring and basic collaboration are now a commodity). Google Apps won.

The Council picked an initial group of around 150 volunteers to trial Google Apps. In order to avoid a self-selecting sample of technology enthusiasts they asked volunteers to give a reason why they wanted to join the trial, and picked people with a range of different motivations. The volunteers were not given face-to-face training, but were each set up on Yammer so that they could act as a support community for each other. They have only received four calls to the service desk since it started.

One of the first things they found was how quick it was to bring people onto the service. They bought some servers to use to migrate users from their existing system (Lotus Notes e-mail hosted in-house) to Google Apps in the cloud. The servers will no longer be needed once the migrations have all taken place. They had 15 users up and running on the service within a week of signing the deal.

They have resisted the temptation to bring the whole organisation over to Google Apps in one big bang. Running two systems alongside each other brings with it inconveniences around calendaring – some staff are using Lotus Notes calendars and some are using Google Apps, so it is difficult for them to share appointments etc. Their initial volunteer group of 150 people had to be expanded to 250 simply because some of the volunteers had colleagues that they needed to be on the same calendaring system with.

The Council are going to look in the spring at integrating Google Apps with their EDRMS so that it becomes easier for colleagues to save e-mails needed as records. They may also start working with Google Sites at some point (which would bring the implementation into the filesharing/collaboration space).

G-cloud and security

The Council said one of the benefits of G-cloud for them was that they did not have to think through on their own and from scratch the questions of security in the cloud and personal data in the cloud. A lot of the thinking had been done centrally, on a public sector wide basis (with the caveat that individual public sector bodies still have to assess the risks arising from their own information systems and make decisions appropriate to that level of risk).

CESG (the Government’s National Technical Authority for Information Assurance) is carrying out information assurance checks on every service that applies to join the CloudStore framework, as part of the accreditation process.

CESG has come up with a classification of business impact levels (here is the pdf) to enable public sector bodies to assess the impact of any particular type of information being compromised. Business impact level 2 corresponds broadly to the government security classification of ‘protect’. This is information that the government does not want to see in the public domain, but if it got into the wrong hands the damage would be more inconvenient than disastrous. Business impact level 3 corresponds broadly to the government security classification ‘Restricted’ – this is information where there could be serious consequences (to individuals, organisations, commercial interests or the nation as a whole) if the information got into the wrong hands.

For example personal data whose compromise is unlikely to put an individual in danger is likely to be regarded as impact level 2, whereas personal data whose compromise could put an individual in danger is likely to be marked as Impact level 3 or above. Impact level 2 covers vast swathes of government work.

Both Google Apps and Microsoft’s Office 365 have been accredited up to Impact level 2. We were told that some of the vendors had started to show an interest in being able to offer a service accredited for impact level 3 information, but for at least the short term the CloudStore would not be catering for impact level 3 information.

One IT manager told us that the point of the cloud services is that they cater for the majority of government’s needs, not for all of its needs. She said it may be that public bodies simply make separate provision for restricted documentation and e-mail – even if it means having separate booths dotted around the office with computers staff can use for ‘restricted’ communications.

G-cloud, data protection, and the issue of storing data outside of the EU

One of the big concerns with cloud adoption has been the 8th data protection principle (present in the data protection legislation of every EU member state) which states that personal data should not be transferred outside the European Economic Area unless that country or territory ‘ensures an adequate level of protection for the rights and freedoms of data subjects in relation to the processing of personal data’.

There is also a wider concern that where information is stored out of the country, and particularly when it is stored outside the EU, then it comes under a legal framework that the UK cannot control (for example many countries have legislation giving their governments powers of inspection, on security grounds, of information held in their territory).

The speakers at the meeting referred to Cabinet Office Guidance on Government ICT offshoring. The guidance states that no information with a national security implication should be stored outside the country (whatever the impact level).

Personal data is a slightly different matter: the Cabinet Office guidance does not forbid personal data being stored outside the EU, provided measures are in place to ensure that the contractor treats the data in an ‘adequate’ manner (‘adequate’ meaning compliant with EU data protection principles and practice), and provided the security in the system is appropriate to the impact level of the information. The guidelines give three ways of ensuring that a contractor operating from overseas has an ‘adequate’ data protection regime – safe harbor, model clauses and binding corporate rules.

The safe harbor scheme was set up jointly by the EU and the US. Individual US companies that sign up for the safe harbor scheme are considered ‘adequate’ by the EU, and therefore the UK public sector is not contravening this principle by storing personal data with these companies. The safe harbor arrangement has been criticised by some commentators. Chris Connolly said ‘The Safe Harbor is best described as an uneasy compromise between the comprehensive legislative approach adopted by European nations and the self–regulatory approach preferred by the US’. However, this article from The Register last month predicts that the safe harbor arrangement will survive the proposed forthcoming overhaul of EU data protection legislation.

The second of the measures is model contract clauses with companies to ensure that the company operates ‘adequate’ protections in relation to the data it stores under the contract. The European Commission has drawn up some such clauses and so has the UK Government.

Binding corporate rules are where the Government accepts that the internal policies of a company operating both within and outside the EU are strong enough to ensure that an ‘adequate’ data protection regime is operated across the whole company (and not just inside the EU). The guidance states that such corporate rules are an alternative to model contract clauses and must be approved by a relevant data privacy supervisory authority (the Information Commissioner in the UK, or an equivalent in another member state).

Posted on February 18, 2012 by James Lappin

Why is content migration so difficult?

Migrating content from one application to another is a problem that even now, two decades into the digital age, we have no solution for. Migrating content is often so labour intensive and complex that it is not cost-effective. Any content migration involves compromises and omissions that result in a significant loss of quality of the metadata that is held about the content being migrated.
Solving the content migration problem is about to become more urgent with the growing popularity of the software-as-a-service (SaaS) variety of cloud computing. In this model the provider not only provides the software application, they also host your content. Imagine what would happen if your organisation decided it wanted to change from one SaaS provider to another. It wants to change from Salesforce to a different SaaS CRM. Or it wants to go from SharePoint Online to Box or Huddle or another collaboration/filesharing offering (or vice versa). What do you do? How do you migrate content from SharePoint Online to Box? They have little in common in terms of how they are architected and what entities they consist of. What is the Box equivalent of a SharePoint content type?
The vendor lock-in problem is very real. If you can’t migrate the content you are left paying two sets of SaaS subscriptions and managing two SaaS contracts. If you were leaving because of a breakdown of trust with your original SaaS provider then how happy would you be leaving your content locked up with them on their servers?

Content migration is a problem that affects all organisations, and which affects archivists as much as records managers

The difficulty organisations experience in migrating content from one application to another matters in many situations. It matters when an organisation wants to replace an application with a better application from a different supplier. It matters in a merger/acquisition scenario, when an organisation wants to move the acquired company onto the same applications that the rest of the group are using.
It matters to archivists, because any transfer of electronic records from an organisation to an archive is, to all intents and purposes, a content migration. I heard the digital preservation consultant Philip Lord say at a conference that the big difference for archives of the electronic world over the paper world is that:

in the paper world it was possible for an archive to set up a routine process for transferring hard-copy records that it would expect all bodies contributing records to the archive to adhere to
in the electronic world every time an archive wishes to accept a transfer of records from a new information system it needs to work out a bespoke process for importing that content and its metadata from that particular system into its electronic archive.

Different applications keep their metadata in profoundly different ways

Migration from one application to another is extremely time consuming because you are:

mapping from one set of entity types to another. Entities are the types of objects the application can hold (users/groups/documents/records/files/libraries/sites/retention rules etc)
mapping from one set of descriptive metadata fields to another
mapping from one set of functions to another. Functions are the actions that users can be permitted to perform on entities in the system (for example: create an entity/amend it/rename it/move it/copy it/delete it/attach a retention rule to it/grant or deny access permissions on it)
mapping from one set of roles to another. Roles are simply collections of functions, grouped together to make it easier to administrate them. For example in SharePoint the role of ‘member’ of a site collects together the functions a user needs to be able to access a site and view and download content, and to contribute new content to the site, but denies them the functions they would need to administer or change the site itself.

Let us imagine we want to migrate content from application A to application B. Application A has an export function and can export all of its content and metadata into an xml schema. That is good. We go to import the content and metadata into application B. This is where we hit problems.

Application B looks at the audit logs of application A. They contain a listing of events (actions performed by users on entities within the system, at a particular point in time). Each event listed gives you the identity of the user that performed the function, the name of the function they performed, the identity of the entity they performed it on and the date or time at which the event occurred. Application B won’t understand these event listings. It is unlikely to understand the identifiers application A uses to refer to the entities and users. It is unlikely to understand the functions performed because application A has a different set of functions to application B.

Application B looks at the access control lists of application A. Each entity in application A has an access control list that tells you which users or groups can perform what role in relation to that entity. Application B does not understand those roles, nor does it understand the functions that the roles are made up of. Therefore application B cannot understand the access control lists.

The end result is that application B cannot understand the history of the entities it is importing from application A, and it cannot understand who should be able to access/contribute to/change them. It is also going to find it difficult to import things like retention rules, descriptive metadata fields, controlled vocabularies.
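The import problem described above can be illustrated with a small Python sketch. Everything in it is hypothetical: the identifiers, the function names, and the idea that application B simply checks each function name against its own list are assumptions made for illustration, not a description of any real product.

```python
# Hypothetical export from application A: each event names a user, a function,
# an entity and a timestamp, all in application A's own vocabulary.
events_from_a = [
    {"user": "A-u17", "function": "CheckIn", "entity": "A-doc-204", "when": "2011-03-02T10:15:00"},
    {"user": "A-u17", "function": "Declare", "entity": "A-doc-204", "when": "2011-03-02T10:20:00"},
    {"user": "A-u03", "function": "Delete", "entity": "A-doc-099", "when": "2011-04-11T16:40:00"},
]

# Functions that hypothetical application B knows about.
functions_in_b = {"Create", "Edit", "Delete", "Move"}


def split_events(events, known_functions):
    """Separate events the target can interpret from those it cannot."""
    understood = [e for e in events if e["function"] in known_functions]
    unintelligible = [e for e in events if e["function"] not in known_functions]
    return understood, unintelligible


understood, unintelligible = split_events(events_from_a, functions_in_b)
print(len(understood), "understood;", len(unintelligible), "not understood")
```

Even the one event that matches by name is not safe to trust: nothing guarantees that ‘Delete’ in application A means the same thing as ‘Delete’ in application B, and the user and entity identifiers are still application A’s own.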

Migration reduces the quality of metadata

The process of migration is ‘lossy’. In the world of recorded music it is said that when you move music from one format to another (LP to tape, tape to mp3 etc.) you cannot gain quality, you can only lose it. When you migrate content from one system to another you cannot gain information about that content, you can only lose it. There will be whole swathes of metadata in system A that it will not be cost effective for you to map to counterpart metadata in system B. You end up migrating content without that metadata, and your knowledge about the content that you hold is poorer as a result.
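A minimal Python sketch of that loss, assuming a hand-built field mapping between two imaginary systems. The field names and the record are invented for illustration.

```python
# Hypothetical mapping from system A's metadata fields to system B's.
# Only the fields someone has had time to map appear here.
FIELD_MAP = {
    "title": "name",
    "author": "created_by",
    "date_created": "created",
}


def migrate_metadata(record, field_map):
    """Return the metadata system B receives and the names of fields left behind."""
    migrated = {field_map[key]: value for key, value in record.items() if key in field_map}
    lost = sorted(key for key in record if key not in field_map)
    return migrated, lost


record_in_a = {
    "title": "Supplier review 2011",
    "author": "jsmith",
    "date_created": "2011-06-01",
    "retention_rule": "Destroy 7 years after closure",
    "protective_marking": "Protect",
}

migrated, lost = migrate_metadata(record_in_a, FIELD_MAP)
print("Fields lost in migration:", lost)
```

Any field without a counterpart in the mapping simply does not arrive. The migrated record can never contain more than the original did.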

The fact that content migration is so labour intensive and lossy means that many organisations opt to leave content in the original application and start from scratch in the new application. This is a nice easy option, but there are downsides. It means that the organisation has to maintain the original application for as long as it needs to keep the content that is locked within it. This means paying the resultant cost of licence fees and support arrangements. It also means a break in the memory of the organisation. Users of the new system wishing to look back over previous years will have to go to the old system to view the content. That is OK for a short period, during which time most colleagues will remember the old system and how to use it. But as time goes by a larger and larger percentage of colleagues will have no knowledge or memory of the older system and how to use it.

The organisation may mitigate the impact of that by connecting the search capability of system B to the repository of system A. The results of this are hit and miss. The search functionality of system B will have been calibrated to the architecture of system B, not to the architecture of system A. Yes, it will return results but it will not be able to rank them very well (and you are still having to maintain system A in order that system B can run the search on it).

What can electronic records management specifications do to improve this situation?

The problem of content migration is not specific to records systems, it is a universal problem that affects any organisation wishing to move content from any kind of application to another application.

But it is a problem that is central to the concerns of records managers and archivists, because as a profession(s) we are concerned with the ability to manage records over time, and difficulties in migrating content hamper our ability to manage content over time. We know that applications have a shelf life – after a period of years a new application comes along that can do the same job better and/or cheaper, and therefore we want to move to the new tool. The problem is that retention periods for business records are usually longer than the shelf life of applications. Therefore it is probably from the records management or archives world that a solution will come to this problem, if it comes at all.

The first generation of electronic records management system specifications (everything from the US DoD 5015.2 that first came out in 1998 to MoReq2 which came out in 2008), did not attempt to tackle the problem. They told vendors what types of metadata to put into their products – but they did not tell vendors how to implement that metadata. For example these specifications would specify that records had to have a unique system identifier, but it was up to the vendor what format that identifier took. They had to have a permissions model but what functions and roles they set up was up to the vendor, and so on.

This lack of prescription had the benefit of sparing vendors of existing products the necessity of re-architecting the way they assign identifiers/implement a permissions model/ keep event histories etc. Had existing vendors been forced to re-architect in such a way it would have proved a major disincentive for them to produce products that complied with the specification. But the disadvantage was that the electronic document and records management systems (EDRMS) that these specifications gave rise to each had their own permissions models and metadata structures. When an organisation wanted to change from one specification compliant EDRMS to another, they had the same content migration problems as you would when migrating content between instances of any other type of information system. An archive (for instance a national archive) wishing to accept records from different EDRM systems would need to come up with a bespoke migration procedure for each product.

MoReq2010’s attempt to facilitate content migration between MoReq2010 compliant systems

MoReq2010 marks something of a break with past electronic records management specifications. One of its stated aims is to ensure that any compliant system can export its content together with its event history, its access control list and its contextual metadata, in a way that any system that has the capability of importing MoReq2010 content can understand and use.

In order to do this it has had to be far more prescriptive than previous electronic records management specifications in terms of how products keep metadata.

For example

It tells any compliant system to give each implementation of that system a unique identifier. This means that any entity created within that implementation will be able to carry with it to subsequent systems information about the system it originated in.
It tells every implementation of every compliant system to give each entity it creates the MoReq2010 identifier for that entity type, so that any subsequent MoReq2010 compliant system that the entity is migrated to understands what type of thing that entity is (is it a record? or an aggregation of records? or a classification class or a retention schedule? or a user? or a group? or a role?).
It tells every implementation of every compliant system to give every entity created within it a globally unique identifier in a MoReq2010-specified format. Each entity can carry this identifier with it to any subsequent MoReq2010 compliant system, no matter how many times it is migrated.
It tells every implementation of every compliant system to give each entity an event history that not only records the functions performed on that entity whilst it is in the system, but which can also be carried forward and added to by each subsequent system.
It tells each compliant system to create an access control list for each entity in the system, that governs who can do what in relation to that entity whilst it is in the system, and which can be understood, used, and added to by any subsequent compliant system that the entity is migrated to.
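The list above can be pictured as a data structure that travels with each entity. The Python sketch below is only an illustration of the idea: the class and field names are invented, and the use of UUIDs for the identifiers is an assumption of the sketch (the list says only that the format is specified by MoReq2010).

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Event:
    system_id: str   # implementation in which the event took place
    user_id: str
    function: str    # action performed on the entity
    timestamp: str


@dataclass
class Entity:
    entity_type: str             # e.g. "record", "aggregation", "class"
    originating_system_id: str   # identifier of the implementation that created it
    entity_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    event_history: List[Event] = field(default_factory=list)
    access_control_list: List[Tuple[str, str]] = field(default_factory=list)  # (user or group, role)


def receive(entity: Entity, receiving_system_id: str, user_id: str, timestamp: str) -> Entity:
    """A later system appends to the history; it never reissues the identifier."""
    entity.event_history.append(Event(receiving_system_id, user_id, "import", timestamp))
    return entity


# Two hypothetical implementations, each with its own identifier.
system_a = str(uuid.uuid4())
system_b = str(uuid.uuid4())

record = Entity(entity_type="record", originating_system_id=system_a)
record.event_history.append(Event(system_a, "user-1", "create", "2012-02-13T09:00:00"))
record.access_control_list.append(("group-finance", "reader"))
original_id = record.entity_id

record = receive(record, system_b, "user-9", "2015-06-01T12:00:00")
```

After the move the record still has the identifier it was born with, still names the system it originated in, and has a single event history that spans both systems.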

To achieve the last two of these ambitions MoReq2010 had to get into the nitty gritty of how a system implements its permissions model.

MoReq2010 and permissions models

I recorded two podcasts with Jon Garde about the permissions model in MoReq2010:

episode 7 of Musing Over MoReq2010 is about the ‘user and group service’ section of the MoReq2010 specification
episode 8 (shortly to be published here) is about the ‘model role service’ section – the part of the MoReq2010 specification that deals with functions (the actions users can perform within the system) and roles (collections of functions).

In the latter podcast Jon said that the model role service was the part of MoReq2010 that caused him the most sleepless nights when he wrote it. The problem was that every product on the market already has a permissions model, with its own way of describing the functions that it allows its users to perform on entities within the system.

The dilemma for Jon writing MoReq2010 was as follows:

If the specification prescribed a way for each system to implement its permissions model then existing systems would have to be rewritten and this would act as a major disincentive for vendors to revise their products to comply with MoReq2010
If the specification did not prescribe a way for each system to describe the functions that users could perform within it then subsequent systems would not be able to understand the event histories of exported entities (because they would not understand which actions had been performed on the entity concerned) or their access control lists (because they would not understand what particular users/groups of users were entitled to do to that entity)

The solution that Jon adopted was half way between these two options. In the model role service MoReq2010 outlines its own permissions model, with definitions of a complete set of functions that a record system can allow users to perform on entities.

MoReq2010 does not insist that to be compliant a system must implement every one (or even any one) of the functions that are outlined within the model role service. It allows products to carry on using their own permissions model. However MoReq2010 does insist that a system must be able to export its content and metadata with the functions and roles expressed as the functions and roles outlined in the MoReq2010 specification. In other words a product would need to map its existing permissions model (functions and roles) to MoReq2010 functions and roles. This would mean that two MoReq2010 compliant systems with entirely different permissions models could both export their content with all of the functions in the access control lists and the event histories expressed as MoReq2010 functions.
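As a rough Python sketch of that export-time mapping: two imaginary products keep their own native function names internally and translate them only when they export. The product names, the native function names and the ‘std:’ names are all placeholders invented for this illustration. They are not the function identifiers that the MoReq2010 specification itself defines.

```python
# Placeholder names for specification-defined functions. These are NOT the
# identifiers used in the MoReq2010 specification itself.
PRODUCT_X_MAP = {"CheckIn": "std:modify", "Declare": "std:create-record", "Bin": "std:delete"}
PRODUCT_Y_MAP = {"SaveVersion": "std:modify", "File": "std:create-record", "Purge": "std:delete"}


def export_events(events, function_map):
    """Rewrite native function names as specification function names on export."""
    exported = []
    for event in events:
        native = event["function"]
        if native not in function_map:
            raise ValueError(f"No mapping for native function {native!r}")
        exported.append({**event, "function": function_map[native]})
    return exported


from_x = export_events([{"entity": "e1", "function": "CheckIn"}], PRODUCT_X_MAP)
from_y = export_events([{"entity": "e2", "function": "SaveVersion"}], PRODUCT_Y_MAP)
print(from_x[0]["function"], from_y[0]["function"])
```

The two products never need to agree with each other internally. They only need to agree on the vocabulary they export in.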

Mapping the functions and roles in their product’s permission model to MoReq2010’s permission model is a significant body of work for vendors of existing systems, and they will obviously make a commercial judgement as to whether the benefit to them of achieving MoReq2010 compliance outweighs the cost of the investment they will need to make to produce those mappings and to implement the other changes, such as the identifier formats, that MoReq2010 demands.

Because MoReq2010 is so prescriptive as to how systems keep metadata it could well be that it is easier for new entrants to the market to write new products from scratch to comply with the specification than it is for existing vendors to re-architect their products to comply. If I was a vendor writing a new document or records management system from scratch I would certainly think about simply implementing the MoReq2010 permissions model outlined in the model role service.

Why is import more complex than export?

The core modules of MoReq2010 include an export module. Every compliant system must be able to export entities and their event histories, access control lists and contextual metadata in a MoReq2010 compliant way. There is no import module in the core modules of MoReq2010. Vendors can win MoReq2010 compliance for their products without their products being able to import content and its metadata from other MoReq2010 compliant systems.

The import module of MoReq2010 is being written as I write, and is scheduled for release sometime in 2012. It will not be compulsory. The reason why the import module is not a compulsory module of the specification is that not all records systems will need to import from other MoReq2010 compliant records systems. For example by definition the first generation of compliant systems will not have to import from other compliant systems (because they have no predecessor compliant systems to import from!).

It will be more complex for a system to comply with the import requirements of MoReq2010 (when the module is published) than it is with the export requirements.

For example:

an existing product that seeks compliance with the core modules of MoReq2010 (but not the additional and optional import module) will have to map its functions (actions/permissions) and roles to the functions and roles outlined in MoReq2010. It does not have to worry about all the functions listed in MoReq2010 – only the functions that it needs to map its own functions to
a product that seeks additionally to comply with the import module of MoReq2010 will need to be able to implement all of the functions listed in MoReq2010 – because it needs to be able to import content from any MoReq2010 compliant system and a MoReq2010 compliant system may choose to use any of the functions listed in MoReq2010.
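The asymmetry between the two cases can be expressed as two different set checks. This is a Python sketch under assumptions: the five ‘std:’ names stand in for the full list of functions in the specification and are invented for illustration, as is the small product.

```python
# Placeholder set standing in for the full list of functions in the specification.
ALL_SPEC_FUNCTIONS = frozenset(
    {"std:create-record", "std:modify", "std:delete", "std:move", "std:apply-retention"}
)


def can_export(native_functions, function_map):
    """Export needs every native function mapped to some specification function."""
    return all(function_map.get(name) in ALL_SPEC_FUNCTIONS for name in native_functions)


def can_import_from_any(understood_functions):
    """Import needs every specification function to be understood."""
    return ALL_SPEC_FUNCTIONS <= set(understood_functions)


# A small hypothetical product with only two functions of its own.
small_product_map = {"Save": "std:modify", "Bin": "std:delete"}

print(can_export({"Save", "Bin"}, small_product_map))
print(can_import_from_any(small_product_map.values()))
print(can_import_from_any(ALL_SPEC_FUNCTIONS))
```

The small product passes the export check because each of its own functions has a counterpart. It fails the import check because an incoming entity may use a function it has never heard of.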

I put it to Jon in our podcast on the model role service that we would know that MoReq2010 had ‘arrived’ if and when someone brings to market a product that complies with the import module and is capable of importing content from MoReq2010 compliant systems. Once you have products capable of importing from MoReq2010 compliant systems there is all of a sudden a purpose to implementing MoReq2010 compliant systems – the theoretical possibility of being able to pass content on to another system that understands the content as well or nearly as well as the originating system is turned into a practical reality. Once you have a product that is capable of importing from MoReq2010 compliant systems it is in the interests of anyone implementing that product to influence whoever runs the applications that they wish to import from to make those applications MoReq2010 compliant. Imagine a national archives running an electronic archive with a MoReq2010 import capability. It would be in the interests of that national archives to persuade the various parts of government who contribute records to them to implement MoReq2010 compliant systems.

Jon’s response on the podcast was to lay down a challenge to the archives world to develop a MoReq2010 compliant electronic archive system, with a MoReq2010 compliant import capability.

What are the chances of MoReq2010 catching on?

MoReq2010 is doubly ambitious. In this post I have looked at its ambition to ensure that content can take its identifiers, event history, access control list and contextual metadata with it through its life as it migrates from one system to another. Its other great ambition is to reach a situation where any application in use in a business is routinely expected to have record keeping functionality. The two ambitions are related to each other.

MoReq2010 makes it feasible for the vendor of a line of business system to add records management functionality to their product and get it certified as being a compliant records system. The specification has done this by eliminating from the core modules any requirements that would not be necessary for every system to perform, however small and however specialised. A compliant system does not have to be able to do all the things an organisation-wide electronic records management system would have to do. It only needs to be able to manage and export its own records. Note that MoReq2010 makes it possible for vendors of line of business systems to seek compliance, but the specification alone cannot incentivise them to do this – incentivisation would have to come from the market or from organisations that could influence the market.
Because MoReq2010 allows the possibility for records to be kept in multiple line of business and other systems within an organisation, the issue of migration becomes very important. When a line of business application is replaced the organisation will need to migrate content either to the application’s replacement or to an organisational records repository or to a third party archive. Hence the ambition that any compliant records system can export content and metadata in a way that another compliant system can understand.

Being ambitious carries with it a risk. MoReq2010 does call for existing vendors to re-architect their systems, and vendors do not like re-architecting their systems. If too few vendors produce products that comply with the specification then MoReq2010 will go the way of its predecessor, MoReq2, which died because only one vendor felt it was commercially worthwhile to produce a product that complied with it.

In the situation that electronic records management finds itself in, being ambitious is less risky than trying to incrementally tweak previous specifications. MoReq2 failed because by the time it was published in 2008 the bottom had fallen out of the market for the EDRM systems that it and previous electronic records management system specifications underpinned. SharePoint had come along and pushed that market over like a house of cards.

EDRM fell without so much as a whimper because no-one was prepared to defend it. Archivists were not prepared to defend it because they had not benefited from it – it was as hard for them to accept electronic transfers from EDRM systems as from any other type of application. Practitioners were not prepared to defend it because it had proved difficult and expensive to implement monolithic EDRM systems across whole enterprises. The ECM vendors who had acquired EDRM products were not prepared to defend it because EDRM represented only a relatively small portion of their portfolio, and they had no stomach for a fight with Microsoft.

MoReq2010 has a chance of success. It is not guaranteed to succeed, but it has a chance. The reason why it has a chance is because it is addressing the right two questions – how do we get records management functionality adopted by all business applications? and how do we ensure that content can be migrated easily and without significant loss of metadata from one application to another?

These questions will have to be nailed. If MoReq2010 succeeds in nailing them so much the better. If it doesn’t, if the market isn’t ready for it, then whatever specifications come after it will have to nail them. There is no going back to the EDRM ‘one records system-per-organisation’ model.

Posted on February 13, 2012 (updated February 14, 2012) by James Lappin

Thinking Records

James Lappin's records management blog