Records management is a corporate function that takes place within the context of a wider digital economy, and is influenced by developments in that wider digital economy. The digital economy is three decades old but it is still moving, and still growing.
One of the key trends in the digital economy has been the growth of what we might call ‘hyperscale’ services. These are services that can be scaled up to the level of the entire digital economy, for use by all, most or many actors across that economy.
We have seen the emergence of successive waves of hyperscale services:
- in the 1990s – hyperscale search engines, consumer email services, and retail services
- in the 2000s – hyperscale social media platforms
- in the 2010s – hyperscale cloud storage and compute services (like Amazon Web Services and Microsoft Azure) and hyperscale cloud productivity suites such as Office 365 (now Microsoft 365) and Google Apps (now Google Workspace)
- in the 2020s – hyperscale AI services such as ChatGPT
The impact of hyperscale services on records management
Most organisations in our digital economy are now deploying a cloud productivity suite (Microsoft 365 or Google Workspace). These suites provide general purpose applications for messaging, collaboration and document management. They can be used by anybody and everybody in the organisation whilst working on any business activity. In the on-premise era these capabilities would have been provided by stand-alone applications (an email system, a document management system, and perhaps a collaboration system), each separately deployed on a corporate on-premise network.
The shift from on-premise systems to hyperscale cloud productivity suites is a change in the context in which records management takes place. It has:
- changed the architecture of the systems within which individuals communicate, collaborate and store documents – with a move to relatively simple architectures based on team or individual sites/accounts/drives
- changed the nature of the relationship between document management systems and communication tools – search and AI tools within a cloud suite are able to use information in emails and chat messages as metadata about the documents held in the suite
- brought forward the point at which organisational content is brought into contact with AI – Google and Microsoft have both been big players in the emergence of hyperscale AI services. Google researchers devised the transformer architecture on which today’s large language models (LLMs) are based, and Google has since developed the Gemini series of LLMs. Microsoft was the largest investor in OpenAI before and after the launch of its ChatGPT service, and hosts OpenAI’s GPT models on its Azure cloud. Both Microsoft (using OpenAI’s GPT models) and Google (using its Gemini models) offer generative AI services within their cloud productivity suites.
The relationship between hyperscale cloud productivity suites and corporate records management
There is a demarcation line between hyperscale cloud productivity services and the corporate functions of records management and of information governance. Microsoft and Google respect the right of organisations to set their own access permissions and retention rules on content within Microsoft 365 and Google Workspace. The AI tools that both Microsoft and Google have deployed within their suites (Microsoft 365 Copilot and Gemini for Google Workspace) are deployed to support individual productivity. They do not seek to change or override the access permissions and retention rules placed on content. Indeed, these permissions and rules act as important guardrails on the operation of those services.
The architecture of the applications within cloud productivity suites
Organisations typically set access permissions and retention rules as defaults on aggregations/containers (folders, sites, accounts etc.). This is usually more efficient, more scalable, safer, and easier to monitor than applying such rules directly to individual items (though the ability to set exceptions from defaults is also important). From a records management perspective the key elements of any system’s architecture are the aggregations to which default access permissions and retention rules can be applied.
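To make the idea of container-level defaults concrete, here is a minimal sketch in Python. It does not reflect how Microsoft 365 or Google Workspace implement retention. The class names, field names and retention periods are all hypothetical, chosen only to show a default being inherited by most items and overridden by an occasional exception.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    name: str
    # Hypothetical item-level exception; None means "inherit the container default".
    retention_years: Optional[int] = None


@dataclass
class Container:
    """An aggregation (site, folder, account) that carries default rules."""
    name: str
    default_retention_years: int
    default_readers: frozenset
    items: List[Item] = field(default_factory=list)

    def effective_retention(self, item: Item) -> int:
        """Return the item's own rule if one is set, otherwise the container default."""
        if item.retention_years is not None:
            return item.retention_years
        return self.default_retention_years


# Illustrative values only: one rule set on the container covers every item in it.
site = Container(
    name="Finance team site",
    default_retention_years=7,
    default_readers=frozenset({"finance-team"}),
)
routine = Item("monthly-report.docx")
exception = Item("signed-contract.pdf", retention_years=15)
site.items.extend([routine, exception])

for item in site.items:
    print(item.name, site.effective_retention(item))
```

In this sketch the rule is stated once on the container, so monitoring means checking a handful of containers rather than every item, while the exception mechanism remains available for the few items that need it.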
The way that a corporate collaboration/document management application is architected has a crucial impact on:
- the speed at which the application can be deployed
- the range of organisations capable of deploying the application
- the precision with which default access and retention rules can be set on content created or received within the application
There is a trade-off here. For any corporate wide document management/collaboration system:
- an architecture that aggregates content into business activity specific aggregations (such as project, case or matter specific folders) will take a relatively long time to deploy, and a relatively high level of information management maturity to deploy, but will support a relatively high level of precision in the application of default access permissions and retention rules
- an architecture that aggregates content into team or individual specific sites/drives/accounts will take a relatively short time to deploy, and could be deployed by organisations whatever their level of information management maturity, but will allow for less precision in the application of default access permissions and retention rules
Organisations tend to have a list of all their staff and all their organisational units. This makes rolling out a system that allocates sites, drives and accounts to individuals or organisational units a relatively straightforward process. Organisations do not tend to have a list of all the pieces of work being carried out across all areas of the business. Nor do they typically even have a list of all the types of work being carried out. This makes rolling out a corporate system based on function and activity a far from straightforward process, because a central team would have to identify all these types of work (and specific instances of those types of work) as it goes through the roll-out.
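The asymmetry described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any vendor's provisioning process: the function names, the naming scheme for containers, and the sample staff directory are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaffMember:
    username: str
    unit: str


def provision_team_and_individual(directory):
    """Derive every aggregation directly from the staff directory."""
    accounts = sorted(f"account:{s.username}" for s in directory)
    sites = sorted({f"site:{s.unit}" for s in directory})
    return accounts + sites


def provision_by_activity(work_register):
    """Create one aggregation per piece of work.

    Needs a register of (function, activity, instance) entries,
    which most organisations would first have to compile.
    """
    if not work_register:
        raise ValueError("no register of work: someone has to identify it first")
    return sorted(f"folder:{fn}/{act}/{inst}" for fn, act, inst in work_register)


# Hypothetical directory: the kind of list an organisation already holds.
directory = [
    StaffMember("asmith", "Finance"),
    StaffMember("bjones", "Finance"),
    StaffMember("cpatel", "Legal"),
]

containers = provision_team_and_individual(directory)
print(containers)
```

The first function needs no input beyond data the organisation already has. The second cannot run until someone has built the register of work, which is the slow and skilled part of a function and activity based roll-out.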
In the late 1990s and early 2000s various public institutions, in various jurisdictions, issued specifications for electronic records management systems. These specifications were all based on the assumption that once the digital world had settled down, organisations would want to use an architecture based on function and activity.
The specifications created a market for a type of system that became known as electronic document and records management (EDRM) systems. This was a niche market because only organisations with strong records and information management skill sets would find it feasible to deploy them. This is because their architecture required a corporate business classification, and such classifications are complex to construct and test, and it is hard to secure organisational agreement for them.
The market for today’s hyperscale cloud productivity suites is fundamentally different to the on-premise market for document management and other corporate systems of the 2000s and early 2010s. Cloud productivity suites are serving a hyperscale market which stretches to the entire digital economy, and which cannot support niche markets. These suites therefore have to be architected so that any organisation, large or small, can deploy them.
A fundamental shift in the way records are aggregated
The move to cloud productivity suites has brought a shift:
- towards team and individual based architectures – where sites are given to teams and accounts are given to individuals
- away from function and activity based architectures – where an organisation has some sort of hierarchical classification of its broad areas of work, and creates new aggregations (sites, libraries or folders) when new pieces of work start, which are assigned to the relevant place within the hierarchical structure
This shift is an inevitable consequence of the move to the hyperscale.
Compared with equivalent applications from the on-premise age, the collaboration/document management applications within Microsoft 365 and Google Workspace have a simpler architecture, are less configurable, and, as a result, are quicker to deploy. Compare the typical time taken to roll out Microsoft Teams circa 2020 (measured in weeks) with that of an on-premise SharePoint implementation circa 2010, or an on-premise EDRM system circa 2007 (measured in years).
It is true that Microsoft and Google are able to offer premium services to segments of their market – for example Microsoft can offer extra information management functionality to customers willing to buy E5 licences rather than E3 licences. But the premium information management features exist in separate administrative services (which in the case of Microsoft 365 are branded as Microsoft Purview). The applications deployed to end-users (which, in the case of Microsoft 365, include Teams, Outlook/Exchange, and OneDrive) have a core architecture that is the same for all customers, large or small, regardless of their licence.
Function and activity architectures have not disappeared. When an organisation deploys a line of business system for one area of work it invariably uses an activity based architecture (think of case management systems, HR systems, student record systems etc.). However, it would not be cost effective for an organisation to deploy specific systems for every separate line of business. It is only cost effective to pick out those line of business activities with the highest volume and/or value and develop specific systems for them. All other activities will tend to use all-purpose corporate systems, and that role is now generally filled by the hyperscale cloud productivity suites.
The intuition that we can apply access permissions and retention rules more precisely when we know the business activity from which content arose is still valid. This intuition can help to diagnose some of the challenges organisations will face in managing (and exploiting) content in team and individual based aggregations through time. It can also help generate options for using AI and data science techniques to improve the precision with which access permissions and retention rules are applied to content.
The fact that there has been a convergence across the digital economy on team and individual (rather than function and activity) based architectures in corporate all-purpose systems is not a failure of records management or information governance. It is simply a change in the context in which they operate. The convergence has happened because corporate all-purpose systems are now a hyperscale service. Function and activity architectures would have acted as a bottleneck on corporate all-purpose systems becoming a hyperscale service.
(Views in this post are my own)