One way of thinking through the question of how AI can augment, strengthen and extend records management is to ask:
- what are the most effective approaches that we already have without AI?
- what are the strengths of the best approaches and how far do they get us?
- what are the limitations of even the best approaches?
- what therefore is the gap that AI (and data science techniques/algorithmic approaches more generally) can most usefully address?
Existing approaches
For the past decade records management has been dominated in many countries by large cloud suites (such as Microsoft 365 and Google Workspace) that provide generic collaboration and communication applications that all or most staff can use for all or most of their activities. The most effective records management approach during this period has been ‘control at the point of provisioning’.
Control at the point of provisioning involves acting to set default access permissions and retention rules on containers (such as sites, accounts, drives etc.) at the point at which they are provided to a team or individual. Ideally it also involves capturing sufficient context about a container to enable an information manager at a later date to understand:
- who that container was provisioned to
- what role the team or individual played within the organisation
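For data scientists it may help to see this as a data structure. What follows is a minimal sketch in Python, not a real Microsoft 365 or Google Workspace API: the `Container` class, the `provision` function and all field names are hypothetical, chosen only to show that the defaults and the context are recorded at the moment the container is handed over.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Container:
    """A provisioned container (site, drive, mailbox) with its governance defaults."""
    name: str
    kind: str                # e.g. "sharepoint_site", "email_account" (illustrative values)
    provisioned_to: str      # the team or individual the container was given to
    role: str                # the role that team or individual plays in the organisation
    retention_years: int     # default retention rule set at provisioning
    readers: set = field(default_factory=set)              # default access permissions
    provisioned_on: date = field(default_factory=date.today)


def provision(name, kind, provisioned_to, role, retention_years, readers):
    """Refuse to hand over a container unless its defaults and context are recorded."""
    if not provisioned_to or not role:
        raise ValueError("context (who, and what role) must be captured at provisioning")
    if retention_years <= 0:
        raise ValueError("a default retention rule is required")
    return Container(name, kind, provisioned_to, role, retention_years, set(readers))


site = provision(
    name="Procurement Team Site",
    kind="sharepoint_site",
    provisioned_to="Procurement Team",
    role="Purchases goods and services for the organisation",
    retention_years=7,
    readers={"procurement-staff"},
)
print(site.provisioned_to, site.retention_years)
```

The point of the sketch is the guard in `provision`: a container with no retention default or no recorded context never comes into existence, so nothing depends on end users tidying up afterwards.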

Control at the point of provisioning is an effective strategy against the build-up of uncontrolled digital heaps because:
- it puts control in the hands of records and information professionals – its success is not dependent on end-user effort/buy-in. The provision of containers is a process that can be controlled by records and information management teams
- it is comprehensive – if every SharePoint site has a default retention rule then every item of content within SharePoint has a default retention rule (because every item sits within a site). If every email account has a default retention rule then every email message has a default retention rule (because every email sits within an email account)
- it is extensible – this approach can be applied in any corporate all-purpose collaboration or communication application because all such applications partition content into containers. Every one of the applications (workloads) within Microsoft 365 partitions content into containers
- it applies retention rules to meaningful groupings of content – it manages content in-place within the context of the container within which it was created, accessed and used
- it is predictable – it helps set the expectations of end-users with regard to how long the content they contribute to the container is likely to be kept
- it is compatible with any viable form of information architecture – it can be applied in systems that are hierarchical in nature (like SharePoint or EDRM systems). It can also be applied to systems that are modular in nature (like MS Teams, MS Exchange, Gmail etc.). These applications are modular in the sense that each MS Team and each email account is a standalone object that is not nested within a classification structure of any kind
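The 'comprehensive' property in the list above can be sketched as a simple inheritance rule: an item with no rule of its own takes the default of the container it sits in. The Python below is an illustrative model only. The class and function names are hypothetical, and counting retention from the last-modified date is an assumption made for the example (real systems offer several possible triggers).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Container:
    name: str
    retention_years: int     # default rule set when the container was provisioned


@dataclass
class Item:
    title: str
    container: Container
    last_modified: date
    retention_years: Optional[int] = None   # optional item-level override


def effective_retention(item):
    """Use the item's own rule if it has one, otherwise the container default."""
    if item.retention_years is not None:
        return item.retention_years
    return item.container.retention_years


def disposal_due(item):
    """Date on which the item becomes due for disposal.

    Assumption for this sketch: the period runs from the last-modified date.
    """
    years = effective_retention(item)
    d = item.last_modified
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February with a non-leap target year: fall back to 28 February
        return d.replace(year=d.year + years, day=28)


mailbox = Container("j.smith mailbox", retention_years=3)
email = Item("Re: contract renewal", mailbox, date(2021, 5, 10))
print(effective_retention(email), disposal_due(email))
```

Because every item belongs to exactly one container, `effective_retention` always returns a value, which is what makes the approach comprehensive.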
Extending a control at the point of provisioning approach
Let us imagine that an organisation has deployed Microsoft 365 and has applied control at the point of provisioning across all the applications within it. They can therefore be assured that every container (Team/SharePoint site/OneDrive/email account) has a default retention rule set on it. What further capabilities might they want from AI to improve and refine their ability to manage their content over time?
There are three limits to control at the point of provisioning:
- it doesn’t deal with legacy content – by definition you can’t retrospectively impose control at the point of provisioning on containers that already exist
- it is only as precise as the containers that are being provisioned
- it does not typically involve a structure that enables an organisation to group like containers with like and make sense of the whole
These three limitations provide a clue as to the capabilities that such an organisation might want from AI (and data science/algorithmic approaches more generally). The organisation might want:
- the capability to analyse and assess legacy containers to support the retrospective assignment of a default retention rule to each container (legacy SharePoint sites, legacy shared drives given to particular teams, legacy email accounts etc.)
- the capability to identify sub-groups within containers – for example the organisation might benefit from being able to:
  - identify content within each container that does not merit the default retention rule set on it (an obvious example is unsolicited, trivial or social emails in email accounts)
  - create sub-clusters of similar content within a container, which correspond to different areas of work. This may enable them to apply finer-grained retention rules or access permissions within a container
- the capability to group and classify containers – for example the organisation might benefit from being able to:
  - group like containers with like (and assign a name/description to the grouping that identifies what the containers have in common)
  - group like groupings of containers with like (and iterate this process either to build a corporate records classification from the bottom up, or to reach a point at which they can link groupings to a classification structure that they have built separately)
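To make the last of these capabilities concrete, here is a deliberately simple sketch of 'grouping like containers with like'. It assumes each container has already been reduced to a set of characteristic terms (how those terms are extracted is out of scope here), and it swaps in Jaccard overlap with a fixed threshold where a real project would more likely use text embeddings and a proper clustering algorithm. The function names, the threshold and the example data are all made up for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two term sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_containers(profiles, threshold=0.3):
    """Group containers whose term profiles overlap by at least `threshold`.

    `profiles` maps a container name to a set of terms. Two containers are
    linked when their Jaccard similarity meets the threshold, and linked
    containers end up in the same group (single-link grouping via union-find).
    """
    parent = {name: name for name in profiles}

    def find(x):
        # Follow parent pointers to the group's representative, compressing the path
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


def label_group(members, profiles):
    """Describe a group by the terms that all of its containers share."""
    return sorted(set.intersection(*(profiles[m] for m in members)))


# Made-up term profiles for four legacy containers
profiles = {
    "Procurement North": {"tender", "supplier", "contract", "invoice"},
    "Procurement South": {"tender", "supplier", "contract", "purchase"},
    "HR Casework": {"grievance", "absence", "disciplinary"},
    "HR Recruitment": {"vacancy", "interview", "absence", "grievance"},
}

groups = group_containers(profiles)
for members in groups:
    print(members, "shared terms:", label_group(members, profiles))
```

The shared terms give a first candidate for the name or description of each grouping. Grouping the groupings would mean building a profile for each group (for example the union of its members' terms) and running the same function again, one level up.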
Relationship with other potential approaches
There are many possible strategies for deploying AI in support of a records management programme. Control at the point of provisioning is an approach that sets default retention rules at the container level. AI offers the opportunity to try entirely new strategies – for example AI could be used to set retention rules on items without reference to the containers that they were created and used in. Or it could be used to assign items to an entirely new set of containers for the purposes of applying retention and/or access rules. This post is based on the assumption that:
- some organisations may wish to maintain a continuity of strategy as they transition to the greater use of AI to support the application of retention rules
- containers shape the way that content accumulates within corporate collaboration and communication systems. Giving containers an ongoing role in the governance of content offers benefits in terms of maintaining predictability over time and keeping content within the context in which it was created
The intuition behind this post
This is the third in a series of posts that attempt to articulate intuitions about records management for data scientists. The intuition behind this post is as follows:
Control at the point of provisioning involves setting default retention rules and access permissions on containers at the point at which they are provisioned to individuals and teams. It has proved to be an effective way of preventing the build-up of uncontrolled digital heaps.
The previous posts in this series are:
All views in this post are my own