Automatic DSAR Data Detection and Deduplication
Posted on 30/08/22
Automatic DSAR Data Detection
To understand how to discover data more easily, it’s important to first understand where data typically resides.
- 10% of all data in the world resides in structured databases, such as HR systems and Customer Relationship Management (CRM) systems. Extracting and reviewing data within these systems is straightforward and, in most cases, doesn’t present DPOs with too many problems.
- 90% of all data, however, is unstructured, and resides in email systems and Enterprise Social Management systems, such as Slack and Teams. Extracting and reviewing data from these systems is a different ball game and requires a different approach. This is where Smartbox.ai can help.
Search technologies that enable you to find references to a known subject within unstructured datasets have been around for many years. However, when you are conducting DSARs, the problem is not finding references to known subjects but finding the UNKNOWNS. After all, you don’t know what you don’t know.
Rudimentary search tools from the likes of Adobe and Microsoft only get you so far. They are good at surfacing the names and personal information of the people you are looking for, but they don’t automatically surface the names and personal information of anyone else mentioned in the review dataset. It’s the latter that can trip organisations up, as third-party information can inadvertently be disclosed when it should not be.
This is where Smartbox.ai comes into its own, as it automatically looks for, and highlights, every instance of any potentially sensitive information, empowering you to decide whether to redact it. Without the need for any configuration, the system can review millions of documents at speed and highlight every name, postcode, credit card number, date of birth, National Insurance number and more, automatically. What’s more, if your organisation uses bespoke labelling formats, such as Employee Numbers, you can configure the system to automatically identify them too.
Smartbox.ai is specifically designed to reduce the effort needed by DPOs while minimising the risk of an accidental data breach.
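To make the idea of pattern-based highlighting concrete, here is a minimal Python sketch. It is not how Smartbox.ai works internally; it is an illustration under stated assumptions. The regular expressions are simplified, the `EMP-` employee number format is hypothetical, and the function names are invented for this example. Names and dates of birth are left out because they usually call for language-aware techniques rather than simple patterns.

```python
import re

# Simplified, illustrative patterns; real-world formats have more edge cases.
PATTERNS = {
    "uk_postcode": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b"),
    "ni_number": re.compile(r"\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b"),
    "card_number": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
    # Hypothetical bespoke label format, standing in for a configured one.
    "employee_number": re.compile(r"\bEMP-\d{6}\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str):
    """Return (label, matched_text, start, end) tuples ordered by position."""
    hits = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            # Discard digit runs that merely look like card numbers.
            if label == "card_number" and not luhn_valid(m.group()):
                continue
            hits.append((label, m.group(), m.start(), m.end()))
    return sorted(hits, key=lambda h: h[2])


if __name__ == "__main__":
    sample = "Card 4111 1111 1111 1111, NI QQ 12 34 56 C, postcode SW1A 1AA, staff EMP-004217."
    for hit in find_sensitive(sample):
        print(hit)
```

Each hit carries its character offsets, so a reviewer (or a redaction step) can highlight the exact span and decide whether it should be removed.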
Automatic DSAR Deduplication
Within unstructured datasets it is normal to find duplicated information. Just think of your own emails. You may send an email to someone, who forwards it to someone else, and so on. In this example your initial email is duplicated with every forward.
In most cases duplicated information isn’t a problem. However, if you are tasked with reviewing a large dataset, duplication can materially increase the time needed to review it. This is where Smartbox.ai can help.
Without requiring any configuration or effort, Smartbox.ai automatically removes all duplicated information contained in a dataset, reducing the dataset by around 63%. This significantly smaller dataset can then be more easily reviewed and redacted as required.
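As a rough illustration of what deduplication involves, the Python sketch below removes exact duplicates by hashing each document's normalised text. It is a simplified example, not Smartbox.ai's method: the `normalise` and `deduplicate` helpers are hypothetical names, and the approach only catches copies that match once whitespace and letter case are ignored. Copies embedded inside forwarded email chains would need near-duplicate or thread-level analysis.

```python
import hashlib
import re


def normalise(body: str) -> str:
    """Collapse whitespace and lower-case the text so trivial differences vanish."""
    return re.sub(r"\s+", " ", body).strip().lower()


def deduplicate(documents):
    """Keep the first occurrence of each distinct (normalised) document."""
    seen = set()
    unique = []
    for doc in documents:
        digest = hashlib.sha256(normalise(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


if __name__ == "__main__":
    docs = [
        "Hello  team,\nplease review.",
        "hello team, please review.",
        "Different message",
    ]
    kept = deduplicate(docs)
    print(f"{len(docs)} documents in, {len(kept)} to review")
```

Hashing keeps the comparison cheap: each document is read once, and only the fixed-size digests are held in memory while duplicates are filtered out.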
To learn more about Smartbox.ai and how it can materially reduce the effort needed to process DSARs, book a demo today.