MyDLP has the concept of a Document Database, which is a logical group of different documents. We prefer the term document because what we focus on is information. Its representation may vary: a document could exist as a file, a database table field or just a simple text message.
The aim of MyDLP Document Databases is to simplify document grouping. It does not matter whether the source of a document is a text file, an office file, a compressed archive, an RDBMS database and so on.
You can use MyDLP Document Databases with two different matchers: Hash (Exact Document Matching) and PDM (Partial Document Matching).
Hash (Exact Document Matching)
This is a rather exceptional matcher, perhaps the only one that assumes the data being transferred is a file. It simply checks whether the MD5 hash of the sent file is included in the selected Document Database. It matches only when the sent file is an exact copy of one of the files previously added to the selected Document Database. It can be used to catch images, videos or other non-text content. You can also use this matcher with the PASS action to create whitelisting-like behavior (file exceptions).
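To make the idea concrete, here is a minimal Python sketch of exact matching by MD5 hash. It is only an illustration and not MyDLP's actual code: the class name HashDocumentDatabase and the helper md5_hex are made up for this example, and files are represented as plain byte strings.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the given bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


class HashDocumentDatabase:
    """Hypothetical stand-in for a Document Database used by the Hash matcher."""

    def __init__(self):
        # Only the hashes are stored, not the file contents.
        self._hashes = set()

    def add(self, data: bytes) -> None:
        """Register a file (given as bytes) in the database."""
        self._hashes.add(md5_hex(data))

    def matches(self, data: bytes) -> bool:
        """True only if an exact copy of this file was added before."""
        return md5_hex(data) in self._hashes


db = HashDocumentDatabase()
db.add(b"contents of a confidential image")
print(db.matches(b"contents of a confidential image"))   # exact copy
print(db.matches(b"contents of a confidential image."))  # one byte added
```

Because changing a single byte changes the hash, this kind of matcher only catches exact copies, which is why it suits images, videos and whitelisting rather than edited text.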
PDM (Partial Document Matching)
This may be the most useful matcher in MyDLP. It is also known as Unstructured Data Matching, Intelligent Content Matching or Statistical Document Matching.
By using PDM, you can catch your trained documents even if unrelated sentences have been injected into them. For example, even if a user copies a piece of confidential data and pastes it into an innocent file, MyDLP will be able to detect its presence.
Let me explain how this works. When you add a document to a document database, MyDLP generates fingerprints for it. Each fingerprint is 4 bytes in size, and approximately one fingerprint is generated for every 250 characters (this can change drastically depending on the entropy of the text). Afterward, when MyDLP analyzes a document using PDM, it cross-compares these fingerprints and calculates a score according to the matching density. If this score is higher than the threshold you have specified, the analyzed document is considered a match.
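The flow above can be sketched in a few lines of Python. Please treat this as a rough illustration under my own assumptions, not as MyDLP's real algorithm: the function names, the use of CRC32 as the 4-byte fingerprint, the window size k, the "keep hashes divisible by 250" selection rule and the definition of the score as a simple fraction are all choices made for this example.

```python
import re
import zlib


def normalize(text):
    """Lowercase the text and collapse whitespace runs (an assumed preprocessing step)."""
    return re.sub(r"\s+", " ", text).strip().lower()


def fingerprints(text, k=32, modulus=250):
    """Return a set of 32-bit (4-byte) fingerprints for the text.

    Every window of k bytes is hashed with CRC32, and only hashes divisible
    by `modulus` are kept. On average this keeps about one fingerprint per
    `modulus` characters, but the real number depends on the content.
    """
    data = normalize(text).encode("utf-8")
    selected = set()
    for i in range(len(data) - k + 1):
        h = zlib.crc32(data[i:i + k])  # unsigned 32-bit value
        if h % modulus == 0:
            selected.add(h)
    return selected


def pdm_score(document, database_fps, k=32, modulus=250):
    """Fraction of the document's fingerprints found in the database (assumed score)."""
    doc_fps = fingerprints(document, k, modulus)
    if not doc_fps:
        return 0.0
    return len(doc_fps & database_fps) / len(doc_fps)


def is_match(document, database_fps, threshold, k=32, modulus=250):
    """A document matches when its score is higher than the threshold."""
    return pdm_score(document, database_fps, k, modulus) > threshold
```

In this sketch, a trained text pasted into a larger, unrelated document still produces the same fingerprints for the pasted part, so the score stays above zero and the threshold decides whether it counts as a match.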
As always, if you have any questions or comments, you can leave them directly on this post.
Have a good day!