Information Types


The Data Leakage Prevention (DLP) concept relies on detecting information in data in transit or data at rest. Perhaps the most important thing in a DLP product is the ability to define this “information” with easy-to-use instruments. In MyDLP, we call this content definition instrument an Information Type.

DLP inspections on channels (such as Web, Mail, Removable Storage, Printer, Discovery …) are performed according to the associated information types.

MyDLP ships with many predefined information types. The picture below shows some of them:

Predefined Information Types

Creating Custom Information Types

OK, there are many useful predefined information types for both compliance needs and simpler requirements, but what should you do if you want to create a DLP policy with your own information types, defined for your unique requirements? This tutorial explains information types and their handy instruments: the Information Feature and the Matcher.

A Custom Information Type can easily be created by clicking the add button (which appears after clicking User Defined) at the right side of the User Defined category item in the inventory tree, just like any custom inventory item in MyDLP, as shown here:


After clicking this button, a dialog box will appear in the middle of the screen, where you should select the option to create a new custom Information Type. You will then see a dialog box similar to this:


We use two instruments to define an information type: Data Formats and Information Features.

Data Formats

Data Formats determine which data (a data format is a combination of several MIME types) will be considered a candidate for this Information Type. For example, if you select All Formats, every kind of file (or data) will be a candidate, and DLP inspection (defined in the Information Features section) will be performed on every single file. Similarly, if you select PDF, PS, only files (or data) in Portable Document Format and PostScript will be considered candidates, and DLP inspection will be performed only on those files.
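To make candidate selection concrete, here is a minimal Python sketch. It assumes a hypothetical mapping from format names to MIME types (`DATA_FORMATS`) and a hypothetical `is_candidate` helper; MyDLP's actual format definitions and internals may differ.

```python
# Hypothetical model of Data Formats: each format is a set of MIME types,
# and an Information Type only inspects data whose MIME type belongs to one
# of its selected formats. Names and MIME lists are assumptions.

ALL_FORMATS = "All Formats"

# Assumed mapping from format names to MIME types (illustrative only).
DATA_FORMATS: dict[str, set[str]] = {
    "PDF, PS": {"application/pdf", "application/postscript"},
    "Plain Text": {"text/plain"},
}


def is_candidate(mime_type: str, selected_formats: list[str]) -> bool:
    """Return True if data of this MIME type should be inspected."""
    if ALL_FORMATS in selected_formats:
        return True  # every file (or data) is a candidate
    return any(mime_type in DATA_FORMATS.get(fmt, set())
               for fmt in selected_formats)


print(is_candidate("application/pdf", ["PDF, PS"]))  # True
print(is_candidate("image/png", ["PDF, PS"]))        # False
print(is_candidate("image/png", [ALL_FORMATS]))      # True
```

Filtering by format first means the more expensive content inspection only runs on data that could plausibly contain the information.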

Information Features

In the Information Features section, we define Information Features (not surprisingly). The most important part of an Information Feature is the Matcher. All other properties of the Information Feature are requested after you select the Matcher, because each Matcher has different functionality, and that functionality requires different configuration options. The Matcher simply declares what you are looking for in a file (or in data flowing through a channel). The picture below is an example of the Birth Date Matcher. The Birth Date Matcher matches birth dates and requires a property named Threshold value. The Threshold value specifies the number of positive matches (in this case, birth dates) required in a file (or flowing data chunk). For example, with the Information Feature below, you are looking for at least two valid, separate birth date occurrences:



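The Threshold idea can be sketched in Python as follows. This is an illustration under stated assumptions, not MyDLP's Birth Date Matcher: the accepted date syntax (DD/MM/YYYY or DD.MM.YYYY), the plausibility window, and the helper names `find_birth_dates` and `birth_date_feature` are all hypothetical.

```python
import re
from datetime import date

# Assumed date syntax: DD/MM/YYYY or DD.MM.YYYY. The real matcher may
# recognize other formats; this pattern is illustrative only.
DATE_PATTERN = re.compile(r"\b(\d{2})[./](\d{2})[./](\d{4})\b")


def find_birth_dates(text: str) -> list[date]:
    """Return every calendar-valid, plausible past date found in text."""
    found = []
    for day, month, year in DATE_PATTERN.findall(text):
        try:
            d = date(int(year), int(month), int(day))
        except ValueError:
            continue  # e.g. 31/02/1990 is not a real date
        if date(1900, 1, 1) <= d <= date.today():  # assumed plausibility window
            found.append(d)
    return found


def birth_date_feature(text: str, threshold: int) -> bool:
    """Positive when at least `threshold` separate birth dates occur."""
    return len(find_birth_dates(text)) >= threshold


sample = "Alice: 12/05/1984, Bob: 31.12.1979, bogus: 31/02/1990"
print(birth_date_feature(sample, threshold=2))  # True
print(birth_date_feature(sample, threshold=3))  # False
```

Only the two valid dates count toward the threshold; the impossible date is discarded, which is what "valid" occurrences means in practice.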
There is also a property named Distance in the Feature Configuration section. Distance is not applicable to every kind of Information Feature, but almost all of them support it.
When you use the Distance property, DLP analysis returns positive only if all defined Information Features are found within the specified distance. This lets you perform DLP analysis in context and drastically reduces false positives in large files.

For example, the picture below briefly illustrates Distance usage. In this example, there are two Information Features:

  1. Birth Date with threshold value 2
  2. Keyword “MyDLP” with threshold value 3

Distance is applicable to these Information Features, and it has been set to 250. This means you are looking for two birth dates and three separate “MyDLP” keywords (the keyword matcher matches the exact string, case-insensitively) within a 250-character sequence.
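A case-insensitive keyword matcher with a threshold might look like the Python sketch below. The helper names are hypothetical, and it assumes plain substring matching with no word-boundary check, which may not match MyDLP's exact behavior.

```python
def count_keyword(text: str, keyword: str) -> int:
    """Count non-overlapping, case-insensitive occurrences of keyword."""
    # Assumption: plain substring matching, with no word-boundary check.
    return text.casefold().count(keyword.casefold())


def keyword_feature(text: str, keyword: str, threshold: int) -> bool:
    """Positive when keyword occurs at least `threshold` times."""
    return count_keyword(text, keyword) >= threshold


sample = "MyDLP, mydlp and MYDLP are all the same keyword."
print(count_keyword(sample, "MyDLP"))                 # 3
print(keyword_feature(sample, "MyDLP", threshold=3))  # True
```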


Two things are important here. First, the character sequence we are talking about has been extracted from the file (or data flow) and normalized. Second, there is an AND relation between Information Features: finding only two birth dates, or only three “MyDLP” keywords, is not enough; both must be satisfied for this Information Type to return a positive.
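The combination of Distance and the AND relation can be sketched in Python as a sliding-window check over normalized text. Everything here is an assumption for illustration: the normalization step (collapsing whitespace and lowercasing), the simplified date pattern that skips calendar validation, and the helper names `normalize` and `matches_within_distance`.

```python
import re

# Illustrative features for the example above (patterns are assumptions):
# a simplified DD/MM/YYYY date pattern with threshold 2, and the keyword
# "mydlp" with threshold 3. Patterns are lowercase because the text is
# lowercased during normalization.
FEATURES = [
    (re.compile(r"\b\d{2}/\d{2}/\d{4}\b"), 2),
    (re.compile(re.escape("mydlp")), 3),
]


def normalize(text: str) -> str:
    """Assumed normalization: collapse whitespace runs and lowercase."""
    return " ".join(text.split()).lower()


def matches_within_distance(text, features, distance):
    """True if every feature meets its threshold inside one window of
    `distance` characters (AND relation between features)."""
    text = normalize(text)
    # Character spans of each feature's matches in the normalized text.
    spans = [[m.span() for m in pattern.finditer(text)] for pattern, _ in features]
    # Any satisfying window can be slid right to start at a match, so only
    # windows beginning at a match start need to be checked.
    starts = sorted({start for feature_spans in spans for start, _ in feature_spans})
    for window_start in starts:
        window_end = window_start + distance
        if all(
            sum(1 for s, e in feature_spans if s >= window_start and e <= window_end)
            >= threshold
            for feature_spans, (_, threshold) in zip(spans, features)
        ):
            return True
    return False


close = "MyDLP report 12/05/1984 and 01/01/1990, see MyDLP docs; MyDLP again."
far = "MyDLP MyDLP MyDLP" + " x" * 300 + " 12/05/1984 01/01/1990"
print(matches_within_distance(close, FEATURES, 250))  # True
print(matches_within_distance(far, FEATURES, 250))    # False
```

In the second text, both features are present, but never within the same 250-character window, so the Information Type does not match; this is how Distance cuts false positives in large files.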

Using Information Types in Rules

Now we have an Information Type, but how do we use it? The final section of this document covers combining different information types in a rule.

In the picture below, there is a Web rule (which inspects the web channel) with two information types. This means MyDLP will look for these information types in the data, and if any of them is found, the specified action will be applied. For example, assume that “Information Type Sample” matches a single birth date occurrence and “Another information type” matches valid Credit Card Numbers. When DLP analysis starts for data flowing through the web channel (perhaps an uploaded file or a Twitter message), MyDLP will look for either a birth date or a Credit Card Number. If either is found, the data will be quarantined (in this example, because the rule's action is Quarantine).


The important thing to remember here is that there is an OR relation between information types in a rule.
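The OR relation inside a rule can be sketched in Python by treating each information type as a predicate. The `Rule` class and the two simplified matchers are hypothetical stand-ins for the example above; the credit card check assumes 16 contiguous digits validated with the standard Luhn checksum.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

InformationType = Callable[[str], bool]


def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum used to validate credit card numbers."""
    checksum = 0
    for i, d in enumerate(int(c) for c in reversed(number)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0


def has_credit_card(text: str) -> bool:
    # Assumed stand-in for "Another information type": 16 digits + Luhn.
    return any(luhn_valid(m) for m in re.findall(r"\b\d{16}\b", text))


def has_birth_date(text: str) -> bool:
    # Assumed stand-in for "Information Type Sample": one DD/MM/YYYY date.
    return re.search(r"\b\d{2}/\d{2}/\d{4}\b", text) is not None


@dataclass
class Rule:
    channel: str
    information_types: list[InformationType]
    action: str

    def evaluate(self, data: str) -> Optional[str]:
        """OR relation: return the action if any information type matches."""
        if any(info_type(data) for info_type in self.information_types):
            return self.action
        return None


web_rule = Rule("Web", [has_birth_date, has_credit_card], "Quarantine")
print(web_rule.evaluate("card 4111111111111111"))  # Quarantine
print(web_rule.evaluate("born 12/05/1984"))        # Quarantine
print(web_rule.evaluate("hello world"))            # None
```

Contrast this with the AND relation between Information Features inside a single Information Type: within a type every feature must match, while within a rule any one type is enough.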

As always, if you have any questions or comments, please leave a comment directly on this post.

Have a good day!


