This page describes how to use document filters during indexing and covers every filter type, with an example of creating each one.
Setting a filter
To control which documents from the folders being added are indexed, use the setDocumentFilter method of the IndexSettings class to specify a filter based on various properties of these documents. If a document, whether added individually or located in a folder being added, does not match the filter, it is not added or indexed. The default value is null, which means that all added files are indexed if their format is supported. The following example demonstrates how to set a document filter for indexing.
String indexFolder = "c:\\MyIndex\\";
String documentsFolder = "c:\\MyDocuments\\";

// Creating a filter that skips documents with extensions '.doc', '.docx', '.rtf'
IndexSettings settings = new IndexSettings();
DocumentFilter fileExtensionFilter = DocumentFilter.createFileExtension(".doc", ".docx", ".rtf"); // Creating file extension filter that allows only specified extensions
DocumentFilter invertedFilter = DocumentFilter.createNot(fileExtensionFilter); // Inverting file extension filter to allow all extensions except specified ones
settings.setDocumentFilter(invertedFilter);

// Creating an index in the specified folder
Index index = new Index(indexFolder, settings);

// Indexing documents
index.add(documentsFolder);
Creation time filters
The first type of filter is based on the time the document file was created. Such filters skip files created earlier than a given date, later than a given date, or outside a given date range. Examples of these filters are given below.
// The first filter skips files created earlier than January 1, 2017, 00:00:00
DocumentFilter filter1 = DocumentFilter.createCreationTimeLowerBound(new Date(2017 - 1900, 1 - 1, 1));

// The second filter skips files created later than June 15, 2018, 00:00:00
DocumentFilter filter2 = DocumentFilter.createCreationTimeUpperBound(new Date(2018 - 1900, 6 - 1, 15));

// The third filter skips files created outside the range from January 1, 2017, 00:00:00 to June 15, 2018, 00:00:00
DocumentFilter filter3 = DocumentFilter.createCreationTimeRange(new Date(2017 - 1900, 1 - 1, 1), new Date(2018 - 1900, 6 - 1, 15));
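The date arguments above rely on the deprecated java.util.Date(int, int, int) constructor, which expects the year as an offset from 1900 and a zero-based month, and which produces midnight in the default time zone. The sketch below shows one possible way to build the same values with java.time. The class and method names (DateBoundSketch, startOfDay, legacy) are hypothetical helpers for illustration and are not part of the indexing library.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.Date;

class DateBoundSketch {
    // Hypothetical helper: midnight at the start of the given calendar day
    // in the default time zone, returned as java.util.Date.
    static Date startOfDay(int year, int month, int day) {
        return Date.from(LocalDate.of(year, month, day)
                .atStartOfDay(ZoneId.systemDefault())
                .toInstant());
    }

    // The legacy form used in the filter examples:
    // the year is offset by 1900 and the month is zero-based.
    @SuppressWarnings("deprecation")
    static Date legacy(int year, int month, int day) {
        return new Date(year - 1900, month - 1, day);
    }
}
```

With such a helper, a bound like new Date(2017 - 1900, 1 - 1, 1) could be written as startOfDay(2017, 1, 1), which keeps the calendar values readable.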
Modification time filters
The next type of filter works the same way but uses the modification time of the document file. Examples are presented below.
// The first filter skips files modified earlier than January 1, 2017, 00:00:00
DocumentFilter filter1 = DocumentFilter.createModificationTimeLowerBound(new Date(2017 - 1900, 1 - 1, 1));

// The second filter skips files modified later than June 15, 2018, 00:00:00
DocumentFilter filter2 = DocumentFilter.createModificationTimeUpperBound(new Date(2018 - 1900, 6 - 1, 15));

// The third filter skips files modified outside the range from January 1, 2017, 00:00:00 to June 15, 2018, 00:00:00
DocumentFilter filter3 = DocumentFilter.createModificationTimeRange(new Date(2017 - 1900, 1 - 1, 1), new Date(2018 - 1900, 6 - 1, 15));
File path filters
The next type of filter takes a regular expression and skips documents whose full paths do not match it. Matching is performed with the java.util.regex.Pattern class.
// The filter skips files that do not contain the word 'Einstein' in their paths
DocumentFilter filter = DocumentFilter.createFilePathRegularExpression("Einstein", Pattern.CASE_INSENSITIVE);
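The comment in the example says the filter keeps paths that contain the word, which suggests the pattern may match anywhere in the path. The standard-library sketch below illustrates the difference between substring matching (find) and whole-string matching (matches) for the same pattern and flag. That the filter uses find semantics is an assumption based on that comment; PathRegexSketch and its methods are hypothetical names.

```java
import java.util.regex.Pattern;

class PathRegexSketch {
    // Same pattern and flag as in the filter example.
    static final Pattern EINSTEIN = Pattern.compile("Einstein", Pattern.CASE_INSENSITIVE);

    // Assumed behavior: the pattern may match anywhere in the full path.
    static boolean pathContainsMatch(String fullPath) {
        return EINSTEIN.matcher(fullPath).find();
    }

    // Whole-path matching, shown only for contrast.
    static boolean wholePathMatches(String fullPath) {
        return EINSTEIN.matcher(fullPath).matches();
    }
}
```

If whole-path matching turns out to be required, a pattern such as ".*Einstein.*" would cover both interpretations.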
File length filters
The following group of filters uses the file length in bytes. You can specify a lower bound, an upper bound, or a range of acceptable file lengths. Examples are below.
// The first filter skips documents less than 50 KB in length
DocumentFilter filter1 = DocumentFilter.createFileLengthLowerBound(50 * 1024);

// The second filter skips documents more than 10 MB in length
DocumentFilter filter2 = DocumentFilter.createFileLengthUpperBound(10 * 1024 * 1024);

// The third filter skips documents less than 100 KB or more than 5 MB in length
DocumentFilter filter3 = DocumentFilter.createFileLengthRange(100 * 1024, 5 * 1024 * 1024);
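The bounds above are plain int expressions, which is fine for these sizes but wraps around once a product reaches 2 GB. The sketch below shows the byte arithmetic with long constants and a simple range check. FileLengthSketch, its constants, and withinRange are hypothetical names, and treating both bounds as inclusive is an assumption, since the text does not say how boundary values are handled.

```java
class FileLengthSketch {
    static final long KB = 1024L;
    static final long MB = 1024L * KB;
    static final long GB = 1024L * MB;

    // Assumption: both bounds are inclusive.
    static boolean withinRange(long lengthInBytes, long lower, long upper) {
        return lengthInBytes >= lower && lengthInBytes <= upper;
    }

    // Shows the pitfall: int arithmetic wraps around at 2 GB.
    static int twoGigabytesAsInt() {
        int two = 2;
        return two * 1024 * 1024 * 1024;
    }
}
```

Here 50 * KB is 51200 bytes and 10 * MB is 10485760 bytes, while twoGigabytesAsInt() wraps to a negative value, so writing large bounds with a long factor (for example 2L * 1024 * 1024 * 1024) is the safer habit.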
File extension filter
The following type of filter lets you specify the list of file extensions that are allowed for indexing.
// This filter allows indexing only FB2, EPUB, and TXT files
DocumentFilter filter = DocumentFilter.createFileExtension(".fb2", ".epub", ".txt");
Logical NOT filter
The next type of filter inverts the logic of the filter passed to it.
IndexSettings settings = new IndexSettings();
DocumentFilter filter = DocumentFilter.createFileExtension(".htm", ".html");
DocumentFilter invertedFilter = DocumentFilter.createNot(filter); // Inverting file extension filter to allow all extensions except HTM and HTML
settings.setDocumentFilter(invertedFilter);
Logical AND filter
This filter type lets you compose a complex filter from several other filters using “AND” logic. A file is indexed only if it satisfies the conditions of all inner filters at once. For example, such a filter can allow indexing only documents created between 01/01/2015 and 01/01/2016, with the ‘.txt’ extension and a size of no more than 8 MB.
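The “AND” logic described above can be modeled with java.util.function.Predicate. The sketch below is only an illustration of the combined condition and does not use the indexing library's API. The FileInfo class, the predicate names, and the inclusive date and size bounds are all assumptions made for the example.

```java
import java.time.LocalDate;
import java.util.function.Predicate;

class AndFilterSketch {
    // Hypothetical stand-in for the file properties a filter inspects.
    static final class FileInfo {
        final String extension;
        final long length;
        final LocalDate created;

        FileInfo(String extension, long length, LocalDate created) {
            this.extension = extension;
            this.length = length;
            this.created = created;
        }
    }

    // Created between 01/01/2015 and 01/01/2016 (inclusive bounds are an assumption).
    static final Predicate<FileInfo> CREATED_IN_RANGE = f ->
            !f.created.isBefore(LocalDate.of(2015, 1, 1))
                    && !f.created.isAfter(LocalDate.of(2016, 1, 1));

    static final Predicate<FileInfo> IS_TXT = f -> f.extension.equalsIgnoreCase(".txt");

    static final Predicate<FileInfo> AT_MOST_8_MB = f -> f.length <= 8L * 1024 * 1024;

    // A file passes only if every inner condition holds.
    static final Predicate<FileInfo> ALL = CREATED_IN_RANGE.and(IS_TXT).and(AT_MOST_8_MB);
}
```

A 1 MB ‘.txt’ file created in June 2015 passes this check, while the same file with a ‘.pdf’ extension, a 2017 creation date, or a 9 MB size does not.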
Logical OR filter
This filter type lets you compose a complex filter from several other filters using “OR” logic. A file is indexed if it satisfies the condition of at least one inner filter. For example, such a filter can limit text files to 5 MB and all other files to 10 MB.
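The size-limit case described above can be modeled with java.util.function.Predicate as an “OR” of two “AND” branches. The sketch is only an illustration of the logic and does not use the indexing library's API. FileInfo and the predicate names are hypothetical, and inclusive size limits are an assumption.

```java
import java.util.function.Predicate;

class OrFilterSketch {
    static final long MB = 1024L * 1024;

    // Hypothetical stand-in for the file properties a filter inspects.
    static final class FileInfo {
        final String extension;
        final long length;

        FileInfo(String extension, long length) {
            this.extension = extension;
            this.length = length;
        }
    }

    static final Predicate<FileInfo> IS_TXT = f -> f.extension.equalsIgnoreCase(".txt");

    // Branch 1: text files of at most 5 MB.
    static final Predicate<FileInfo> SMALL_TXT = IS_TXT.and(f -> f.length <= 5 * MB);

    // Branch 2: non-text files of at most 10 MB.
    static final Predicate<FileInfo> OTHER_UP_TO_10_MB =
            IS_TXT.negate().and(f -> f.length <= 10 * MB);

    // A file passes if at least one branch accepts it.
    static final Predicate<FileInfo> EITHER = SMALL_TXT.or(OTHER_UP_TO_10_MB);
}
```

The second branch negates the extension check on purpose: a plain “at most 10 MB” condition would let a 7 MB text file through the “OR”, which would defeat the 5 MB limit for text files.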