Storing text of indexed documents

Text extracted from indexed documents can be stored in the index. This lets the getDocumentText method return the extracted text to the user faster, and it also speeds up generating text with highlighted search results.

To specify storage parameters, use the setTextStorageSettings method of the IndexSettings class. The default value is null, which means that the text of the documents is not stored in the index.

When text is stored in the index, its compression level is specified with one of the values defined in the Compression class. Text can be stored with normal compression, with high compression, or without compression. This choice affects both the final size of the index and the speed of indexing: high compression produces a smaller index but slows indexing down, while storing text without compression gives the fastest indexing but the largest index. The default compression level is normal.
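The size/speed trade-off described above can be illustrated outside the library. The sketch below does not call GroupDocs.Search and makes no claim about the algorithm or format it uses internally; it simply uses the JDK's java.util.zip.Deflater, treating its NO_COMPRESSION, DEFAULT_COMPRESSION and BEST_COMPRESSION levels as rough stand-ins for storing text without, with normal, and with high compression. The class name CompressionTradeOff is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration only: JDK Deflater levels used as an analogy
// for the GroupDocs.Search compression settings, not the library itself.
public class CompressionTradeOff {

    // Compresses UTF-8 text with the given Deflater level.
    static byte[] compress(String text, int level) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    // Restores the original text, as a text store must do when text is read back.
    static String decompress(byte[] data) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Incomplete or unsupported compressed data");
                }
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        // Build a sample "extracted text" of a few hundred kilobytes.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append("Albert Einstein developed the theory of relativity. ");
        }
        String text = sb.toString();

        int[] levels = {Deflater.NO_COMPRESSION, Deflater.DEFAULT_COMPRESSION, Deflater.BEST_COMPRESSION};
        String[] names = {"none", "normal", "high"};
        for (int i = 0; i < levels.length; i++) {
            long start = System.nanoTime();
            byte[] stored = compress(text, levels[i]);
            long micros = (System.nanoTime() - start) / 1_000;
            boolean roundTrip = decompress(stored).equals(text);
            System.out.printf("%-7s %8d bytes %8d us  round trip ok: %b%n",
                    names[i], stored.length, micros, roundTrip);
        }
    }
}
```

Exact sizes and timings depend on the text and the machine, and a single timing run is noisy; the point is only that uncompressed output is the largest, while higher compression levels trade CPU time for smaller output.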

The example below demonstrates storing text in an index using the high compression ratio.

import com.groupdocs.search.*;
import com.groupdocs.search.options.*;
import com.groupdocs.search.results.*;

String indexFolder = "c:\\MyIndex\\";
String documentsFolder = "c:\\MyDocuments\\";

// Creating an index settings instance
IndexSettings settings = new IndexSettings();
// Storing document text in the index with high compression
settings.setTextStorageSettings(new TextStorageSettings(Compression.High));
 
// Creating an index in the specified folder
Index index = new Index(indexFolder, settings);
 
// Indexing documents
index.add(documentsFolder);
 
// Searching
SearchResult result = index.search("Einstein");

More resources

GitHub examples

You may easily run the code from documentation articles and see the features in action in our GitHub examples:

Free online document search App

Along with the full-featured .NET library, we provide simple but powerful free apps.

You are welcome to search your PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX and other files with our Free Online Document Search App.