
The code in the examples below uses some methods defined in Common Utilities.

Extract Text from Word Documents

Text is extracted from Word documents with the WordsTextExtractor and WordsFormattedTextExtractor classes. The WordsTextExtractor class supports the following interfaces:

  • IHighlightExtractor
  • ISearchable
  • IRegexSearchable
  • IStructuredExtractor

WordsFormattedTextExtractor adds support for extracting formatted text and supports the following interfaces:

  • IHighlightExtractor
  • IPageTextExtractor
  • ITextExtractorWithFormatter

Learn more about the supported formats here.

The Recipe

Using GroupDocs.Parser for .NET, you can extract text from Microsoft Word documents. Follow these steps:

  • Get the file path
  • Initialize the extractor object with the file
  • Call the ExtractAll() method to extract the document's text

The Code
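As a language-neutral sketch of the three recipe steps above, the Python below uses a hypothetical StubWordsTextExtractor class with an extract_all() method. Neither belongs to GroupDocs.Parser. The stub reads a UTF-8 plain-text file instead of parsing a .docx, so it only illustrates the control flow: get a path, open an extractor on it, and make one call that returns all the text.

```python
import tempfile
from pathlib import Path


class StubWordsTextExtractor:
    """Hypothetical stand-in for WordsTextExtractor (NOT the GroupDocs API).

    Assumption: the "document" is a UTF-8 plain-text file, so no .docx
    parsing happens; only the open-then-extract-all flow is shown.
    """

    def __init__(self, file_path: str) -> None:
        # Step 2: initialize the extractor object with the file.
        self._path = Path(file_path)

    def extract_all(self) -> str:
        # Step 3: return the document's whole text in a single call.
        return self._path.read_text(encoding="utf-8")


def extract_all_text(file_path: str) -> str:
    extractor = StubWordsTextExtractor(file_path)
    return extractor.extract_all()


if __name__ == "__main__":
    # Step 1: get a file path (a temporary sample file stands in for a .docx).
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.txt"
        sample.write_text("Heading\nFirst paragraph.\n", encoding="utf-8")
        print(extract_all_text(str(sample)), end="")
```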

You can also extract the text line by line instead of all at once.
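The sketch below shows a minimal line-by-line loop. It assumes an extractor whose extract_line() method returns one line per call and None once the text is exhausted. StubLineExtractor and extract_line() are hypothetical names and are not GroupDocs.Parser members. The stub reads from an in-memory string rather than a Word file.

```python
import io
from typing import List, Optional


class StubLineExtractor:
    """Hypothetical line-oriented extractor (NOT the GroupDocs API).

    Assumption: extract_line() returns the next line without its newline,
    or None once the text is exhausted. Text comes from a string here.
    """

    def __init__(self, text: str) -> None:
        self._reader = io.StringIO(text)

    def extract_line(self) -> Optional[str]:
        line = self._reader.readline()
        if line == "":
            return None  # readline() returns "" only at end of input
        return line.rstrip("\n")


def extract_lines(text: str) -> List[str]:
    extractor = StubLineExtractor(text)
    lines = []
    line = extractor.extract_line()
    # Loop until the extractor signals the end with None; empty lines
    # ("") are real content and must not stop the loop.
    while line is not None:
        lines.append(line)
        line = extractor.extract_line()
    return lines


if __name__ == "__main__":
    for line in extract_lines("First line\nSecond line\n\nFourth line"):
        print(line)
```

The end-of-text signal is None rather than an empty string. This lets blank lines in the document pass through the loop instead of ending it early.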

Extract Formatted Text

The Recipe

  • Initialize the extractor object with the file
  • Set the table frame
  • Set extractor.DocumentFormatter to the Markdown document format
  • Extract all formatted text

The Code
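The hypothetical sketch below illustrates what a Markdown formatter does with a document's structure. It uses a tiny in-memory document model in which headings, paragraphs, and tables are plain tuples, and a to_markdown() function renders that model as Markdown. The model and the function are assumptions made for illustration. They are not GroupDocs.Parser types, and the library's own Markdown formatter may produce different output for real documents.

```python
from typing import List, Sequence

# Hypothetical document model for illustration (not GroupDocs types):
#   ("heading", level, text) | ("paragraph", text) | ("table", rows)


def _md_cell(text: str) -> str:
    # A literal "|" would split a Markdown table cell, so escape it.
    return text.replace("|", "\\|")


def to_markdown(blocks: Sequence[tuple]) -> str:
    out: List[str] = []
    for block in blocks:
        kind = block[0]
        if kind == "heading":
            _, level, text = block
            out.append("#" * level + " " + text)
        elif kind == "paragraph":
            out.append(block[1])
        elif kind == "table":
            header, *body = block[1]  # first row is treated as the header
            lines = [
                "| " + " | ".join(_md_cell(c) for c in header) + " |",
                "|" + "|".join(" --- " for _ in header) + "|",
            ]
            lines += ["| " + " | ".join(_md_cell(c) for c in row) + " |" for row in body]
            out.append("\n".join(lines))
        else:
            raise ValueError(f"unknown block kind: {kind!r}")
    # Separate blocks with a blank line, as Markdown expects.
    return "\n\n".join(out)


if __name__ == "__main__":
    doc = [
        ("heading", 1, "Report"),
        ("paragraph", "Totals are listed below."),
        ("table", [["Item", "Qty"], ["Pens", "3"]]),
    ]
    print(to_markdown(doc))
```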

Extract Table with Format

The Recipe

  • Initialize the extractor object with the file
  • Set the table frame
  • Apply the frame through extractor.DocumentFormatter
  • Extract all formatted text

The Code
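The effect of a table frame can be sketched as a plain-text renderer that draws borders around table cells. In the hypothetical render_framed_table() below, frame="full" draws a rule after every row and frame="outer" draws only the top and bottom rules. Both the function and the frame option names are assumptions for illustration. They are not GroupDocs.Parser settings.

```python
from typing import List, Sequence


def render_framed_table(rows: Sequence[Sequence[str]], frame: str = "full") -> str:
    """Draw rows of cell strings as a plain-text table with a border.

    Hypothetical helper (NOT part of GroupDocs.Parser). frame="full" draws a
    rule after every row; frame="outer" draws only the top and bottom rules.
    """
    if frame not in ("full", "outer"):
        raise ValueError(f"unsupported frame: {frame!r}")
    if not rows:
        return ""
    n_cols = max(len(r) for r in rows)
    # Pad short rows so every row has the same number of cells.
    padded = [list(r) + [""] * (n_cols - len(r)) for r in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(r[c]) for r in padded) for c in range(n_cols)]
    rule = "+" + "+".join("-" * (w + 2) for w in widths) + "+"
    lines: List[str] = [rule]
    for i, row in enumerate(padded):
        lines.append("| " + " | ".join(c.ljust(w) for c, w in zip(row, widths)) + " |")
        if frame == "full" or i == len(padded) - 1:
            lines.append(rule)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_framed_table([["Item", "Qty"], ["Pens", "3"]]))
```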

HTML Text Formatting

The Recipe

  • Initialize the extractor object with the file
  • Set the table frame
  • Set extractor.DocumentFormatter to the HTML document format
  • Extract all formatted text

The Code
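The sketch below is a comparable example for HTML output. It uses a hypothetical tuple-based document model. The to_html() function maps headings to h1 to h6 elements, paragraphs to p elements, and table rows to tr and td elements. It escapes all text with Python's html.escape so that characters such as & and < remain literal. This is an illustration only. It does not show the markup that the GroupDocs.Parser HTML formatter actually produces.

```python
import html
from typing import List, Sequence

# Hypothetical document model for illustration (not GroupDocs types):
#   ("heading", level, text) | ("paragraph", text) | ("table", rows)


def to_html(blocks: Sequence[tuple]) -> str:
    parts: List[str] = []
    for block in blocks:
        kind = block[0]
        if kind == "heading":
            _, level, text = block
            level = min(max(level, 1), 6)  # HTML defines only h1 to h6
            parts.append(f"<h{level}>{html.escape(text)}</h{level}>")
        elif kind == "paragraph":
            parts.append(f"<p>{html.escape(block[1])}</p>")
        elif kind == "table":
            rows = "".join(
                "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>"
                for row in block[1]
            )
            parts.append(f"<table>{rows}</table>")
        else:
            raise ValueError(f"unknown block kind: {kind!r}")
    return "\n".join(parts)


if __name__ == "__main__":
    print(to_html([("heading", 1, "Q&A"), ("paragraph", "1 < 2"), ("table", [["a", "b"]])]))
```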
