Extract text in Accurate mode

Prerequisites

Before you begin, ensure you have:

GroupDocs.Parser for Python via .NET installed
A valid license or trial
Sample documents for testing

Extract text from document

To extract text from the entire document in Accurate mode, use the get_text() method:

Python

from groupdocs.parser import Parser

# Create an instance of Parser class
with Parser("./sample.pdf") as parser:
    # Extract text from the document
    text_reader = parser.get_text()
    
    # Check if text extraction is supported
    if text_reader is None:
        print("Text extraction isn't supported")
    else:
        # Print the extracted text
        print(text_reader)

The following sample file is used in this example: sample.pdf

Expected behavior: The method returns a TextReader object containing the entire document text, or None if text extraction is not supported for the document format.
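A common follow-up is to persist the extracted text. The sketch below is one way to do that, not part of the library: `save_extracted_text` and `extract_to_file` are hypothetical helper names, and the code assumes the value returned by `get_text()` can be converted with `str()`, as the `print(text_reader)` call above suggests.

```python
from pathlib import Path


def save_extracted_text(text, output_path):
    """Write extracted text to a UTF-8 file.

    Returns the number of characters written, or None when there is
    no text (for example, when get_text() returned None).
    """
    if text is None:
        return None
    # Assumption: str() yields the document text, as print() does above.
    content = str(text)
    Path(output_path).write_text(content, encoding="utf-8")
    return len(content)


def extract_to_file(document_path, output_path):
    """Hypothetical wrapper: extract the whole document and save it."""
    # Imported here so save_extracted_text stays usable without the library.
    from groupdocs.parser import Parser

    with Parser(document_path) as parser:
        return save_extracted_text(parser.get_text(), output_path)
```

Calling `extract_to_file("./sample.pdf", "./sample.txt")` would then leave the text in `sample.txt`, or return `None` without creating the file if extraction is not supported.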

Extract text from document page

To extract text from a specific page:

Python

from groupdocs.parser import Parser

# Wrapped in a function so the early "return" statements are valid Python
def print_text_by_page(file_path):
    # Create an instance of Parser class
    with Parser(file_path) as parser:
        # Check if text extraction is supported
        if not parser.features.text:
            print("Document doesn't support text extraction")
            return

        # Get document info
        info = parser.get_document_info()

        # Check if document has pages
        if info.page_count == 0:
            print("Document has no pages")
            return

        # Iterate over pages (page indexes are zero-based)
        for page_index in range(info.page_count):
            # Print page number
            print(f"Page {page_index + 1}/{info.page_count}")

            # Extract text from the page
            text_reader = parser.get_text(page_index)

            # Print the page text
            if text_reader is not None:
                print(text_reader)

print_text_by_page("./sample.pdf")

The following sample file is used in this example: sample.pdf

Expected behavior: The method extracts text from each page individually, allowing you to process documents page by page.
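When only one page is needed, the loop can be replaced by a single call with a zero-based index. The following is a minimal sketch of that idea: `get_page_text` is a hypothetical helper, not a library function, and it relies only on `features.text`, `get_document_info().page_count`, and `get_text(page_index)` as used above. It accepts a 1-based page number, which is an illustrative design choice.

```python
def get_page_text(parser, page_number):
    """Return the text of a single page, addressed by 1-based page number.

    Returns None when text extraction is not supported. Raises ValueError
    when the page number falls outside the document.
    """
    if not parser.features.text:
        return None

    page_count = parser.get_document_info().page_count
    if not 1 <= page_number <= page_count:
        raise ValueError(
            f"page_number must be between 1 and {page_count}, got {page_number}"
        )

    # get_text() takes a zero-based page index
    return parser.get_text(page_number - 1)
```

Inside a `with Parser("./sample.pdf") as parser:` block, `get_page_text(parser, 1)` would return the text of the first page.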

Extract text with error handling

Here’s a robust example with error handling:


Python

from groupdocs.parser import Parser

def extract_text_safely(file_path):
    try:
        with Parser(file_path) as parser:
            # Check feature support
            if not parser.features.text:
                print(f"Text extraction not supported for {file_path}")
                return None
            
            # Extract text
            text_reader = parser.get_text()
            if text_reader is not None:
                return text_reader
            
    except Exception as e:
        print(f"Error extracting text: {e}")
        return None

# Usage
text = extract_text_safely("sample.docx")
if text:
    print(f"Extracted {len(text)} characters")

The following sample file is used in this example: sample.docx
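Because `extract_text_safely` returns `None` on any failure, it can be applied to several files without one bad document stopping the run. The sketch below illustrates that pattern; `extract_many` is a hypothetical helper, and the extraction function is passed in as a parameter so the block does not depend on the library directly.

```python
def extract_many(file_paths, extract):
    """Apply an extraction function to each path.

    `extract` is any callable that returns the text for a path, or None
    when extraction failed or is unsupported (such as the
    extract_text_safely function above).

    Returns (results, failed): a dict of path -> text, and a list of
    paths for which no text was obtained.
    """
    results = {}
    failed = []
    for path in file_paths:
        text = extract(path)
        if text is None:
            failed.append(path)
        else:
            results[path] = text
    return results, failed
```

For example, `results, failed = extract_many(["sample.docx", "sample.pdf"], extract_text_safely)` collects the text of every readable file and lists the rest in `failed`.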

Notes

Accurate mode is the default and provides the best text quality
The get_text() method returns None if text extraction is not supported
Use parser.features.text to check if text extraction is available before calling get_text()
For better performance with large documents, consider extracting text page by page
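The page-by-page note can be sketched as a generator that hands back one page at a time, so the caller never holds the whole document's text at once. This is an illustrative sketch: `iter_page_text` and `count_characters` are hypothetical names, and the code assumes the page text can be converted with `str()`. Whether this actually improves performance depends on the document and is not measured here.

```python
def iter_page_text(parser):
    """Yield (page_number, page_text) pairs, one page at a time.

    Yields nothing when text extraction is not supported. Page numbers
    are 1-based; pages whose text is None are skipped.
    """
    if not parser.features.text:
        return

    page_count = parser.get_document_info().page_count
    for page_index in range(page_count):
        page_text = parser.get_text(page_index)
        if page_text is not None:
            yield page_index + 1, page_text


def count_characters(document_path):
    """Hypothetical usage: total character count, computed page by page."""
    # Imported here so iter_page_text stays usable without the library.
    from groupdocs.parser import Parser

    total = 0
    # The generator must be consumed while the parser is still open.
    with Parser(document_path) as parser:
        for _page_number, page_text in iter_page_text(parser):
            total += len(str(page_text))
    return total
```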
