Extract text from Microsoft Office Excel spreadsheets

Overview

This guide shows how to extract text from Microsoft Office Excel spreadsheets (.xls, .xlsx) with the GroupDocs.Parser for .NET API. It covers four extraction methods: the whole workbook at once, individual sheets, high-speed raw mode, and formatted text (HTML or Markdown).

Extraction Methods Comparison

Method	Use Case	Performance	Output Quality
Whole Document	Extract all text at once	Fast	Standard
Sheet-by-Sheet	Process individual worksheets	Medium	Standard
Raw Mode	High-speed bulk processing	Fastest	Lower formatting accuracy
Formatted Text	Preserve formatting (HTML/Markdown)	Slower	Highest

Method 1: Extract Text from Entire Spreadsheet

When to use: When you need all the text in the workbook and don't need to know which worksheet each piece of text came from.

Use the GetText method to extract text from the entire workbook. It returns a TextReader over the text of all worksheets.

Steps:

Create a Parser instance for the source spreadsheet
Call the GetText method to obtain a TextReader object
Read the text from the reader

Warning
The GetText method returns null if text extraction isn't supported for the document. For example, text extraction isn't supported for ZIP archives, so GetText returns null for them. For an empty Excel spreadsheet, GetText returns an empty TextReader (reader.ReadToEnd returns an empty string).

Example:

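using System;
using System.IO;
using GroupDocs.Parser;

// Create an instance of Parser class
using (Parser parser = new Parser(filePath))
{
    // Extract the text into the reader (null if text extraction isn't supported)
    using (TextReader reader = parser.GetText())
    {
        // Print the text from the spreadsheet
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}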
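The warning above describes three possible outcomes of GetText: null (unsupported document), an empty reader (empty workbook), or text. The following is a minimal sketch that handles them in one place. ExtractionResult, ExtractionOutcome, and Classify are hypothetical names, not part of GroupDocs.Parser; the helper relies only on the documented null/empty behavior.

```csharp
using System;
using System.IO;

// Hypothetical outcome type; not part of GroupDocs.Parser.
public enum ExtractionOutcome { Unsupported, Empty, HasText }

public static class ExtractionResult
{
    // Classifies the reader returned by parser.GetText():
    // null means text extraction isn't supported for the document,
    // an empty string means the workbook has no extractable text.
    public static ExtractionOutcome Classify(TextReader? reader, out string text)
    {
        if (reader == null)
        {
            text = string.Empty;
            return ExtractionOutcome.Unsupported;
        }

        // The caller still owns the reader and is responsible for disposing it.
        text = reader.ReadToEnd();
        return text.Length == 0 ? ExtractionOutcome.Empty : ExtractionOutcome.HasText;
    }
}
```

A possible call site is using (TextReader reader = parser.GetText()) { var outcome = ExtractionResult.Classify(reader, out string text); }. A C# using statement accepts a null resource, so the unsupported case needs no extra guard around the using block.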

Method 2: Extract Text from Individual Sheets

When to use: When you need to process each worksheet separately, keep track of which sheet the text came from, or extract data from specific sheets only.

This method uses GetText(pageIndex) to extract text from a specific sheet. Each worksheet is treated as a page, so the page index is the zero-based sheet index.

Steps:

Create a Parser instance for the source spreadsheet
Call the GetDocumentInfo method to obtain an IDocumentInfo object; its PageCount property holds the number of sheets
Call GetText(pageIndex) with the zero-based sheet index to obtain a TextReader object
Read the text from the reader

Example:

using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

// Create an instance of Parser class
using (Parser parser = new Parser(filePath))
{
    // Get the document info (PageCount is the number of sheets)
    IDocumentInfo documentInfo = parser.GetDocumentInfo();

    // Iterate over sheets
    for (int p = 0; p < documentInfo.PageCount; p++)
    {
        // Print the sheet number
        Console.WriteLine(string.Format("Sheet {0}/{1}", p + 1, documentInfo.PageCount));

        // Extract the sheet text into the reader
        using (TextReader reader = parser.GetText(p))
        {
            // Print the text from the sheet (skip it if extraction isn't supported)
            Console.WriteLine(reader == null ? string.Empty : reader.ReadToEnd());
        }
    }
}
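If you need the text of every sheet in memory, for example to look sheets up by index later, the loop above can be wrapped in a small helper. This is a sketch under assumptions: SheetTextCollector and ReadSheets are hypothetical names, and treating a null reader as an empty sheet is a choice made by this sketch, not documented library behavior.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SheetTextCollector
{
    // Reads every sheet through the supplied delegate and returns the texts
    // in sheet order. openSheet is expected to wrap a call such as parser.GetText(p).
    public static List<string> ReadSheets(int sheetCount, Func<int, TextReader?> openSheet)
    {
        if (sheetCount < 0) throw new ArgumentOutOfRangeException(nameof(sheetCount));
        if (openSheet == null) throw new ArgumentNullException(nameof(openSheet));

        var texts = new List<string>(sheetCount);
        for (int p = 0; p < sheetCount; p++)
        {
            // Dispose each reader before moving on to the next sheet.
            using (TextReader? reader = openSheet(p))
            {
                // Defensive choice: a null reader is recorded as an empty sheet.
                texts.Add(reader?.ReadToEnd() ?? string.Empty);
            }
        }
        return texts;
    }
}
```

Call it as SheetTextCollector.ReadSheets(documentInfo.PageCount, p => parser.GetText(p)). The same helper can serve raw mode (Method 3) if you pass documentInfo.RawPageCount and p => parser.GetText(p, new TextOptions(true)).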

Method 3: High-Speed Raw Text Extraction

When to use: When processing large Excel files or many spreadsheets where parsing speed matters more than formatting accuracy, for example in data mining, content indexing, or bulk document processing.

Raw mode speeds up text extraction at the cost of formatting accuracy. To enable it, pass a TextOptions object created with true to the GetText(TextOptions) or GetText(pageIndex, TextOptions) method.

Warning
For some spreadsheets, the number of sheets reported in raw mode differs from the number reported in accurate mode. In raw mode, use IDocumentInfo.RawPageCount instead of IDocumentInfo.PageCount.

Steps:

Create a Parser instance for the source spreadsheet
Create a TextOptions object, passing true to enable raw mode
Call the GetDocumentInfo method
Use RawPageCount instead of PageCount as the sheet count to avoid extra calculations
Call GetText(pageIndex, TextOptions) with the sheet index to obtain a TextReader object
Read the text from the reader

Example:

using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

// Create an instance of Parser class
using (Parser parser = new Parser(filePath))
{
    // Get the document info
    IDocumentInfo documentInfo = parser.GetDocumentInfo();
    // Check if the document has sheets (RawPageCount is the sheet count in raw mode)
    if (documentInfo == null || documentInfo.RawPageCount == 0)
    {
        Console.WriteLine("The document has no sheets.");
        return;
    }
    // Create the options once; true enables raw mode
    TextOptions rawOptions = new TextOptions(true);
    // Iterate over sheets
    for (int p = 0; p < documentInfo.RawPageCount; p++)
    {
        // Print the sheet number
        Console.WriteLine(string.Format("Sheet {0}/{1}", p + 1, documentInfo.RawPageCount));
        // Extract the sheet text into the reader in raw mode
        using (TextReader reader = parser.GetText(p, rawOptions))
        {
            // Print the text from the sheet (skip it if extraction isn't supported)
            Console.WriteLine(reader == null ? string.Empty : reader.ReadToEnd());
        }
    }
}

Method 4: Extract Formatted Text (HTML)

When to use: When you need to preserve the structure and formatting of spreadsheet data, or when the extracted content goes into web applications or formatted reports.

GroupDocs.Parser can extract text from Microsoft Office Excel spreadsheets as HTML, Markdown, or formatted plain text. The output format is selected with FormattedTextMode (for example, FormattedTextMode.Html or FormattedTextMode.Markdown). For more details, see Extract Formatted Text.

Steps:

Create a Parser instance for the source spreadsheet
Call the GetFormattedText method with a FormattedTextOptions object that specifies the output mode, and obtain a TextReader object
Read the text from the reader

Example:

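using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

// Create an instance of Parser class
using (Parser parser = new Parser(filePath))
{
    // Extract the formatted text (HTML) into the reader
    using (TextReader reader = parser.GetFormattedText(new FormattedTextOptions(FormattedTextMode.Html)))
    {
        // Print the formatted text (the reader is null if formatted text extraction isn't supported)
        Console.WriteLine(reader == null ? "Formatted text extraction isn't supported" : reader.ReadToEnd());
    }
}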

Supported Excel Formats

.xls - Microsoft Excel 97-2003 Workbook
.xlsx - Microsoft Excel Open XML Workbook

Common Use Cases

Enterprise Document Processing

Excel Data Mining: Extract text for search indexing and content analysis
Spreadsheet Report Generation: Convert workbook data into text-based reports
Content Migration: Move spreadsheet content between document management systems

Business Intelligence & Analytics

Text Analytics on Excel Files: Analyze comments, notes, and text data within .xls/.xlsx spreadsheets
Compliance Document Processing: Extract text content for regulatory reporting from Excel documents
Audit Trails: Track changes to a workbook's text content over time by comparing extracted text

Integration Scenarios

Enterprise Search Systems: Index Excel spreadsheet content for full-text search capabilities
Automated Data Pipelines: Extract text as part of automated document processing workflows
Content Management Systems: Process uploaded Excel files automatically with GroupDocs.Parser
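For the search-indexing scenarios above, per-sheet text can be turned into a simple term-to-sheet lookup. The sketch below is a deliberately naive illustration, not a GroupDocs.Parser feature: SheetIndex is a hypothetical name, the input is assumed to be a list of per-sheet strings in sheet order (such as the output of a sheet-by-sheet loop), and the tokenizer (lower-cased runs of letters and digits) is an assumption that a real search engine would replace with its own analyzer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SheetIndex
{
    // Builds a term -> set-of-sheet-indices map from per-sheet text.
    public static Dictionary<string, SortedSet<int>> Build(IReadOnlyList<string> sheetTexts)
    {
        var index = new Dictionary<string, SortedSet<int>>(StringComparer.Ordinal);
        for (int sheet = 0; sheet < sheetTexts.Count; sheet++)
        {
            // Naive tokenizer: runs of Unicode letters and decimal digits.
            foreach (Match m in Regex.Matches(sheetTexts[sheet], @"[\p{L}\p{Nd}]+"))
            {
                string term = m.Value.ToLowerInvariant();
                if (!index.TryGetValue(term, out var sheets))
                {
                    sheets = new SortedSet<int>();
                    index[term] = sheets;
                }
                sheets.Add(sheet);
            }
        }
        return index;
    }
}
```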

Performance Considerations

File Size Impact

Small files (< 1 MB): the difference between methods is usually small
Medium files (1-10 MB): raw mode typically gives a noticeable speed improvement
Large files (> 10 MB): raw mode is recommended for bulk processing
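Actual timings depend on your workbooks and hardware, so it is worth measuring the modes yourself. A minimal sketch, assuming a hypothetical ExtractionTimer helper that accepts any delegate returning a TextReader:

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class ExtractionTimer
{
    // Opens a reader via the delegate, reads it to the end, and reports
    // the number of characters read and the time spent opening plus reading.
    public static (int Characters, TimeSpan Elapsed) Measure(Func<TextReader?> open)
    {
        if (open == null) throw new ArgumentNullException(nameof(open));

        var stopwatch = Stopwatch.StartNew();
        using (TextReader? reader = open())
        {
            // A null reader (unsupported document) counts as zero characters.
            string text = reader?.ReadToEnd() ?? string.Empty;
            stopwatch.Stop();
            return (text.Length, stopwatch.Elapsed);
        }
    }
}
```

For example, compare ExtractionTimer.Measure(() => parser.GetText()) with ExtractionTimer.Measure(() => parser.GetText(new TextOptions(true))). The first extraction in a process may include one-time start-up costs, so repeat each measurement several times before drawing conclusions.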

Memory Usage

Sheet-by-sheet processing typically needs less memory than whole-document extraction, because your code holds only one sheet's text at a time
Raw mode can also reduce memory usage for large files
For very large workbooks, process sheets individually
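When you write extracted text to a file or another stream, reading it line by line avoids building one large string in your own code, as ReadToEnd does. Whether this also lowers the parser's internal memory usage depends on implementation details this page doesn't describe. A minimal sketch; TextStreaming and CopyLines are hypothetical names:

```csharp
using System;
using System.IO;

public static class TextStreaming
{
    // Copies text line by line instead of calling ReadToEnd(), so the caller
    // never materializes the whole text as one string. Returns the line count.
    public static int CopyLines(TextReader reader, TextWriter writer)
    {
        if (reader == null) throw new ArgumentNullException(nameof(reader));
        if (writer == null) throw new ArgumentNullException(nameof(writer));

        int lines = 0;
        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            writer.WriteLine(line);
            lines++;
        }
        return lines;
    }
}
```

For example: using (TextReader reader = parser.GetText(p)) using (var output = new StreamWriter("sheet.txt")) { if (reader != null) TextStreaming.CopyLines(reader, output); }. Note that ReadLine normalizes line endings to the writer's NewLine.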

Troubleshooting

Common Issues

Null TextReader Response

Verify the file format is supported (.xls, .xlsx)
Check if the file is corrupted or password-protected
Ensure the file path is correct and accessible
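Some of these checks can run before the file reaches Parser. The sketch below is a hypothetical pre-flight helper (SpreadsheetPreflight is not part of GroupDocs.Parser). It checks only the path, the extension, and the file size; it cannot detect corrupted or password-protected workbooks, so an empty result doesn't guarantee that extraction will succeed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SpreadsheetPreflight
{
    // Returns human-readable problems found before the path is passed to Parser.
    public static List<string> Check(string filePath)
    {
        var problems = new List<string>();
        if (string.IsNullOrWhiteSpace(filePath))
        {
            problems.Add("File path is empty.");
            return problems;
        }

        // File.Exists also returns false when the caller lacks permission to read the path.
        if (!File.Exists(filePath))
        {
            problems.Add($"File not found or not accessible: {filePath}");
            return problems;
        }

        // Extension check only; it does not inspect the file content.
        string ext = Path.GetExtension(filePath);
        if (!ext.Equals(".xls", StringComparison.OrdinalIgnoreCase) &&
            !ext.Equals(".xlsx", StringComparison.OrdinalIgnoreCase))
        {
            problems.Add($"Unexpected extension '{ext}'; this page covers .xls and .xlsx.");
        }

        if (new FileInfo(filePath).Length == 0)
        {
            problems.Add("File is zero bytes long.");
        }
        return problems;
    }
}
```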

Empty Text Output

Confirm the worksheets actually contain cell values
Check if the sheets contain visible content
Verify the file isn’t completely empty
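To narrow down which worksheets produce no text, you can check the per-sheet results of sheet-by-sheet extraction. A minimal sketch; BlankSheetFinder is a hypothetical name, and the input is assumed to be a list of per-sheet strings in sheet order:

```csharp
using System;
using System.Collections.Generic;

public static class BlankSheetFinder
{
    // Returns the zero-based indices of sheets whose extracted text
    // is empty or contains only whitespace.
    public static List<int> FindBlank(IReadOnlyList<string> sheetTexts)
    {
        if (sheetTexts == null) throw new ArgumentNullException(nameof(sheetTexts));

        var blank = new List<int>();
        for (int i = 0; i < sheetTexts.Count; i++)
        {
            if (string.IsNullOrWhiteSpace(sheetTexts[i]))
            {
                blank.Add(i);
            }
        }
        return blank;
    }
}
```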

Performance Issues

Use raw mode for large files or bulk processing
Process sheets individually instead of extracting the entire document
Check the memory and file-size limits of your hosting environment

More resources

GitHub examples

You can run the code above and see the feature in action in our GitHub examples:

Free online document parser App

Along with the full-featured .NET library, we provide simple but powerful free apps.

You are welcome to parse documents and extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails, and more with our Free Online Document Parser App.
