Extract text from Microsoft Office Excel spreadsheets
Overview
This guide demonstrates how to extract text content from Microsoft Office Excel spreadsheets (.xls, .xlsx) using the GroupDocs.Parser for .NET API. It covers four approaches: whole-document extraction, sheet-by-sheet extraction, high-speed raw mode, and formatted text output.
Extraction Methods Comparison
Method | Use Case | Performance | Output Quality
Whole Document | Extract all text at once | Fast | Standard
Sheet-by-Sheet | Process individual worksheets | Medium | Standard
Raw Mode | High-speed bulk processing | Fastest | Lower formatting accuracy
Formatted Text | Preserve formatting (HTML/Markdown) | Slower | Highest
Method 1: Extract Text from Entire Spreadsheet
When to use: When you need all text content from the Excel workbook and don't need to distinguish between worksheets.
To extract text from a Microsoft Office Excel spreadsheet, use the GetText method. Called without arguments, it retrieves text from the entire document.
Steps:
1. Instantiate a Parser object for the spreadsheet
2. Call the GetText method and obtain a TextReader object
3. Read the text from the reader
GetText returns null if text extraction isn't supported for the document. For example, text extraction isn't supported for ZIP archives, so GetText returns null for them. For an empty Microsoft Office Excel spreadsheet, GetText returns an empty TextReader object (reader.ReadToEnd returns an empty string).
Example:
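// Create an instance of the Parser class
using (Parser parser = new Parser(filePath))
{
    // Extract the text into a reader
    using (TextReader reader = parser.GetText())
    {
        // Print the text, or a notice if text extraction isn't supported (reader is null)
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}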
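The null case can also be ruled out before calling GetText. The following is a minimal sketch, assuming the parser exposes a Features.Text flag that reports whether text extraction is supported; this guide does not show that property, so treat it as an assumption to check against the API reference. The class and method names are hypothetical.

```csharp
using System.IO;
using GroupDocs.Parser;

static class SpreadsheetText
{
    // Hypothetical helper: returns the workbook text, or null when
    // text extraction isn't supported for the file.
    public static string ExtractOrNull(string filePath)
    {
        using (Parser parser = new Parser(filePath))
        {
            // Assumption: Features.Text reports whether GetText is supported
            if (!parser.Features.Text)
            {
                return null;
            }

            using (TextReader reader = parser.GetText())
            {
                // GetText may still return null, so guard before reading
                return reader?.ReadToEnd();
            }
        }
    }
}
```

Returning null instead of printing keeps the decision about unsupported files with the caller.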
Method 2: Extract Text from Individual Sheets
When to use: When you need to process each Excel worksheet separately, maintain sheet organization, or perform sheet-specific data extraction.
This method uses GetText(pageIndex) to extract text from a specific sheet. Each worksheet is treated as a separate page, and page indexes are zero-based.
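Steps:
1. Instantiate a Parser object for the spreadsheet
2. Call the GetDocumentInfo method and read PageCount to get the number of sheets
3. Call GetText(pageIndex) for each sheet and obtain a TextReader object
4. Read the text from the reader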
Example:
// Create an instance of the Parser class
using (Parser parser = new Parser(filePath))
{
    // Get the document info
    IDocumentInfo documentInfo = parser.GetDocumentInfo();

    // Iterate over sheets
    for (int p = 0; p < documentInfo.PageCount; p++)
    {
        // Print the sheet number
        Console.WriteLine(string.Format("Page {0}/{1}", p + 1, documentInfo.PageCount));

        // Extract the text into a reader
        using (TextReader reader = parser.GetText(p))
        {
            // Print the text from the sheet
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}
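When sheet text needs to be processed rather than printed, the same loop can collect results by sheet index. This is a minimal sketch built only on the GetDocumentInfo, PageCount, and GetText(pageIndex) calls shown in this guide; the SheetTextCollector class and Collect method are hypothetical names, and the IDocumentInfo namespace is assumed to be GroupDocs.Parser.Options.

```csharp
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

static class SheetTextCollector
{
    // Hypothetical helper: maps a zero-based sheet index to that sheet's text.
    public static Dictionary<int, string> Collect(string filePath)
    {
        var result = new Dictionary<int, string>();

        using (Parser parser = new Parser(filePath))
        {
            IDocumentInfo info = parser.GetDocumentInfo();
            if (info == null)
            {
                return result; // no document info, nothing to collect
            }

            for (int p = 0; p < info.PageCount; p++)
            {
                using (TextReader reader = parser.GetText(p))
                {
                    // Skip sheets for which no reader is returned
                    if (reader != null)
                    {
                        result[p] = reader.ReadToEnd();
                    }
                }
            }
        }

        return result;
    }
}
```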
Method 3: High-Speed Raw Text Extraction
When to use: When processing large Excel files or many spreadsheets where speed matters more than formatting accuracy, for example content indexing or bulk document processing.
Raw mode speeds up text extraction at the cost of formatting accuracy. Use the GetText(TextOptions) and GetText(pageIndex, TextOptions) methods for raw mode extraction. When iterating over sheets in raw mode, use RawPageCount rather than PageCount.
Example:
// Create an instance of the Parser class
using (Parser parser = new Parser(filePath))
{
    // Get the document info
    IDocumentInfo documentInfo = parser.GetDocumentInfo();

    // Check if the document has pages
    if (documentInfo == null || documentInfo.RawPageCount == 0)
    {
        Console.WriteLine("Document has no pages.");
        return;
    }

    // Iterate over sheets
    for (int p = 0; p < documentInfo.RawPageCount; p++)
    {
        // Print the sheet number
        Console.WriteLine(string.Format("Page {0}/{1}", p + 1, documentInfo.RawPageCount));

        // Extract the text into a reader; TextOptions(true) enables raw mode
        using (TextReader reader = parser.GetText(p, new TextOptions(true)))
        {
            // Print the text from the sheet
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}
Method 4: Extract Formatted Text (HTML)
When to use: When you need to preserve the visual structure and formatting of spreadsheet data, or when integrating extracted content into web applications or formatted reports.
The GroupDocs.Parser text extraction library allows extracting text from Microsoft Office Excel spreadsheets as HTML, Markdown, and formatted plain text. For more details, see Extract Formatted Text.
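Steps:
1. Instantiate a Parser object for the spreadsheet
2. Call the GetFormattedText method with FormattedTextOptions for the required mode and obtain a TextReader object
3. Read the formatted text from the reader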
Example:
// Create an instance of the Parser class
using (Parser parser = new Parser(filePath))
{
    // Extract the formatted text into a reader
    using (TextReader reader = parser.GetFormattedText(new FormattedTextOptions(FormattedTextMode.Html)))
    {
        // Print the formatted text from the spreadsheet
        Console.WriteLine(reader.ReadToEnd());
    }
}
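The same call can target Markdown instead of HTML. A minimal sketch, assuming the enum member is named FormattedTextMode.Markdown and that GetFormattedText returns null when formatted extraction isn't supported (mirroring the GetText behavior described earlier); both points are assumptions to verify against the API reference, and the class and method names are hypothetical.

```csharp
using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

static class MarkdownExample
{
    // Hypothetical helper: prints the spreadsheet as Markdown text.
    public static void PrintAsMarkdown(string filePath)
    {
        using (Parser parser = new Parser(filePath))
        {
            // Assumption: Markdown is a member of FormattedTextMode
            var options = new FormattedTextOptions(FormattedTextMode.Markdown);

            using (TextReader reader = parser.GetFormattedText(options))
            {
                // Assumption: a null reader means formatted extraction is unsupported
                Console.WriteLine(reader == null
                    ? "Formatted text extraction isn't supported."
                    : reader.ReadToEnd());
            }
        }
    }
}
```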
Supported Excel Formats
.xls - Microsoft Excel 97-2003 Workbook
.xlsx - Microsoft Excel Open XML Workbook
Common Use Cases
Enterprise Document Processing
Excel Data Mining: Extract text for search indexing and content analysis
Spreadsheet Report Generation: Convert Excel workbook data to text-based reports
Content Migration: Move spreadsheet data between different document management systems
Business Intelligence & Analytics
Text Analytics on Excel Files: Analyze comments, notes, and text data within .xls/.xlsx spreadsheets
Compliance Document Processing: Extract text content for regulatory reporting from Excel documents
Audit Trails: Track changes to spreadsheet text content over time
Integration Scenarios
Enterprise Search Systems: Index Excel spreadsheet content for full-text search capabilities
Automated Data Pipelines: Extract text as part of automated document processing workflows
Content Management Systems: Process uploaded Excel files automatically
Performance Considerations
File Size Impact
Small files (< 1 MB): All methods perform similarly
Medium files (1-10 MB): Raw mode provides a noticeable speed improvement
Large files (> 10 MB): Raw mode is recommended for bulk processing
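One way to apply these guidelines is to pick the extraction mode from the file size. The sketch below uses the 10 MB figure from the list above as a hypothetical cutoff; ModeSelector, ShouldUseRawMode, and Extract are illustrative names, and the right threshold for a given workload should be measured rather than assumed.

```csharp
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

static class ModeSelector
{
    // Hypothetical cutoff taken from the 10 MB guideline
    const long RawModeThresholdBytes = 10L * 1024 * 1024;

    // Returns true when the file is large enough to prefer raw mode.
    public static bool ShouldUseRawMode(long fileSizeBytes)
    {
        return fileSizeBytes > RawModeThresholdBytes;
    }

    // Extracts the whole workbook, using raw mode only for large files.
    public static string Extract(string filePath)
    {
        bool raw = ShouldUseRawMode(new FileInfo(filePath).Length);

        using (Parser parser = new Parser(filePath))
        using (TextReader reader = parser.GetText(new TextOptions(raw)))
        {
            // A null reader means text extraction isn't supported
            return reader?.ReadToEnd();
        }
    }
}
```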
Memory Usage
Sheet-by-sheet processing uses less memory than whole document extraction
Raw mode is more memory-efficient for large files
Consider processing sheets individually for very large workbooks
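For very large workbooks, each sheet can be copied to its own file without holding the full workbook text in a string. This is a minimal sketch under the assumption that reading the TextReader line by line keeps the caller's memory use low; how much the library buffers internally is not covered here. SheetExporter and the sheet-N.txt naming scheme are hypothetical.

```csharp
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

static class SheetExporter
{
    // Hypothetical helper: writes each sheet to "<outputDir>/sheet-N.txt".
    public static void Export(string filePath, string outputDir)
    {
        Directory.CreateDirectory(outputDir);

        using (Parser parser = new Parser(filePath))
        {
            IDocumentInfo info = parser.GetDocumentInfo();
            if (info == null)
            {
                return; // no document info, nothing to export
            }

            for (int p = 0; p < info.PageCount; p++)
            {
                using (TextReader reader = parser.GetText(p))
                {
                    if (reader == null)
                    {
                        continue; // extraction not supported for this sheet
                    }

                    string target = Path.Combine(outputDir, $"sheet-{p + 1}.txt");
                    using (StreamWriter writer = new StreamWriter(target))
                    {
                        string line;
                        // Copy line by line instead of calling ReadToEnd
                        while ((line = reader.ReadLine()) != null)
                        {
                            writer.WriteLine(line);
                        }
                    }
                }
            }
        }
    }
}
```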
Troubleshooting
Common Issues
Null TextReader Response
Verify the file format is supported (.xls, .xlsx)
Check if the file is corrupted or password-protected
Ensure the file path is correct and accessible
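The checks above can be folded into a defensive wrapper. The sketch below assumes that the library throws InvalidPasswordException and UnsupportedDocumentFormatException from the GroupDocs.Parser.Exceptions namespace, and that LoadOptions accepts a password in its constructor. None of these appear in this guide, so confirm them against the API reference before relying on them. SafeExtraction and TryExtract are hypothetical names.

```csharp
using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Exceptions;
using GroupDocs.Parser.Options;

static class SafeExtraction
{
    // Hypothetical helper: returns the text, or null with a diagnostic message.
    public static string TryExtract(string filePath, string password = null)
    {
        // Check the path first
        if (!File.Exists(filePath))
        {
            Console.WriteLine("File not found: " + filePath);
            return null;
        }

        try
        {
            // Assumption: LoadOptions takes the password as a constructor argument
            using (Parser parser = password == null
                ? new Parser(filePath)
                : new Parser(filePath, new LoadOptions(password)))
            using (TextReader reader = parser.GetText())
            {
                if (reader == null)
                {
                    Console.WriteLine("Text extraction isn't supported for this file.");
                    return null;
                }
                return reader.ReadToEnd();
            }
        }
        catch (InvalidPasswordException)
        {
            // Assumption: thrown for a missing or wrong password
            Console.WriteLine("The file is password-protected or the password is wrong.");
            return null;
        }
        catch (UnsupportedDocumentFormatException)
        {
            // Assumption: thrown when the format isn't recognized
            Console.WriteLine("The file format isn't supported.");
            return null;
        }
    }
}
```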
Empty Text Output
Confirm the spreadsheet contains text data (not just numbers/formulas)
Check if the sheets contain visible content
Verify the file isn’t completely empty
Performance Issues
Use raw mode for large files or bulk processing
Process sheets individually instead of extracting the entire document
Consider file size limitations in your environment
More resources
GitHub examples
You can run the code above and see the feature in action in our GitHub examples.
Along with the full-featured .NET library, we provide simple but powerful free apps.
You are welcome to parse documents and extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails, and more with our Free Online Document Parser App.