Extract Tables from Microsoft Office Word Documents
This guide explains how to extract tables from Microsoft Office Word documents (.doc, .docx) using GroupDocs.Parser for .NET.
How Table Extraction Works
GroupDocs.Parser converts a document’s layout into a structured XML format. Tables are represented by <table> tags within this XML. The extraction process involves reading this XML structure to identify and process table content.
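To make the XML shape concrete, here is a minimal sketch that reads a small hand-written XML string with the standard XmlReader and counts its table elements. The sample XML is hypothetical: the `document` and `p` element names are assumptions for illustration, and the real output of GetStructure may contain other elements and attributes. Only the `table`, `tr`, and `td` names come from this guide.

```csharp
using System;
using System.IO;
using System.Xml;

public static class StructureXmlSketch
{
    // Hypothetical sample XML; only table/tr/td are taken from this guide.
    public const string SampleXml =
        "<document>" +
        "<p>Quarterly results</p>" +
        "<table>" +
        "<tr><td>Region</td><td>Sales</td></tr>" +
        "<tr><td>North</td><td>120</td></tr>" +
        "</table>" +
        "</document>";

    // Counts <table> start tags in a single forward-only pass over the XML.
    public static int CountTables(string xml)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "table")
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Calling StructureXmlSketch.CountTables(StructureXmlSketch.SampleXml) walks the string once and counts one table. The same loop works on any XmlReader, which is why the steps in this guide can operate directly on the reader returned by the parser.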
Warning
The GetStructure method returns null if text structure extraction isn't supported for the document. For example, text structure extraction isn't supported for TXT files, so GetStructure returns null for them. If a Microsoft Office Word document contains no text, GetStructure returns an empty XmlReader object.
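The two cases in the warning can be kept apart in calling code. The sketch below is one possible way to do that, not part of the library: TryCountTables and FromString are hypothetical helper names, and the `<document/>` string used in the usage note is only an assumed stand-in for a structure with no content.

```csharp
using System;
using System.IO;
using System.Xml;

public static class StructureGuardSketch
{
    // Returns null when the reader is null (structure extraction unsupported),
    // otherwise the number of <table> elements found (0 when there are none).
    public static int? TryCountTables(XmlReader reader)
    {
        if (reader == null)
        {
            return null;
        }

        int count = 0;
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "table")
            {
                count++;
            }
        }
        return count;
    }

    // Builds a reader over an XML string; stands in for parser.GetStructure()
    // so the helper can be tried without a document.
    public static XmlReader FromString(string xml)
    {
        return XmlReader.Create(new StringReader(xml));
    }
}
```

With this helper, a null result means "not supported", while a result of 0 means the structure was readable but held no tables, for example when called as TryCountTables(FromString("<document/>")).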
Step-by-Step Guide
1. Create a Parser Instance
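Begin by initializing the Parser with your document path.
using (Parser parser = new Parser(filePath))
{
    // Work with the document here
}
2. Get the Document Structure
Get the XML representation of the document’s structure.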
using (XmlReader reader = parser.GetStructure())
{
    if (reader == null)
    {
        Console.WriteLine("Text structure extraction isn't supported for this document.");
        return;
    }

    // Process the XML structure
}
3. Find and Process Tables
Iterate through the XML to locate and process table elements.
while (reader.Read())
{
    // Look for table start elements
    if (reader.NodeType == XmlNodeType.Element && reader.Name == "table")
    {
        Console.WriteLine("Found a table:");
        ProcessTable(reader);
    }
}
4. Table Processing Method
This method handles the extraction of rows and cells from each table.
private static void ProcessTable(XmlReader reader)
{
    StringBuilder cellValue = new StringBuilder();
    int rowIndex = 0;

    while (reader.Read())
    {
        // Exit when table ends
        if (reader.NodeType == XmlNodeType.EndElement && reader.Name == "table")
        {
            break;
        }

        // Detect new rows
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "tr")
        {
            rowIndex++;
            Console.WriteLine($"[Row {rowIndex}]");
        }

        // Reset cell value at cell start
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "td")
        {
            cellValue.Clear();
        }

        // Output cell value at cell end
        if (reader.NodeType == XmlNodeType.EndElement && reader.Name == "td")
        {
            Console.WriteLine($" {cellValue}");
        }

        // Accumulate cell text content
        if (reader.NodeType == XmlNodeType.Text)
        {
            cellValue.Append(reader.Value);
        }
    }
}
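If the table data is needed for further processing rather than console output, the same loop can collect cells into rows. The following is a sketch under assumptions, not the library's API: ReadTable and ReadFirstTable are hypothetical helper names, and the handling of a self-closing `<td/>` is an extra guard that the method above does not include.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

public static class TableRowsSketch
{
    // Reads one table, starting with the reader positioned on the <table>
    // start tag, and returns its rows as lists of cell text.
    public static List<List<string>> ReadTable(XmlReader reader)
    {
        var rows = new List<List<string>>();
        List<string> currentRow = null;
        var cellValue = new StringBuilder();

        // A self-closing <table/> has no content and no end tag.
        if (reader.IsEmptyElement)
        {
            return rows;
        }

        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.EndElement && reader.Name == "table")
            {
                break;
            }

            if (reader.NodeType == XmlNodeType.Element && reader.Name == "tr")
            {
                currentRow = new List<string>();
                rows.Add(currentRow);
            }
            else if (reader.NodeType == XmlNodeType.Element && reader.Name == "td")
            {
                cellValue.Clear();
                // A self-closing <td/> has no end tag, so record it right away.
                if (reader.IsEmptyElement && currentRow != null)
                {
                    currentRow.Add(string.Empty);
                }
            }
            else if (reader.NodeType == XmlNodeType.EndElement && reader.Name == "td")
            {
                if (currentRow != null)
                {
                    currentRow.Add(cellValue.ToString());
                }
            }
            else if (reader.NodeType == XmlNodeType.Text)
            {
                cellValue.Append(reader.Value);
            }
        }
        return rows;
    }

    // Convenience wrapper for trying ReadTable on an XML string.
    public static List<List<string>> ReadFirstTable(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "table")
                {
                    return ReadTable(reader);
                }
            }
        }
        return new List<List<string>>();
    }
}
```

Returning a list of rows keeps the parsing separate from the presentation, so the caller can decide whether to print the cells, filter them, or write them elsewhere. Like the method above, this sketch stops at the first `</table>` end tag, so it assumes tables are not nested.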
Complete Example
Here’s a complete working example that extracts and displays tables from a Word document:
using System;
using System.Text;
using System.Xml;
using GroupDocs.Parser;

public static class WordTableExtractor
{
    public static void ExtractTables(string filePath)
    {
        using (Parser parser = new Parser(filePath))
        {
            using (XmlReader reader = parser.GetStructure())
            {
                if (reader == null)
                {
                    Console.WriteLine("Text structure extraction isn't supported for this document.");
                    return;
                }

                Console.WriteLine($"Analyzing: {filePath}");
                Console.WriteLine("----------------------------------------");

                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.Name == "table")
                    {
                        Console.WriteLine("\n>>> TABLE FOUND");
                        ProcessTable(reader);
                    }
                }
            }
        }
    }

    private static void ProcessTable(XmlReader reader)
    {
        StringBuilder cellValue = new StringBuilder();
        int rowIndex = 0;

        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.EndElement && reader.Name == "table")
            {
                break;
            }

            if (reader.NodeType == XmlNodeType.Element && reader.Name == "tr")
            {
                rowIndex++;
                Console.WriteLine($"[Row {rowIndex}]");
            }

            if (reader.NodeType == XmlNodeType.Element && reader.Name == "td")
            {
                cellValue.Clear();
            }

            if (reader.NodeType == XmlNodeType.EndElement && reader.Name == "td")
            {
                Console.WriteLine($" {cellValue}");
            }

            if (reader.NodeType == XmlNodeType.Text)
            {
                cellValue.Append(reader.Value);
            }
        }
    }
}
More resources
GitHub examples
You can run the code above and see the feature in action in the GroupDocs.Parser GitHub examples.
Along with the full-featured .NET library, we provide simple but powerful free apps.
You are welcome to parse documents and extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails, and more with our Free Online Document Parser App.