A text area represents a rectangular area of a page that contains text. A text area can be simple or composite. A simple text area contains only text, and its Areas property is always an empty collection (never null). A composite text area doesn't have its own text: its Text property is calculated from the texts of its child areas, which are contained in the Areas property.
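To make the simple/composite distinction concrete, here is a library-independent sketch. The TextAreaNode class is hypothetical and is not GroupDocs.Parser's PageTextArea. It also assumes that a composite area's text is its children's texts joined with a single space; the separator the library actually uses is not stated here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of a text area; NOT the GroupDocs.Parser PageTextArea type.
public sealed class TextAreaNode
{
    // Own text of a simple area; null for a composite area.
    private readonly string ownText;

    // Child areas: an empty collection (never null) for a simple area.
    public IReadOnlyList<TextAreaNode> Areas { get; }

    // Creates a simple area that holds its own text.
    public TextAreaNode(string text)
    {
        ownText = text ?? throw new ArgumentNullException(nameof(text));
        Areas = Array.Empty<TextAreaNode>();
    }

    // Creates a composite area that has no text of its own.
    public TextAreaNode(IEnumerable<TextAreaNode> children)
    {
        ownText = null;
        Areas = children.ToList();
    }

    public bool IsComposite => ownText == null;

    // Simple area: its own text. Composite area: the children's texts joined
    // with a space (the separator is an assumption of this sketch).
    public string Text => IsComposite
        ? string.Join(" ", Areas.Select(a => a.Text))
        : ownText;

    // Yields every simple area under this one, depth first.
    public IEnumerable<TextAreaNode> Leaves()
    {
        if (!IsComposite)
        {
            yield return this;
            yield break;
        }
        foreach (TextAreaNode child in Areas)
        {
            foreach (TextAreaNode leaf in child.Leaves())
            {
                yield return leaf;
            }
        }
    }
}
```

The recursive Leaves method shows one way to reach the simple areas that actually carry text when composite areas are nested; the same depth-first walk could be applied to the Areas property of the areas returned by the parser.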
Extract text areas
Here are the steps to extract text areas from the whole document:
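Instantiate Parser object for the initial document;
Call GetTextAreas method and obtain the collection of PageTextArea objects;
Check that the collection isn't null (null means that text areas extraction isn't supported for the document);
Iterate through the collection and get the rectangles and text.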
The following example shows how to extract all text areas from the whole document:
```csharp
// Create an instance of Parser class
using (Parser parser = new Parser(filePath))
{
    // Extract text areas
    IEnumerable<PageTextArea> areas = parser.GetTextAreas();
    // Check if text areas extraction is supported
    if (areas == null)
    {
        Console.WriteLine("Page text areas extraction isn't supported");
        return;
    }
    // Iterate over page text areas
    foreach (PageTextArea a in areas)
    {
        // Print a page index, rectangle and text area value:
        Console.WriteLine(string.Format("Page: {0}, R: {1}, Text: {2}", a.Page.Index, a.Rectangle, a.Text));
    }
}
```
Extract text areas from a document page
Here are the steps to extract text areas from a document page:
Instantiate Parser object for the initial document;
Check the Features.TextAreas property to find out whether text areas extraction is supported for the document;
Call GetDocumentInfo method to get the number of pages;
For each page, call GetTextAreas method with the page index (no null check is needed because support was already checked);
Iterate through the collection and get the rectangles and text.
The following example shows how to extract text areas from a document page:
```csharp
// Create an instance of Parser class
using (Parser parser = new Parser(filePath))
{
    // Check if the document supports text areas extraction
    if (!parser.Features.TextAreas)
    {
        Console.WriteLine("Document doesn't support text areas extraction.");
        return;
    }
    // Get the document info
    IDocumentInfo documentInfo = parser.GetDocumentInfo();
    // Check if the document has pages
    if (documentInfo.PageCount == 0)
    {
        Console.WriteLine("Document has no pages.");
        return;
    }
    // Iterate over pages
    for (int pageIndex = 0; pageIndex < documentInfo.PageCount; pageIndex++)
    {
        // Print a page number
        Console.WriteLine(string.Format("Page {0}/{1}", pageIndex + 1, documentInfo.PageCount));
        // Iterate over page text areas
        // We skip the null check because text areas extraction support was checked earlier
        foreach (PageTextArea a in parser.GetTextAreas(pageIndex))
        {
            // Print a rectangle and text area value:
            Console.WriteLine(string.Format("R: {0}, Text: {1}", a.Rectangle, a.Text));
        }
    }
}
```
Extract text areas with options
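The PageTextAreaOptions parameter is used to customize the text areas extraction process. As the example below shows, it is created with a regular expression that selects which text areas to extract and a rectangle that limits the page region to search.
Here are the steps to extract text areas with options:
Instantiate Parser object for the initial document;
Instantiate PageTextAreaOptions object with the regular expression and the rectangle;
Call GetTextAreas method with the options and obtain the collection of PageTextArea objects;
Check that the collection isn't null (null means that text areas extraction isn't supported for the document);
Iterate through the collection and get the rectangles and text.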
The following example shows how to extract only text areas with digits from the upper-left corner:
```csharp
// Create an instance of Parser class
using (Parser parser = new Parser(filePath))
{
    // Create the options which are used for text area extraction
    PageTextAreaOptions options = new PageTextAreaOptions("[0-9]+", new Rectangle(new Point(0, 0), new Size(300, 100)));
    // Extract text areas which contain only digits from the upper-left corner of a page:
    IEnumerable<PageTextArea> areas = parser.GetTextAreas(options);
    // Check if text areas extraction is supported
    if (areas == null)
    {
        Console.WriteLine("Page text areas extraction isn't supported");
        return;
    }
    // Iterate over page text areas
    foreach (PageTextArea a in areas)
    {
        // Print a page index, rectangle and text area value:
        Console.WriteLine(string.Format("Page: {0}, R: {1}, Text: {2}", a.Page.Index, a.Rectangle, a.Text));
    }
}
```
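The effect of the two PageTextAreaOptions arguments can be pictured with a library-independent sketch. Everything in it (Box, FoundArea, AreaFilter.Filter) is hypothetical, and two rules are assumptions rather than documented GroupDocs.Parser behavior: an area's whole text must match the regular expression, and its box must lie entirely inside the rectangle. The origin is placed at the top-left corner of the page, as the upper-left-corner example suggests.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical axis-aligned box; origin at the page's top-left corner,
// y growing downwards (assumption).
public readonly struct Box
{
    public Box(double left, double top, double width, double height)
    {
        Left = left; Top = top; Width = width; Height = height;
    }

    public double Left { get; }
    public double Top { get; }
    public double Width { get; }
    public double Height { get; }
    public double Right => Left + Width;
    public double Bottom => Top + Height;

    // True when 'other' lies completely inside this box.
    public bool Contains(Box other) =>
        other.Left >= Left && other.Top >= Top &&
        other.Right <= Right && other.Bottom <= Bottom;
}

// Hypothetical extracted area: its text and its bounding box.
public sealed class FoundArea
{
    public FoundArea(string text, Box box) { Text = text; Box = box; }
    public string Text { get; }
    public Box Box { get; }
}

public static class AreaFilter
{
    // Keeps the areas whose whole text matches 'pattern' and whose box is
    // inside 'region'. Full-match and containment rules are assumptions.
    public static List<FoundArea> Filter(IEnumerable<FoundArea> areas, string pattern, Box region)
    {
        // \A and \z anchor the pattern so that it must match the entire text.
        var regex = new Regex(@"\A(?:" + pattern + @")\z");
        return areas
            .Where(a => region.Contains(a.Box) && regex.IsMatch(a.Text))
            .ToList();
    }
}
```

With the pattern "[0-9]+" and a 300 × 100 region at (0, 0), this sketch keeps an area such as "2024" near the top-left corner but drops "Total" (no digits-only match), "A1" (only a partial match) and any digits-only area whose box extends past the region.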
More resources
GitHub examples
You may easily run the code above and see the feature in action in our GitHub examples:
Along with the full-featured .NET library, we provide simple but powerful free apps.
You are welcome to parse documents and extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails and more with our Free Online Document Parser App.