Here are the key reasons to use the updated API that GroupDocs.Parser for Java has provided since version 19.11:
The Parser class was introduced as a single entry point for extracting data from a document.
Data extraction was unified across all data types.
The document-related classes were unified into a common set.
The product architecture was redesigned from scratch to simplify passing options and working with the classes that manipulate data.
The procedures for getting document information and generating previews were simplified.
How To Migrate?
Here is a brief comparison of how to extract data using the old and the new API.
Text
Old coding style
// Create an extractor factory
ExtractorFactory factory = new ExtractorFactory();
// Create a text extractor
try (TextExtractor extractor = factory.createTextExtractor(filePath)) {
    // Extract text from the extractor line by line
    String textLine = null;
    do {
        textLine = extractor.extractLine();
        if (textLine != null) {
            System.out.println(textLine);
        }
    } while (textLine != null);
}
New coding style
// Create an instance of the Parser class
try (Parser parser = new Parser(filePath)) {
    // Extract the text into a reader
    try (TextReader reader = parser.getText()) {
        // Check if text extraction is supported
        if (reader == null) {
            System.out.println("Text extraction isn't supported.");
            return;
        }
        // Read the text from the reader line by line
        String textLine = null;
        do {
            textLine = reader.readLine();
            if (textLine != null) {
                System.out.println(textLine);
            }
        } while (textLine != null);
    }
}
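The do/while loop in both styles follows the usual "read until null" pattern, which can also be written as a single while loop. The sketch below illustrates that pattern only: it uses the JDK's java.io.BufferedReader as a stand-in for the reader, and the LineLoop class and readAllLines helper are hypothetical names that are not part of GroupDocs.Parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; BufferedReader stands in for the library's reader.
public class LineLoop {

    // Reads lines until readLine() returns null, which signals the end of the text.
    static List<String> readAllLines(BufferedReader reader) {
        List<String> lines = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        BufferedReader reader = new BufferedReader(new StringReader("first line\nsecond line"));
        for (String line : readAllLines(reader)) {
            System.out.println(line);
        }
    }
}
```

The same loop shape should apply to any reader whose readLine() returns null at the end of the input.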
Text Page
Old coding style
// Create an extractor factory
ExtractorFactory factory = new ExtractorFactory();
// Create a text extractor
try (TextExtractor extractor = factory.createTextExtractor(filePath)) {
    // Check if the extractor supports pagination
    IPageTextExtractor pte = extractor instanceof IPageTextExtractor
            ? (IPageTextExtractor) extractor
            : null;
    if (pte != null) {
        // Extract the first page
        System.out.println(pte.extractPage(0));
    }
}
New coding style
// Create an instance of the Parser class
try (Parser parser = new Parser(filePath)) {
    // Extract the text of the first page into a reader
    try (TextReader reader = parser.getText(0)) {
        // Check if text extraction is supported
        if (reader != null) {
            // Read all of the text from the reader
            System.out.println(reader.readToEnd());
        }
    }
}
Search
Old coding style
// Create an extractor factory
ExtractorFactory factory = new ExtractorFactory();
// Create a text extractor
try (TextExtractor extractor = factory.createTextExtractor(filePath)) {
    // Check if the extractor supports search
    ISearchable se = extractor instanceof ISearchable
            ? (ISearchable) extractor
            : null;
    if (se != null) {
        // Create a handler
        ListSearchHandler handler = new ListSearchHandler();
        // Search for "keyword" in the document
        se.search(new SearchOptions(null), handler, java.util.Arrays.asList(new String[] { "keyword" }));
        // Print search results
        for (SearchResult result : handler.getList()) {
            System.out.println(String.format("at %d: %s", result.getIndex(), result.getFoundText()));
        }
    }
}
New coding style
// Create an instance of the Parser class
try (Parser parser = new Parser(filePath)) {
    // Search for "keyword" in the document
    Iterable<SearchResult> list = parser.search("keyword");
    // Check if search is supported
    if (list == null) {
        System.out.println("Search isn't supported.");
        return;
    }
    // Print search results
    for (SearchResult result : list) {
        System.out.println(String.format("at %d: %s", result.getPosition(), result.getText()));
    }
}
File Type Detection
Old coding style
// Detect and print file type
System.out.println(CompositeMediaTypeDetector.DEFAULT.detect(filePath));
New coding style
// Create an instance of the Parser class
try (Parser parser = new Parser(filePath)) {
    // Detect and print file type
    System.out.println(parser.getDocumentInfo().getFileType());
}
Metadata
Old coding style
// Create an extractor factory
ExtractorFactory factory = new ExtractorFactory();
// Create a metadata extractor
MetadataExtractor extractor = factory.createMetadataExtractor(filePath);
// Extract metadata
MetadataCollection metadata = extractor.extractMetadata(filePath);
// Print metadata
for (String key : metadata.getKeys()) {
    String value = metadata.get_Item(key);
    System.out.println(String.format("%s = %s", key, value));
}
New coding style
// Create an instance of the Parser class
try (Parser parser = new Parser(filePath)) {
    // Extract metadata
    Iterable<MetadataItem> metadata = parser.getMetadata();
    // Check if metadata extraction is supported
    if (metadata == null) {
        System.out.println("Metadata extraction isn't supported.");
        return;
    }
    // Print metadata
    for (MetadataItem item : metadata) {
        System.out.println(String.format("%s = %s", item.getName(), item.getValue()));
    }
}
Structure
Old coding style
// Create an extractor factory
ExtractorFactory factory = new ExtractorFactory();
// Create a text extractor
try (TextExtractor extractor = factory.createTextExtractor(filePath)) {
    // Check if the extractor supports text structure extraction
    IStructuredExtractor se = extractor instanceof IStructuredExtractor
            ? (IStructuredExtractor) extractor
            : null;
    if (se != null) {
        // Create a handler
        Handler handler = new Handler();
        // Extract text structure
        se.extractStructured(handler);
        // Print hyperlinks
        for (String link : handler.getLinks()) {
            System.out.println(link);
        }
    }
}

// Handler for the hyperlink extraction
class Handler extends StructuredHandler {
    private final java.util.List<String> links;

    public Handler() {
        links = new java.util.ArrayList<String>();
    }

    public java.util.List<String> getLinks() {
        return links;
    }

    // Override the method to catch hyperlinks
    @Override
    protected void onStartHyperlink(HyperlinkProperties properties) {
        links.add(properties.getLink());
    }
}
New coding style
// Create an instance of the Parser class
try (Parser parser = new Parser(filePath)) {
    // Extract text structure as an XML document
    Document document = parser.getStructure();
    // Check if text structure extraction is supported
    if (document == null) {
        System.out.println("Text structure extraction isn't supported.");
        return;
    }
    // Read XML document
    readNode(document.getDocumentElement());
}

// Recursively visit the child nodes and print the link of each hyperlink
void readNode(Node node) {
    NodeList nodes = node.getChildNodes();
    for (int i = 0; i < nodes.getLength(); i++) {
        Node n = nodes.item(i);
        // Compare strings with equals(); == compares references, not contents
        if (n.getNodeName().toLowerCase().equals("hyperlink")) {
            Node a = n.getAttributes().getNamedItem("link");
            if (a != null) {
                System.out.println(a.getNodeValue());
            }
        }
        if (n.hasChildNodes()) {
            readNode(n);
        }
    }
}
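The recursive traversal in the new style relies only on the standard org.w3c.dom API, so it can be tried without the library. The following is a minimal sketch under stated assumptions: the sample XML is hypothetical and only mirrors what the code above expects (hyperlink elements with a link attribute); the actual structure returned by getStructure() may differ. The HyperlinkWalk class and its methods are illustrative names, not part of GroupDocs.Parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class that collects hyperlink targets from a DOM tree.
public class HyperlinkWalk {

    // Recursively collects the "link" attribute of every descendant <hyperlink> element.
    static void collectLinks(Node node, List<String> links) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if ("hyperlink".equalsIgnoreCase(child.getNodeName())) {
                Node link = child.getAttributes().getNamedItem("link");
                if (link != null) {
                    links.add(link.getNodeValue());
                }
            }
            if (child.hasChildNodes()) {
                collectLinks(child, links);
            }
        }
    }

    // Parses an XML string with the JDK parser and returns the collected links.
    static List<String> linksOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> links = new ArrayList<>();
            collectLinks(doc.getDocumentElement(), links);
            return links;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample shape; not taken from real library output.
        String xml = "<document><page>"
                + "<p>See <hyperlink link=\"https://example.com\">here</hyperlink></p>"
                + "<p><hyperlink>no target</hyperlink></p>"
                + "</page></document>";
        System.out.println(linksOf(xml));
    }
}
```

Collecting the links into a list instead of printing them inside the traversal keeps the walk reusable, much like the getLinks() accessor of the old Handler class.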