How to get PDF document information and generate preview

Extract text information from PDF

When working with a PDF document programmatically, you often need access to the text itself, not only the ability to annotate it. Being able to scan the text and split it into pages, paragraphs and lines is an essential tool, and our Java API provides this capability. After loading a PDF document, you can retrieve its full text page by page, and even line by line, with just a few lines of code.

Example

Since version 21.8, the PageInfo structure has changed. You can take advantage of the new functionality by calling the GetDocumentInfo() method of the Document class.

Each page, represented by a PageInfo structure, now contains a list of TextLineinfo items. Each TextLineinfo holds the top and left indents of the text, its width and height, and the text itself. In other words, each page is represented as a sequence of text lines, and you can read this information programmatically.

The code example below shows how to get data from the described structures:
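As an illustration of the page and text line structure, here is a minimal sketch that walks pages and their text lines. The `Page` and `TextLine` classes are hypothetical stand-ins written for this example, not the library's PageInfo and TextLineinfo types. They only mirror the fields named in this article (top and left indents, width, height and the text itself). With the real library, the pages would come from the document information rather than being built by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for one text line; fields mirror those named in the article.
final class TextLine {
    final double topIndent;
    final double leftIndent;
    final double width;
    final double height;
    final String text;

    TextLine(double topIndent, double leftIndent, double width, double height, String text) {
        this.topIndent = topIndent;
        this.leftIndent = leftIndent;
        this.width = width;
        this.height = height;
        this.text = text;
    }
}

// Hypothetical stand-in for one page: a page number plus its text lines in order.
final class Page {
    final int pageNumber;
    final List<TextLine> textLines;

    Page(int pageNumber, List<TextLine> textLines) {
        this.pageNumber = pageNumber;
        this.textLines = textLines;
    }
}

class TextLineDump {
    // Produces one description per text line, page by page, in the order given.
    static List<String> describe(List<Page> pages) {
        List<String> out = new ArrayList<>();
        for (Page page : pages) {
            for (TextLine line : page.textLines) {
                out.add(String.format(Locale.ROOT,
                        "page %d: \"%s\" at (left=%.1f, top=%.1f), size %.1fx%.1f",
                        page.pageNumber, line.text, line.leftIndent, line.topIndent,
                        line.width, line.height));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample data invented for the demo; real values would come from the document.
        List<Page> pages = Arrays.asList(
                new Page(1, Arrays.asList(
                        new TextLine(72.0, 36.0, 200.0, 12.0, "Invoice"),
                        new TextLine(90.0, 36.0, 310.5, 12.0, "Total due: 42"))));
        for (String description : describe(pages)) {
            System.out.println(description);
        }
    }
}
```

Keeping the traversal in `describe` separate from the printing in `main` means the same loop can feed a search index or a layout check instead of the console.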

Supported formats

Text information can be retrieved for most supported formats: Word, PDF, Excel, Visio diagrams, PowerPoint presentations, HTML and email. Most of these formats are processed as they are. The exceptions are HTML (.htm, .html, etc.) and email (.eml, .msg, etc.) files, which are first converted into a Word document (.docx). The text parameters reported for these formats therefore correspond to their Word counterparts.
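The conversion rule can be sketched as a small lookup. This is only an illustration of the rule as stated in this section, not code from the library: the `TextMetricsSource` class and the `metricsFormat` method are hypothetical names, and the set holds only the four extensions named here, since the "etc." cases are not specified.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class TextMetricsSource {
    // Only the extensions named in the article; the "etc." cases are deliberately not guessed.
    private static final Set<String> CONVERTED_VIA_DOCX =
            new HashSet<>(Arrays.asList("htm", "html", "eml", "msg"));

    // Returns the format whose layout the reported text parameters follow:
    // "docx" for HTML and email files, otherwise the file's own lowercased extension.
    static String metricsFormat(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String extension = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return CONVERTED_VIA_DOCX.contains(extension) ? "docx" : extension;
    }

    public static void main(String[] args) {
        System.out.println(metricsFormat("report.pdf"));      // prints "pdf"
        System.out.println(metricsFormat("newsletter.HTML")); // prints "docx"
    }
}
```

In practice this means that indents and sizes reported for an HTML page or an email describe the intermediate Word layout, not the layout a browser or mail client would show.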

Generate document preview

When annotating a document, it is important to be able to see how the document would look in printed form. After all, most documents end up on paper. This can be done by standard means: open the document and send it to print, since modern operating systems usually show a preview before printing. But what if it needs to be done programmatically and much faster?

Our Java API makes it possible. You can generate a preview right after annotating. This can be achieved by writing just a few lines of code:
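The general shape of such a call can be sketched as follows, under stated assumptions. The sketch assumes that the preview generator asks the caller for one output stream per page, which this article does not spell out. `PageStreamFactory` and `PreviewSketch.generatePreview` are hypothetical stand-ins, not the library's API, and the stand-in writes a text placeholder where a real generator would write image bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical callback: returns the stream that should receive one page's preview.
interface PageStreamFactory {
    OutputStream create(int pageNumber);
}

class PreviewSketch {
    // Stand-in for a real renderer: asks for a stream per requested page
    // and writes a text placeholder instead of image bytes.
    static void generatePreview(int[] pageNumbers, PageStreamFactory factory) {
        for (int pageNumber : pageNumbers) {
            try (OutputStream out = factory.create(pageNumber)) {
                out.write(("preview of page " + pageNumber).getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        // Collect previews in memory; a file-based factory would open one file per page.
        Map<Integer, ByteArrayOutputStream> previews = new TreeMap<>();
        generatePreview(new int[] {1, 2}, pageNumber -> {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            previews.put(pageNumber, buffer);
            return buffer;
        });
        previews.forEach((page, bytes) ->
                System.out.println("page " + page + ": " + bytes.size() + " bytes"));
    }
}
```

Letting the caller supply the destination keeps the generator independent of storage, so the same call can target files, memory buffers or a network response.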

The preview generator provides more properties and settings than this article covers. It is highly configurable, but covering all the details is beyond the scope of this article.

Conclusion

In short, you have learned how to extract data from a PDF document within Java applications. You have also seen how to generate a preview of any PDF file. You should now be confident enough to build your own document annotator Java application.

More resources

Advanced Usage Topics

To learn more about document annotating features, please refer to the advanced usage section.

Free Online App

Along with the full-featured Java library, we provide simple but powerful free apps. You are welcome to annotate your PDF, DOC or DOCX, XLS or XLSX, PPT or PPTX, PNG and other documents with the free online GroupDocs Annotation App.
