Extract images from document page
GroupDocs.Parser lets you extract images from a single document page with the GetImages(int) method:
IEnumerable<PageImageArea> GetImages(int pageIndex);
The method returns a collection of PageImageArea objects:
Member | Description |
---|---|
Page | The page that contains the image area. |
Rectangle | The rectangular area on the page that contains the image. |
FileType | The format of the image. |
Rotation | The rotation angle of the image. |
Stream GetImageStream() | Returns the image stream. |
Stream GetImageStream(ImageOptions) | Returns the image stream in a different format. |
Save(string) | Saves the image to the file. |
Save(string, ImageOptions) | Saves the image to the file in a different format. |
The ImageOptions class defines the format into which the image is converted. The following image formats are supported:
- Bmp
- Gif
- Jpeg
- Png
- WebP
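As an illustration of the conversion described above, here is a minimal sketch that saves every image from the first page as PNG. It assumes that ImageOptions accepts the target ImageFormat in its constructor and that the types live in the GroupDocs.Parser, GroupDocs.Parser.Data and GroupDocs.Parser.Options namespaces. The input path and the output file names are hypothetical placeholders.

```csharp
using System;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

class SavePageImagesAsPng
{
    static void Main(string[] args)
    {
        // Hypothetical input path; pass a real document as the first argument
        string filePath = args.Length > 0 ? args[0] : "sample.pdf";

        using (Parser parser = new Parser(filePath))
        {
            // Stop early if the format has no image extraction support
            if (!parser.Features.Images)
            {
                Console.WriteLine("Document doesn't support image extraction.");
                return;
            }

            // Stop early if there is no first page to read
            if (parser.GetDocumentInfo().PageCount == 0)
            {
                Console.WriteLine("Document has no pages.");
                return;
            }

            // Assumption: the constructor takes the target image format
            ImageOptions options = new ImageOptions(ImageFormat.Png);

            int imageNumber = 0;
            // Page index 0 is the first page of the document
            foreach (PageImageArea image in parser.GetImages(0))
            {
                imageNumber++;
                // Convert the image to PNG while saving it (hypothetical file name)
                image.Save(string.Format("page1-image{0}.png", imageNumber), options);
            }

            Console.WriteLine(string.Format("Saved {0} image(s).", imageNumber));
        }
    }
}
```

Calling Save(string) without options should keep the image in its original format, so the options object is only needed when a specific output format is required.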
Here are the steps to extract images from a document page:
- Instantiate a Parser object for the document;
- Check the Features.Images property to verify that image extraction is supported for the document;
- Call the GetImages(int) method with the page index to obtain a collection of PageImageArea objects;
- Iterate through the collection and get the sizes, image types and image contents.
The following example shows how to extract images from a document page:
// Create an instance of the Parser class
using (Parser parser = new Parser(filePath))
{
    // Check if the document supports image extraction
    if (!parser.Features.Images)
    {
        Console.WriteLine("Document doesn't support image extraction.");
        return;
    }
    // Get the document info
    IDocumentInfo documentInfo = parser.GetDocumentInfo();
    // Check if the document has pages
    if (documentInfo.PageCount == 0)
    {
        Console.WriteLine("Document has no pages.");
        return;
    }
    // Iterate over pages
    for (int pageIndex = 0; pageIndex < documentInfo.PageCount; pageIndex++)
    {
        // Print the page number
        Console.WriteLine(string.Format("Page {0}/{1}", pageIndex + 1, documentInfo.PageCount));
        // Iterate over images
        // We skip the null check because image extraction support was verified earlier
        foreach (PageImageArea image in parser.GetImages(pageIndex))
        {
            // Print the rectangle and the image type
            Console.WriteLine(string.Format("R: {0}, Type: {1}", image.Rectangle, image.FileType));
        }
    }
}
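The example above prints only the rectangle and the image type. To read the image contents as well, a sketch along the following lines could be used. It relies on the GetImageStream() member listed in the table and copies each stream into memory to report its size in bytes. The input path is a hypothetical placeholder, the namespaces are assumed, and disposing the returned stream on the caller's side is also an assumption.

```csharp
using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

class ReadPageImageContents
{
    static void Main(string[] args)
    {
        // Hypothetical input path; pass a real document as the first argument
        string filePath = args.Length > 0 ? args[0] : "sample.pdf";

        using (Parser parser = new Parser(filePath))
        {
            // Stop early if the format has no image extraction support
            if (!parser.Features.Images)
            {
                Console.WriteLine("Document doesn't support image extraction.");
                return;
            }

            // Stop early if there is no first page to read
            if (parser.GetDocumentInfo().PageCount == 0)
            {
                Console.WriteLine("Document has no pages.");
                return;
            }

            // Page index 0 is the first page of the document
            foreach (PageImageArea image in parser.GetImages(0))
            {
                // Assumption: the caller owns the returned stream and disposes it
                using (Stream source = image.GetImageStream())
                using (MemoryStream buffer = new MemoryStream())
                {
                    // Copy the image bytes into memory
                    source.CopyTo(buffer);
                    Console.WriteLine(string.Format("R: {0}, Type: {1}, Bytes: {2}",
                        image.Rectangle, image.FileType, buffer.Length));
                }
            }
        }
    }
}
```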
More resources
GitHub examples
You may easily run the code above and see the feature in action in our GitHub examples:
Free online image extractor App
Along with the full-featured .NET library, we provide simple but powerful free apps.
You are welcome to extract images from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, Emails and more with our free online GroupDocs Parser App.