Extract text from Microsoft Office PowerPoint presentations

To extract text from Microsoft Office PowerPoint presentations, use the getText and getText(int) methods. They extract text from the entire presentation or from a selected slide, respectively.

Here are the steps to extract text from a Microsoft Office PowerPoint presentation:

Instantiate a Parser object for the presentation;
Call the getText method to obtain a TextReader object;
Read the text from the reader.

Warning
The getText method returns null if text extraction isn’t supported for the document. For example, text extraction isn’t supported for Zip archives, so getText returns null for them. For an empty Microsoft Office PowerPoint presentation, getText returns an empty TextReader object (its readToEnd method returns an empty string).

The following example demonstrates how to extract text from a Microsoft Office PowerPoint presentation:

// Create an instance of the Parser class
try (Parser parser = new Parser(Constants.SamplePptx)) {
    // Extract the text into a reader
    try (TextReader reader = parser.getText()) {
        // Print the text of the presentation
        System.out.println(reader.readToEnd());
    }
}
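
The example above reads from the reader without checking it for null, which is safe only when text extraction is known to be supported. The following is a minimal sketch of a null-safe read, not the library's own code. So that it compiles without GroupDocs.Parser on the classpath, it uses assumptions: the TextSource interface is a hypothetical stand-in for Parser, java.io.Reader stands in for TextReader, and readAllText is an illustrative helper name.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Optional;

final class NullSafeTextSketch {

    /** Hypothetical stand-in for Parser: getText() may return null. */
    interface TextSource {
        Reader getText();
    }

    /** Returns the extracted text, or Optional.empty() when getText() returns null. */
    static Optional<String> readAllText(TextSource source) {
        Reader reader = source.getText();
        if (reader == null) {
            // Text extraction is not supported for this document
            return Optional.empty();
        }
        // Close the reader when done, as the try-with-resources examples do
        try (Reader r = reader) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int read;
            while ((read = r.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
            return Optional.of(text.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TextSource unsupported = () -> null;                 // e.g. a Zip archive
        TextSource emptyDeck = () -> new StringReader("");   // empty presentation
        TextSource deck = () -> new StringReader("Slide title");

        System.out.println(readAllText(unsupported).isPresent());
        System.out.println(readAllText(emptyDeck).isPresent());
        System.out.println(readAllText(deck).orElse("Text extraction isn't supported"));
    }
}
```

With the real library, the same check would be applied to the TextReader returned by getText before calling readToEnd. An empty presentation still yields a present but empty string, which keeps it distinguishable from an unsupported document.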

Here are the steps to extract text from a single slide of a Microsoft Office PowerPoint presentation:

Instantiate a Parser object for the presentation;
Call the getDocumentInfo method to obtain an IDocumentInfo object, whose getPageCount method returns the number of slides;
Call the getText(int) method with the zero-based slide index to obtain a TextReader object;
Read the text from the reader.

The following example demonstrates how to extract text from each slide of a Microsoft Office PowerPoint presentation:

// Create an instance of the Parser class
try (Parser parser = new Parser(Constants.SamplePptx)) {
    // Get the presentation info
    IDocumentInfo presentationInfo = parser.getDocumentInfo();
    // Iterate over the slides (the index is zero-based)
    for (int p = 0; p < presentationInfo.getPageCount(); p++) {
        // Print the slide number
        System.out.println(String.format("Slide %d/%d", p + 1, presentationInfo.getPageCount()));
        // Extract the slide text into a reader
        try (TextReader reader = parser.getText(p)) {
            // Print the text of the slide
            System.out.println(reader.readToEnd());
        }
    }
}

Raw mode speeds up text extraction at the cost of formatting accuracy. Use the getText(TextOptions) and getText(int, TextOptions) methods to extract text in raw mode.

Warning
Raw mode is not supported for password-protected presentations.

Warning
Some presentations may have a different number of slides in raw and accurate modes. Use getRawPageCount instead of getPageCount in raw mode.

Here are the steps to extract raw text from a single slide of a Microsoft Office PowerPoint presentation:

Instantiate a Parser object for the presentation;
Instantiate a TextOptions object, passing true to enable raw mode;
Call the getDocumentInfo method;
Use getRawPageCount instead of getPageCount to avoid extra calculations;
Call the getText(int, TextOptions) method with the zero-based slide index to obtain a TextReader object;
Read the text from the reader.

The following example demonstrates how to extract raw text from each slide of a Microsoft Office PowerPoint presentation:

// Create an instance of the Parser class
try (Parser parser = new Parser(Constants.SamplePptx)) {
    // Check if the document supports text extraction
    if (!parser.getFeatures().isText()) {
        System.out.println("Document doesn't support text extraction.");
        return;
    }
    // Get the document info (raw page count is available on DocumentInfo)
    IDocumentInfo info = parser.getDocumentInfo();
    DocumentInfo documentInfo = info instanceof DocumentInfo
            ? (DocumentInfo) info
            : null;
    // Check if the document has pages
    if (documentInfo == null || documentInfo.getRawPageCount() == 0) {
        System.out.println("Document has no pages.");
        return;
    }
    // Iterate over pages, using the raw page count throughout
    for (int p = 0; p < documentInfo.getRawPageCount(); p++) {
        // Print the page number
        System.out.println(String.format("Page %d/%d", p + 1, documentInfo.getRawPageCount()));
        // Extract the raw text of the page into a reader
        try (TextReader reader = parser.getText(p, new TextOptions(true))) {
            // Print the text of the page
            // Null-checking is skipped because text extraction support was checked earlier
            System.out.println(reader.readToEnd());
        }
    }
}
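
Because raw and accurate modes can report different totals, the loop bound and the printed total should come from the same counter. The following is a minimal sketch of that choice in isolation. It rests on assumptions: PageCounts is a hypothetical stand-in for the document info object (only the two getter names come from the text above), and slideCount and slideHeader are illustrative helper names, not library methods.

```java
final class SlideCountSketch {

    /** Hypothetical stand-in exposing the two counters named in the text. */
    interface PageCounts {
        int getPageCount();
        int getRawPageCount();
    }

    /** Picks the counter that matches the extraction mode. */
    static int slideCount(PageCounts info, boolean rawMode) {
        return rawMode ? info.getRawPageCount() : info.getPageCount();
    }

    /** Formats a one-based header from a zero-based slide index. */
    static String slideHeader(int zeroBasedIndex, int total) {
        return String.format("Slide %d/%d", zeroBasedIndex + 1, total);
    }

    public static void main(String[] args) {
        // Sample values only: a deck whose raw count differs from its accurate count
        PageCounts info = new PageCounts() {
            public int getPageCount() { return 3; }
            public int getRawPageCount() { return 4; }
        };
        boolean rawMode = true;
        int total = slideCount(info, rawMode);
        for (int p = 0; p < total; p++) {
            System.out.println(slideHeader(p, total));
        }
    }
}
```

Computing the total once and reusing it for both the loop and the header avoids mixing the two counters in one run.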

GroupDocs.Parser can also extract text from Microsoft Office PowerPoint presentations as HTML, Markdown, or formatted plain text. For more details, see Extract Formatted Text.

Here are the steps to extract text from a Microsoft Office PowerPoint presentation as HTML:

Instantiate a Parser object for the presentation;
Call the getFormattedText(FormattedTextOptions) method to obtain a TextReader object;
Read the text from the reader.

The following example shows how to extract text from a Microsoft Office PowerPoint presentation as HTML:

// Create an instance of the Parser class
try (Parser parser = new Parser(Constants.SamplePptx)) {
    // Extract the text as HTML into a reader
    try (TextReader reader = parser.getFormattedText(new FormattedTextOptions(FormattedTextMode.Html))) {
        // Print the formatted text of the presentation
        System.out.println(reader.readToEnd());
    }
}

More resources

GitHub examples

You can run the code above and see the feature in action in our GitHub examples.

Free online document parser App

Along with the full-featured library, we provide simple but powerful free apps.

You are welcome to parse documents and extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails and more with our Free Online Document Parser App.
