Extract text from EPUB eBooks
To extract text from EPUB e-books, use the GetText and GetText(pageIndex) methods. GetText extracts the text of the entire document, and GetText(pageIndex) extracts the text of a single page. Raw mode is not supported for EPUB.
Here are the steps to extract text from an EPUB e-book:
- Instantiate the Parser object for the initial e-book;
- Call the GetText method and obtain a TextReader object;
- Read the text from the reader.
The following example demonstrates how to extract text from an EPUB e-book:
using System;
using System.IO;
using GroupDocs.Parser;

// Create an instance of Parser class for the EPUB file at filePath
using (Parser parser = new Parser(filePath))
{
    // Extract the text into the reader
    using (TextReader reader = parser.GetText())
    {
        // Print the text of the e-book
        Console.WriteLine(reader.ReadToEnd());
    }
}
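The example above reads the whole text at once with ReadToEnd(). Because GetText returns a standard TextReader, the text can also be consumed line by line, which avoids building one large string for a long e-book. The following is a minimal sketch, not part of the GroupDocs.Parser API: EpubTextReading and PrintNonEmptyLines are hypothetical names, and a StringReader stands in for parser.GetText() so the sketch runs without an EPUB file. The null check is a defensive assumption for the case where no reader is returned; check the API reference for your version.

```csharp
using System;
using System.IO;

public static class EpubTextReading
{
    // Hypothetical helper: streams the extracted text line by line instead of
    // calling ReadToEnd(), skipping blank lines. Returns the number of lines written.
    public static int PrintNonEmptyLines(TextReader reader, TextWriter output)
    {
        // Defensive guard (assumption): treat a missing reader as "no text"
        if (reader == null)
        {
            return 0;
        }

        int printed = 0;
        string line;
        // ReadLine() returns null once the end of the text is reached
        while ((line = reader.ReadLine()) != null)
        {
            if (string.IsNullOrWhiteSpace(line))
            {
                continue; // skip blank lines between paragraphs
            }
            output.WriteLine(line);
            printed++;
        }
        return printed;
    }
}

public static class Program
{
    public static void Main()
    {
        // StringReader stands in for parser.GetText() so the sketch runs on its own
        using (TextReader reader = new StringReader("Chapter 1\n\nIt was a dark night.\n"))
        {
            int count = EpubTextReading.PrintNonEmptyLines(reader, Console.Out);
            Console.WriteLine("Non-empty lines: " + count);
        }
    }
}
```

In the example above, the call Console.WriteLine(reader.ReadToEnd()) would become EpubTextReading.PrintNonEmptyLines(reader, Console.Out).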
Here are the steps to extract text from the pages of an EPUB e-book:
- Instantiate the Parser object for the initial e-book;
- Call the GetDocumentInfo method and obtain an IDocumentInfo object with the page count;
- Call the GetText(pageIndex) method with the zero-based page index and obtain a TextReader object;
- Read the text from the reader.
The following example demonstrates how to extract text from each page of an EPUB e-book:
using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

// Create an instance of Parser class for the EPUB file at filePath
using (Parser parser = new Parser(filePath))
{
    // Get the document info with the page count
    IDocumentInfo documentInfo = parser.GetDocumentInfo();
    // Iterate over pages (page indexes are zero-based)
    for (int p = 0; p < documentInfo.PageCount; p++)
    {
        // Print the one-based page number and the total page count
        Console.WriteLine(string.Format("Page {0}/{1}", p + 1, documentInfo.PageCount));
        // Extract the text of the current page into the reader
        using (TextReader reader = parser.GetText(p))
        {
            // Print the text of the page
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}
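The page loop above can be factored into a reusable method. In this minimal sketch, the hypothetical EpubPageText.ReadAllPages helper takes a page count and a callback that returns a TextReader for a zero-based page index; with GroupDocs.Parser you would pass documentInfo.PageCount and p => parser.GetText(p). StringReader objects stand in for the pages so the sketch runs without an EPUB file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class EpubPageText
{
    // Hypothetical helper: collects the text of every page into a list.
    // pageCount plays the role of IDocumentInfo.PageCount and getPageReader
    // the role of parser.GetText(pageIndex); indexes are zero-based.
    public static List<string> ReadAllPages(int pageCount, Func<int, TextReader> getPageReader)
    {
        var pages = new List<string>(pageCount);
        for (int p = 0; p < pageCount; p++)
        {
            // The using statement disposes each page reader; a null reader is skipped by using
            using (TextReader reader = getPageReader(p))
            {
                // Assumption: a missing reader is recorded as an empty page
                pages.Add(reader == null ? string.Empty : reader.ReadToEnd());
            }
        }
        return pages;
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in pages; with GroupDocs.Parser this would be
        // EpubPageText.ReadAllPages(documentInfo.PageCount, p => parser.GetText(p))
        string[] fakePages = { "First page", "Second page" };
        List<string> pages = EpubPageText.ReadAllPages(fakePages.Length, p => new StringReader(fakePages[p]));
        for (int i = 0; i < pages.Count; i++)
        {
            // Print one-based page numbers, as in the example above
            Console.WriteLine(string.Format("Page {0}/{1}", i + 1, pages.Count));
            Console.WriteLine(pages[i]);
        }
    }
}
```

Passing a callback instead of the Parser object keeps the helper independent of the library and easy to test; the trade-off is that the text of all pages is held in memory at once.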
GroupDocs.Parser can also extract text from EPUB e-books as HTML, Markdown, or formatted plain text. For more details, see Extract Formatted Text.
Here are the steps to extract text from an EPUB e-book as HTML:
- Instantiate the Parser object for the initial e-book;
- Call the GetFormattedText method with FormattedTextOptions set to FormattedTextMode.Html and obtain a TextReader object;
- Read the text from the reader.
The following example shows how to extract text from an EPUB e-book as HTML:
using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

// Create an instance of Parser class for the EPUB file at filePath
using (Parser parser = new Parser(filePath))
{
    // Extract the text formatted as HTML into the reader
    using (TextReader reader = parser.GetFormattedText(new FormattedTextOptions(FormattedTextMode.Html)))
    {
        // Print the HTML-formatted text of the e-book
        Console.WriteLine(reader.ReadToEnd());
    }
}
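To view the HTML output in a browser, the reader's content can be written to a file. This minimal sketch copies a TextReader into a UTF-8 file in fixed-size chunks, so large output is never held as a single string. FormattedTextExport and SaveToFile are hypothetical names, and a StringReader stands in for the reader returned by GetFormattedText so the sketch runs on its own.

```csharp
using System;
using System.IO;
using System.Text;

public static class FormattedTextExport
{
    // Hypothetical helper: copies everything from the reader into a UTF-8 file
    // (without a byte order mark), overwriting any existing file.
    // Returns the number of characters written.
    public static long SaveToFile(TextReader reader, string path)
    {
        if (reader == null) throw new ArgumentNullException(nameof(reader));

        long total = 0;
        char[] buffer = new char[8192]; // chunk size is an arbitrary choice
        using (var writer = new StreamWriter(path, false, new UTF8Encoding(false)))
        {
            int read;
            // Read returns 0 once the reader is exhausted
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                writer.Write(buffer, 0, read);
                total += read;
            }
        }
        return total;
    }
}

public static class Program
{
    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "book.html");
        // StringReader stands in for
        // parser.GetFormattedText(new FormattedTextOptions(FormattedTextMode.Html))
        using (TextReader reader = new StringReader("<p>Hello, EPUB</p>"))
        {
            long count = FormattedTextExport.SaveToFile(reader, path);
            Console.WriteLine(count + " characters written to " + path);
        }
    }
}
```

With GroupDocs.Parser, pass the reader from the example above and a path ending in .html. Switching the mode to FormattedTextMode.Markdown should produce Markdown instead, which the same helper can save to a .md file.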
More resources
GitHub examples
You can run the code above and see the feature in action in our GitHub examples.
Free online document parser App
Along with the full-featured .NET library, we provide simple but powerful free apps.
You are welcome to parse documents and extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails, and more with our Free Online Document Parser App.