Load file from stream

GroupDocs.Parser can read a document directly from a stream, so you do not need to save the file to disk first.

To load a document from a stream, follow these steps:

  1. Open a file stream
  2. Pass the opened file stream to the Parser class constructor

The following code snippet shows how to load a file from a stream:

Load document from file stream

from groupdocs.parser import Parser

# Open the file as a binary stream
with open("sample.docx", "rb") as stream:
    # Create an instance of Parser class with the stream
    with Parser(stream) as parser:
        # Extract text from the document
        text_reader = parser.get_text()
        
        if text_reader is not None:
            # Print the extracted text
            print(text_reader)
        else:
            print("Text extraction isn't supported for this format")

The following sample file is used in this example: sample.docx

Load document from BytesIO stream

from groupdocs.parser import Parser
from io import BytesIO

# Read file into memory
with open("sample.pdf", "rb") as file:
    file_data = file.read()

# Create a BytesIO stream
stream = BytesIO(file_data)

# Create an instance of Parser class with the stream
with Parser(stream) as parser:
    # Get document info
    doc_info = parser.get_document_info()
    
    print(f"File type: {doc_info.file_type.file_format}")
    print(f"Page count: {doc_info.page_count}")

The following sample file is used in this example: sample.pdf
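A BytesIO object created from existing bytes, as in the example above, starts at position 0. A stream that you fill by writing to it is left positioned at the end of the data, so a consumer that reads from the current position would see nothing. The sketch below uses only the standard library to show the difference. `rewound` is a hypothetical helper name, and whether Parser reads from the current position or rewinds the stream itself is an assumption not confirmed here, so rewinding first is simply the cautious choice.

```python
from io import BytesIO


def rewound(stream):
    """Move a seekable stream back to its start and return it.

    Hypothetical helper for illustration; not part of GroupDocs.Parser.
    """
    if stream.seekable():
        stream.seek(0)
    return stream


# Build a stream by writing to it, as you might when receiving chunks
payload = b"%PDF-1.7 example bytes"
buffer = BytesIO()
buffer.write(payload)

# After writing, the position is at the end of the data
position_after_write = buffer.tell()

# Rewind before handing the stream to any reader
rewound(buffer)
position_after_rewind = buffer.tell()

# Reading now starts from the first byte
first_bytes = buffer.read(5)

# Rewind again so the next consumer also starts at the beginning
buffer.seek(0)
```

With the stream rewound, it could then be passed to `Parser(buffer)` in the same way as in the example above.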

Process uploaded file without saving

This approach is useful in web applications, where uploaded files arrive as bytes in memory:

from groupdocs.parser import Parser
from io import BytesIO

def process_uploaded_file(file_content, filename):
    """Process an uploaded file without saving it to disk"""
    
    try:
        # Create a stream from the file content
        stream = BytesIO(file_content)
        
        # Create parser instance
        with Parser(stream) as parser:
            print(f"Processing uploaded file: {filename}")
            
            # Get document info
            doc_info = parser.get_document_info()
            print(f"File type: {doc_info.file_type.file_format}")
            print(f"Pages: {doc_info.page_count}")
            
            # Extract text
            text = parser.get_text()
            # Compare with None so that an empty document is not reported as unsupported
            if text is not None:
                
                return {
                    "success": True,
                    "filename": filename,
                    "type": doc_info.file_type.file_format,
                    "pages": doc_info.page_count,
                    "text_length": len(text),
                    "text": text
                }
            else:
                return {
                    "success": False,
                    "error": "Text extraction not supported"
                }
                
    except Exception as e:
        return {
            "success": False,
            "error": str(e)
        }

# Example: Simulate uploaded file
with open("sample.docx", "rb") as f:
    file_content = f.read()

result = process_uploaded_file(file_content, "sample.docx")
print(result)

The following sample file is used in this example: sample.docx
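Before handing uploaded bytes to a parser, a web application may want a cheap check that rejects empty or oversized uploads and looks at the leading bytes of the content. The sketch below is one way to do that with the standard library only. The names `sniff_container` and `precheck_upload` and the 10 MB default limit are hypothetical choices for illustration and are not part of GroupDocs.Parser. The check identifies only the container type, so a ZIP signature does not prove the file is a valid DOCX.

```python
# Well-known leading byte signatures
PDF_SIGNATURE = b"%PDF-"
ZIP_SIGNATURE = b"PK\x03\x04"  # DOCX, XLSX, and PPTX are ZIP archives
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy DOC, XLS, PPT


def sniff_container(file_content: bytes) -> str:
    """Guess the container type from the first bytes of the content."""
    if file_content.startswith(PDF_SIGNATURE):
        return "pdf"
    if file_content.startswith(ZIP_SIGNATURE):
        return "zip"
    if file_content.startswith(OLE2_SIGNATURE):
        return "ole2"
    return "unknown"


def precheck_upload(file_content: bytes, max_bytes: int = 10 * 1024 * 1024):
    """Return (ok, detail): detail is the container type or an error message."""
    if not file_content:
        return (False, "empty upload")
    if len(file_content) > max_bytes:
        return (False, "upload exceeds size limit")
    return (True, sniff_container(file_content))


# Example: check some bytes before calling process_uploaded_file
ok, detail = precheck_upload(b"%PDF-1.4\n...")
print(ok, detail)
```

A caller could run `precheck_upload` first and call `process_uploaded_file` only when the first element of the result is true, leaving format validation proper to the parser.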

More resources

GitHub examples

You may find more code examples in our GitHub repository:

Free online document parser

Along with the full-featured library, we provide a free online document parser app. You are welcome to extract data from PDF, DOCX, XLSX, and more with our Free Online Document Parser App.