To avoid saving a file on disk, GroupDocs.Parser allows you to work with file streams directly.
To load a document from a stream, follow these steps:
1. Open a file stream.
2. Pass the opened file stream to the Parser class constructor.
The following code snippet shows how to load a file from a stream:
Load document from file stream
    from groupdocs.parser import Parser

    # Open the file as a binary stream
    with open("sample.docx", "rb") as stream:
        # Create an instance of Parser class with the stream
        with Parser(stream) as parser:
            # Extract text from the document
            text_reader = parser.get_text()
            if text_reader is not None:
                # Print the extracted text
                print(text_reader)
            else:
                print("Text extraction isn't supported for this format")
The following sample file is used in this example: sample.docx
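The example opens the file in "rb" mode because document formats such as DOCX are binary. As a minimal sketch using only the standard library, the hypothetical helper `is_binary_readable` below checks that a stream is readable and is not a text-mode stream before it is handed to `Parser`. The helper is an illustration and is not part of GroupDocs.Parser.

```python
import io


def is_binary_readable(stream):
    """Return True if the stream is readable and not text-mode (hypothetical helper)."""
    # Text-mode streams derive from io.TextIOBase and yield str, not bytes.
    if isinstance(stream, io.TextIOBase):
        return False
    readable = getattr(stream, "readable", None)
    return callable(readable) and bool(readable())


# BytesIO is a binary stream; StringIO is a text stream.
print(is_binary_readable(io.BytesIO(b"PK\x03\x04")))  # True
print(is_binary_readable(io.StringIO("plain text")))  # False
```

A file opened with `open(path, "rb")` passes this check, while one opened with `open(path, "r")` does not.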
Load document from BytesIO stream
    from groupdocs.parser import Parser
    from io import BytesIO

    # Read file into memory
    with open("sample.pdf", "rb") as file:
        file_data = file.read()

    # Create a BytesIO stream
    stream = BytesIO(file_data)

    # Create an instance of Parser class with the stream
    with Parser(stream) as parser:
        # Get document info
        doc_info = parser.get_document_info()
        print(f"File type: {doc_info.file_type.file_format}")
        print(f"Page count: {doc_info.page_count}")
The following sample file is used in this example: sample.pdf
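Passing the bytes to the `BytesIO` constructor leaves the stream positioned at the start. When the stream is instead filled with `write()`, its position ends up at the end of the data, so it is worth rewinding it before handing it to a reader. The sketch below uses only the standard library and made-up chunk data. That `Parser` reads from the stream's current position is an assumption here, not something this page states.

```python
from io import BytesIO

# Build the stream incrementally, e.g. from chunks received over a network.
chunks = [b"first ", b"second ", b"third"]
stream = BytesIO()
for chunk in chunks:
    stream.write(chunk)

# After writing, the position is at the end, so a read returns nothing.
position_after_write = stream.tell()
data_without_rewind = stream.read()

# Rewind to the start so a consumer sees the full content.
stream.seek(0)
data_after_rewind = stream.read()

print(position_after_write)  # 18
print(data_without_rewind)   # b''
print(data_after_rewind)     # b'first second third'
```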
Process uploaded file without saving
Loading from an in-memory stream is useful for web applications, where users upload files and the content arrives as bytes:
    from groupdocs.parser import Parser
    from io import BytesIO


    def process_uploaded_file(file_content, filename):
        """Process an uploaded file without saving it to disk"""
        try:
            # Create a stream from the file content
            stream = BytesIO(file_content)

            # Create parser instance
            with Parser(stream) as parser:
                print(f"Processing uploaded file: {filename}")

                # Get document info
                doc_info = parser.get_document_info()
                print(f"File type: {doc_info.file_type.file_format}")
                print(f"Pages: {doc_info.page_count}")

                # Extract text
                text_reader = parser.get_text()
                if text_reader:
                    text = text_reader
                    return {
                        "success": True,
                        "filename": filename,
                        "type": doc_info.file_type.file_format,
                        "pages": doc_info.page_count,
                        "text_length": len(text),
                        "text": text
                    }
                else:
                    return {"success": False, "error": "Text extraction not supported"}
        except Exception as e:
            return {"success": False, "error": str(e)}


    # Example: Simulate uploaded file
    with open("sample.docx", "rb") as f:
        file_content = f.read()

    result = process_uploaded_file(file_content, "sample.docx")
    print(result)
The following sample file is used in this example: sample.docx
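Because the upload is held entirely in memory, a web application may want to cap its size before parsing. The following is a minimal sketch under that assumption: `read_upload_limited` and `MAX_UPLOAD_BYTES` are hypothetical names, the 10 MiB limit is arbitrary, and the upload is simulated with `BytesIO` rather than a real web framework object.

```python
from io import BytesIO

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # Hypothetical 10 MiB limit


def read_upload_limited(upload_stream, max_bytes=MAX_UPLOAD_BYTES, chunk_size=64 * 1024):
    """Read an upload into bytes, raising ValueError if it exceeds max_bytes."""
    buffer = BytesIO()
    total = 0
    while True:
        chunk = upload_stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        # Stop as soon as the limit is crossed instead of buffering everything.
        if total > max_bytes:
            raise ValueError(f"Upload exceeds {max_bytes} bytes")
        buffer.write(chunk)
    return buffer.getvalue()


# Simulated upload: 1 KiB of data against a 4 KiB limit.
file_content = read_upload_limited(BytesIO(b"x" * 1024), max_bytes=4096)
print(len(file_content))  # 1024
```

The returned bytes can then be passed as `file_content` to `process_uploaded_file` from the example above.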
More resources
GitHub examples
You may find more code examples in our GitHub repository:
Along with the full-featured library, we provide a free online document parser app. You are welcome to extract data from PDF, DOCX, XLSX, and more with our Free Online Document Parser App.