Batch Word Document Comparison with Performance Optimization in Node.js
Note
💡 For the complete working code and detailed explanations, please refer to the full repository here. This repository contains all source files, helper classes, and configuration examples.
Overview
This use‑case demonstrates how to perform high‑throughput batch comparison of Word documents (DOCX/DOC) with GroupDocs.Comparison for Node.js via Java. It shows how to:
Discover matching source‑target file pairs.
Compare documents sequentially or in parallel.
Apply performance‑tuned CompareOptions.
Monitor progress and generate a summary report.
The approach is ideal for scenarios such as document version control, compliance auditing, or large‑scale content migration where thousands of document pairs must be examined quickly and reliably.
Prerequisite: a temporary or permanent GroupDocs.Comparison license (place the key in src/utils/licenseHelper.js).
Installation
# Clone the sample repository (or copy the relevant files into your project)
git clone https://github.com/groupdocs-comparison/batch-document-comparison-performance
cd batch-document-comparison-performance/src
# Install Node.js dependencies
npm install
Note
💡 The package.json declares the required Node version (>=20.0.0) and the GroupDocs.Comparison dependency (^25.11.0).
The following steps walk you through a parallel batch comparison – the fastest strategy for large workloads.
Prepare the input directories
Place matching Word files in sample-files/source/ and sample-files/target/.
Filenames must be identical (e.g., contract_v1.docx in both folders).
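The matching rule above can be sketched as follows. This is a minimal illustration, not the repository's code: the signature of findWordPairs, the shape of the returned objects, and the exact (case-sensitive) name match are all assumptions made for the example.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Extensions treated as Word documents in this sketch.
const WORD_EXTENSIONS = new Set(['.docx', '.doc']);

/**
 * Returns one entry per Word file name that exists in both directories.
 * Hypothetical helper: the repository's findWordPairs may behave differently.
 */
function findWordPairs(sourceDir, targetDir) {
  const isWordFile = (name) =>
    WORD_EXTENSIONS.has(path.extname(name).toLowerCase());

  // Index the target folder once so each lookup is O(1).
  const targetNames = new Set(fs.readdirSync(targetDir).filter(isWordFile));

  return fs
    .readdirSync(sourceDir)
    .filter((name) => isWordFile(name) && targetNames.has(name))
    .sort() // stable, predictable processing order
    .map((name) => ({
      name,
      source: path.join(sourceDir, name),
      target: path.join(targetDir, name),
    }));
}
```

Files that exist in only one folder are silently skipped here; a real run may prefer to log them so that missing counterparts are noticed.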
Run the parallel example
# Windows (cmd) – the script sets the required Java option
npm run example:parallel
The script executes src/examples/parallelBatchComparison.js, which:
Calls findWordPairs to locate matching pairs.
Processes the pairs in batches (default concurrency = 5).
Emits a progress line after each batch.
Writes comparison results to output/ as comparison_<basename>.docx.
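The batching behaviour described above might look roughly like the sketch below. It is not taken from parallelBatchComparison.js: runInBatches, the worker callback, and the onProgress hook are hypothetical names, and the worker stands in for whatever function actually compares one pair. Promise.allSettled is used so that a failing pair is recorded without aborting the rest of its batch.

```javascript
/**
 * Runs `worker` over `items` in consecutive batches of `concurrency` items.
 * Hypothetical helper; the sample script's internals may differ.
 */
async function runInBatches(items, worker, options = {}) {
  const { concurrency = 5, onProgress = () => {} } = options;
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new RangeError('concurrency must be a positive integer');
  }

  const results = [];
  for (let start = 0; start < items.length; start += concurrency) {
    const batch = items.slice(start, start + concurrency);

    // Start every item in the batch, then wait for all of them to settle.
    const settled = await Promise.allSettled(
      batch.map(async (item) => {
        const startedAt = Date.now();
        await worker(item);
        return Date.now() - startedAt; // elapsed milliseconds for this item
      })
    );

    settled.forEach((outcome, i) => {
      if (outcome.status === 'fulfilled') {
        results.push({ item: batch[i], success: true, durationMs: outcome.value });
      } else {
        const reason = outcome.reason;
        const error = reason instanceof Error ? reason.message : String(reason);
        results.push({ item: batch[i], success: false, error });
      }
    });

    // One progress callback per completed batch.
    onProgress(results.length, items.length);
  }
  return results;
}
```

A call such as runInBatches(pairs, comparePair, { concurrency: 5 }) would process five pairs at a time, where comparePair is a placeholder for the real comparison function. Items only overlap in time if the worker is genuinely asynchronous; a worker that blocks the event loop would still run one item after another.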
Inspect the generated report
After the run completes, a summary is printed to the console. It includes total documents, success rate, total duration, average time per document, and throughput.
================================================================================
Batch Comparison Summary
================================================================================
Total Documents: 120
Successful: 118
Failed: 2
Success Rate: 98.33%
Performance Metrics:
Total Duration: 42.67s
Average Duration: 360.42ms per document
Throughput: 2.76 documents/second
Concurrency: 5
================================================================================
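For readers who want to compute similar figures in their own scripts, here is a minimal sketch. The sample's exact formulas are not shown on this page, so the definitions below are assumptions: average duration is wall-clock time divided by the number of documents, and throughput is documents per wall-clock second. The summarize function and its field names are hypothetical, and its numbers may differ slightly from what the sample script prints.

```javascript
/**
 * Builds summary metrics from per-document results.
 * `results` is an array of objects with a boolean `success` field;
 * `totalDurationMs` is the wall-clock time of the whole run.
 */
function summarize(results, totalDurationMs, concurrency) {
  const total = results.length;
  const successful = results.filter((r) => r.success).length;
  const seconds = totalDurationMs / 1000;

  return {
    total,
    successful,
    failed: total - successful,
    // Guard against division by zero for empty runs.
    successRate: total === 0 ? 0 : (successful / total) * 100,
    totalDurationSeconds: seconds,
    averageMsPerDocument: total === 0 ? 0 : totalDurationMs / total,
    throughputPerSecond: seconds === 0 ? 0 : total / seconds,
    concurrency,
  };
}
```

Values can be formatted for display with toFixed(2), for example summary.successRate.toFixed(2) + '%'.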
(Optional) Run the optimized demo
If you need the fastest possible run, use the tuned example that disables unnecessary comparison features:
npm run example:optimized
This script supplies a CompareOptions object that:
Disables style detection when not required.
Generates a lightweight summary page only.
Sets sensitivity to Medium for a good speed‑accuracy trade‑off.
Warning
⚠️ Memory‑Intensive Workloads – If you encounter Java heap space or Node out of memory errors, reduce the concurrency value in parallelBatchComparison.js or launch Node with a larger heap (node --max-old-space-size=4096).
Notes
License – The sample uses a temporary license. Replace the placeholder in src/utils/licenseHelper.js with your permanent license string for production use.
Java Options – The npm scripts prepend JAVA_TOOL_OPTIONS=--enable-native-access=ALL-UNNAMED. Adjust or remove this flag if your Java version does not require it.
Error Handling – Individual pair failures are logged but do not abort the batch. The summary report lists both successes and failures.
Performance Tuning – Experiment with the concurrency variable and CompareOptions to find the optimal balance for your hardware and document sizes.