How to edit CSV file

This demonstration shows and explains all necessary moments and options regarding processing DSV spreadsheets (Delimiter-separated values)

Introduction

DSV (Delimiter-Separated Values) documents are specific form of text-based spreadsheets with delimiters (separators). Due of their nature the GroupDocs.Editor processes this class of documents separately from usual binary spreadsheets. In counterpart to usual spreadsheets, DSV documents due to their textual nature have only single tab (worksheet) and cannot be encoded. But any non-empty string may be treated as separator, so user always need to specify it explicitly. The most common types of DSV are:

  1. CSV (Comma-Separated value)
  2. TSV (Tab-Separated value)
  3. Semicolon-separated value
  4. Whitespace-separated value
  5. …and any other

GroupDocs.Editor supports DSV with any separator, which can be character or a set of characters (string).

Loading CSV file for edit

Unlike WordProcessing and Spreadsheet documents, DSV documents are loaded into the Editor class without any loading options. They are simple text files by their nature, so it is nothing to adjust:

String inputCsvPath = "C://input//spreadsheet.csv";
Editor editor = new Editor(inputCsvPath);

Editing CSV file

In order to open any DSV document for edit, user must use DelimitedTextEditOptions class, which single constructor has one mandatory parameter — string separator (delimiter), that should not be NULL or empty string. There are also several optional properties. Two properties — getConvertDateTimeData() and getConvertNumericData() — are boolean flags, which indicate how to treat numbers. GroupDocs.Editor can recognize digits within cells and treat them as numbers or date-time values. By default this recognition is disabled, but user can turn it on. Next property — getTreatConsecutiveDelimitersAsOne(), — is also a boolean flag, which determines how the consecutive delimiters should be treated — as several (default - false) or as a single one (true).

Last property — getOptimizeMemoryUsage(), — is also a boolean flag, but with completely different purpose. By default GroupDocs.Editor algorithms are tuned for maximum performance for reducing the latency time; memory consumption has lower priority. However in some rare cases user may need to load and open very huge DSV, several hundreds of MiBs or even close to GiB. In such cases GroupDocs.Editor (Java in fact) may throw an OutOfMemoryException. For coping with such use-cases the getOptimizeMemoryUsage() flag was introduced. By enabling it user switches GroupDocs.Editor to use another processing algorithms, which consume relatively low amount of memory at the cost of lower performance.

Example below demonstrates using the DelimitedTextEditOptions for editing CSV document, where comma character is a delimiter:

DelimitedTextEditOptions editOptions = new DelimitedTextEditOptions(",");
editOptions.setConvertDateTimeData(false);
editOptions.setConvertNumericData(true);
editOptions.setTreatConsecutiveDelimitersAsOne(true);
editOptions.setOptimizeMemoryUsage(true);

EditableDocument document = editor.edit(editOptions);

Save edited CSV file

After being edited, input DSV can be saved back to DSV (not necessary with the same separator) or to any supportable Spreadsheet document. In order to save document to DSV format user must use the DelimitedTextSaveOptions class, which, like the DelimitedTextEditOptions, has one constructor with mandatory string parameter — separator (delimiter), that should not be NULL or empty string.

There are also other properties:

  1. getEncoding(). Allows to specify the encoding of generated DSV. By default, if not specified, is UTF8.
  2. getTrimLeadingBlankRowAndColumn(). Boolean flag, that indicates whether leading blank rows and columns should be trimmed like what MS Excel does.
  3. getKeepSeparatorsForBlankRow(). Boolean flag, that indicates whether separators should be output for blank row. Default value is false which means the content for blank row will be empty.

Example below demonstrates loading CSV, opening it to the EditableDocument instance, and saving to TSV and XLSM.

String inputCsvPath = "C://input//spreadsheet.csv";
Editor editor = new Editor(inputCsvPath);

DelimitedTextEditOptions editOptions = new DelimitedTextEditOptions(",");
EditableDocument document = editor.edit(editOptions);

DelimitedTextSaveOptions tsvSaveOptions = new DelimitedTextSaveOptions("\t");
tsvSaveOptions.setTrimLeadingBlankRowAndColumn(true);
tsvSaveOptions.setKeepSeparatorsForBlankRow(false);

SpreadsheetSaveOptions xlsmSaveOptions = new SpreadsheetSaveOptions(SpreadsheetFormats.Xlsm);

String tsvSavePath = "C://output//spreadsheet.tsv";
String xlsmSavePath = "C://output//spreadsheet.xlsm";

editor.save(document, tsvSavePath, tsvSaveOptions);
editor.save(document, xlsmSavePath, xlsmSaveOptions);

In this example output “‘spreadsheet.tsv” will have a UTF-8 encoding.