About Comma Separated Files

Appendix B: About Comma Separated Files

The term "comma separated format" (or "tab separated format" or "CSV") is often used as a catchall for all kinds of text-based data formats in which the data is laid out line by line. Each line contains one data record made up of a number of columns, and the columns are separated by a comma (or a tab, or some other separator character).

LISTSERV Maestro can correctly interpret comma separated text files in various formats as long as the following rules are applied:

• Any character may be used as the separator character, although a comma, tab, or semicolon is conventional.

• The same separator character must be used in every line of the file.

• All lines in the file must have the same number of columns, which means the same number of separator characters outside of quoted fields.

• Empty columns may be added so that every line of the file contains the same number of separator characters.

• Two separator characters in direct succession, with no characters in between, create an empty column.

• If a line begins with the separator character, then LISTSERV Maestro assumes the line begins with an empty column.

• If a line ends with the separator character, then LISTSERV Maestro assumes the line ends with an empty column.

• If the separator character also appears as part of the value of one or more fields, then those fields must be enclosed in quotation marks or another "quote character."

• Any character except the separator character can be used as the quote character (a quotation mark or an apostrophe is conventional).

• The same character must be used for the opening quote and for the closing quote.

• If quotes are used in only some records in a file (especially records that appear near the end of the file), it is important to manually define the separator and quote characters instead of allowing LISTSERV Maestro to attempt to parse the file automatically. When the separator and quote characters are defined manually, LISTSERV Maestro is forced to look at the entire file and parse it according to the values entered for these characters. If LISTSERV Maestro attempts to parse the file automatically when it contains quote characters in some lines, but not all, then those records may be parsed incorrectly or may be rejected as invalid.

• If the quote character must appear inside the value of a field, then it must be escaped by using it twice, in direct succession. The doubled quote character is interpreted as a single quote character that is part of the field value. The following basic rules apply to separator and quote characters:

• If the first character in the field is the quote character, then LISTSERV Maestro assumes the field is quoted and the next un-escaped quote character marks the end of the field. The end of the field must then be followed by a separator character or by the end of the line (trailing white space after the last field of the line is allowed).

• If the first character in a field is not the quote character, then LISTSERV Maestro assumes the field is not quoted, and the next appearance of the separator character marks the end of the field.
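The rules above can be sketched as a small line splitter. This is only an illustration of the stated rules, not LISTSERV Maestro's actual parser; the function name `split_line` and its parameters are hypothetical, and the sketch assumes that each line is passed in without its line terminator.

```python
def split_line(line, sep=",", quote='"'):
    """Split one line into fields using the separator and quote rules above."""
    if sep == quote:
        raise ValueError("separator and quote character must differ")
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == quote:
            # Quoted field: read up to the next un-escaped quote character.
            i += 1
            chars = []
            while True:
                if i >= n:
                    raise ValueError("quoted field is never closed")
                if line[i] == quote:
                    if i + 1 < n and line[i + 1] == quote:
                        chars.append(quote)  # doubled quote: one literal quote
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                chars.append(line[i])
                i += 1
            fields.append("".join(chars))
            # The closing quote must be followed by a separator, the end of
            # the line, or trailing white space after the last field.
            if i < n and line[i] != sep:
                if line[i:].strip() == "":
                    i = n
                else:
                    raise ValueError("unexpected text after closing quote")
        else:
            # Unquoted field: runs up to the next separator or end of line.
            j = line.find(sep, i)
            if j == -1:
                j = n
            fields.append(line[i:j])
            i = j
        if i >= n:
            return fields
        i += 1  # skip the separator; a trailing separator yields an empty field


print(split_line('Karl,"Hauser ""the man""","Frankfurt, am Main",DE'))
print(split_line("a;'b;c';", sep=";", quote="'"))
```

A leading or trailing separator produces an empty first or last field, and a quoted field that is followed by anything other than a separator, the end of the line, or trailing white space is rejected.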

Here are some examples:

Simple values, separated by comma, not quoted:

John,Doe,Denver,USA
Lucy,Summers,London,UK
Karl,Hauser,Frankfurt,DE

This defines a dataset with three rows, each row consisting of four fields.

Simple values, separated by comma, not quoted, with empty fields:

John,,Denver,USA
,Summers,London,UK
Karl,Hauser,Frankfurt,

This defines a dataset with three rows, each row consisting of four fields. In the first row, the second field is empty; in the second row, the first field is empty; and in the last row, the fourth field is empty.
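Before uploading a file, it can be useful to confirm that every line really has the same number of columns. The following is a minimal sketch that uses Python's standard `csv` module as a stand-in parser (it is not LISTSERV Maestro's own parser); the function name `inconsistent_rows` is hypothetical.

```python
import csv
import io


def inconsistent_rows(text, sep=",", quote='"'):
    """Return the 1-based numbers of rows whose column count differs from row 1."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep, quotechar=quote))
    if not rows:
        return []
    expected = len(rows[0])
    return [n for n, row in enumerate(rows, start=1) if len(row) != expected]


good = "John,,Denver,USA\n,Summers,London,UK\nKarl,Hauser,Frankfurt,\n"
bad = "John,Denver,USA\n,Summers,London,UK\n"

print(inconsistent_rows(good))  # every row has four columns
print(inconsistent_rows(bad))   # the second row has one column more than the first
```

The check treats the first row as the reference, so a file whose first row is itself wrong will report all of the other rows instead.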

Values, some of which contain a comma, separated by comma, quoted with <">:

John,Doe,"Denver, Colorado",USA
Lucy,Summers,London,UK
Karl,Hauser,"Frankfurt, am Main",DE

This defines a dataset with three rows, each row consisting of four fields. The third fields in the first and last rows each have a value that contains a comma. Since this comma is inside of the quote characters, it is not interpreted as a separator comma, but instead as part of the value of the field.
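As a quick way to see how such a file is split, the example can be read with Python's standard `csv` module, with the separator and quote character given explicitly. This is an illustration with a general-purpose parser, on the assumption that it handles this example in the same way as the rules described here; it does not show LISTSERV Maestro itself.

```python
import csv
import io

data = (
    'John,Doe,"Denver, Colorado",USA\n'
    'Lucy,Summers,London,UK\n'
    'Karl,Hauser,"Frankfurt, am Main",DE\n'
)

# Name the separator and quote character explicitly rather than guessing them.
rows = list(csv.reader(io.StringIO(data), delimiter=",", quotechar='"'))
third_fields = [row[2] for row in rows]

for row in rows:
    print(len(row), row)
```

Each row still has four fields, and the commas inside the quoted values remain part of the third field.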

Values, some of which contain a comma, separated by comma, quoted with <">, with empty fields:

John,,"Denver, Colorado",USA
,Summers,London,UK
Karl,Hauser,"Frankfurt, am Main",

This defines a dataset with three rows, each row consisting of four fields. The third fields in the first and last rows each have a value that contains a comma. Since this comma is inside of the quote characters, it is not interpreted as a separator comma, but instead as part of the value of the field. Also, in each row there is an empty field.

Values, some of which contain a comma and some the quote character, separated by comma, quoted with <">:

"John ""Hammer"" Cool",Doe,"Denver, Colorado",USA
Lucy,Summers,London,UK
Karl,"Hauser ""the man""","Frankfurt, am Main",DE

This defines a dataset with three rows, each row consisting of four fields. The third fields in the first and last rows each have a value that contains a comma. Since this comma is inside of the quote characters, it is not interpreted as a separator comma, but instead as part of the value of the field. In addition, the first field in the first row contains the quote character, which has been escaped. Including the enclosing quotes, the field in its escaped form looks like this: "John ""Hammer"" Cool". The two double appearances of the quote character around the word Hammer are not interpreted as quotes that delimit the field, but are instead interpreted as single appearances of the quote character which are part of the field value. Therefore, the un-escaped form of the field looks like this: John "Hammer" Cool. Similarly, the second field of the last row has the un-escaped form Hauser "the man".
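When such a file is generated by a script, the escaping can be applied mechanically. The sketch below quotes a value only when it contains the separator or the quote character, and doubles any quote characters inside it. The helper names `escape_field` and `build_line` are hypothetical, values are assumed to contain no line breaks, and the round trip is checked with Python's standard `csv` module rather than with LISTSERV Maestro.

```python
import csv


def escape_field(value, sep=",", quote='"'):
    """Quote a value if needed, doubling any quote characters inside it."""
    if sep in value or quote in value:
        return quote + value.replace(quote, quote * 2) + quote
    return value


def build_line(values, sep=",", quote='"'):
    """Join already-escaped fields into one line of the file."""
    return sep.join(escape_field(v, sep, quote) for v in values)


record = ['John "Hammer" Cool', "Doe", "Denver, Colorado", "USA"]
line = build_line(record)
print(line)

# Round trip: parsing the generated line should give back the original values.
parsed = next(csv.reader([line], delimiter=",", quotechar='"'))
print(parsed == record)
```

Quoting only where it is required keeps the output close to the examples in this appendix. If only some records end up quoted, the advice above about defining the separator and quote characters manually applies.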