
Creating CSV file with special characters in fields

I have looked for similar questions on Stack Overflow but haven't found any. I want to export a table in CSV format so that it can be imported into Excel. Each cell contains text and each row has the same number of columns. The format I have tried is the following:

"d1"|"d2"|"d3"|"d4"

where d1, d2, d3, d4 are the original strings I want to put in each cell. I have the following problems:

  1. | can be contained in the data. Is this really a problem? Maybe not because I have double-quotes around the strings. Maybe I could even use commas and it would not make a difference.
  2. " itself can be contained in the data. Should I escape it in some way? My current solution is to remove leading and trailing double-quotes from the original string before putting my double-quotes around it. It seems to work, but I think escaping the internal double-quotes would be cleaner. Do you know how to do this?
  3. The data can contain newline characters too. I would like Excel to keep the data together in one cell, and to format the text within that cell according to the newlines. At the moment, this is not the case: Excel interprets newlines as terminating a record and adds extra lines in the imported table.

Do you have any idea how to fix the above issues? Is there some online documentation regarding these specific problems? I have been searching since yesterday but did not find anything.

Excel supports newlines in values. For example, using the Excel user interface, you can get "foo\nbar\nbaz" into a cell by typing Alt-Enter for each line break.

The tricky thing about Excel is that in locales where the comma is used as the decimal separator, Excel uses a semicolon as the field delimiter. There is no universal/international format that every Excel installation will read.

I'd be very surprised if there wasn't a package in Java for reading/writing CSV files. Python has one that allows you to specify the delimiter, quote char, record separator, etc. on both input and output.
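
As a rough illustration of the Python package mentioned above (the standard-library csv module), here is a minimal sketch that writes one row containing a pipe, double quotes and a line feed, then reads it back. The sample data is made up for the example, and the comma delimiter is an assumption; pass delimiter=";" or delimiter="|" if your Excel locale expects something else.

```python
import csv
import io

# Hypothetical sample row: a pipe, embedded double quotes, and a line feed.
rows = [
    ["d1", "has | pipe", 'has "quotes"', "line one\nline two"],
]

# newline="" keeps the buffer from translating line endings.
buf = io.StringIO(newline="")
writer = csv.writer(
    buf,
    delimiter=",",              # field delimiter
    quotechar='"',              # quote character
    quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
    lineterminator="\r\n",      # record separator
)
writer.writerows(rows)
text = buf.getvalue()

# Read the text back to check that every field survives the round trip.
reader = csv.reader(io.StringIO(text, newline=""), delimiter=",", quotechar='"')
parsed = list(reader)
```

Here text holds a single record in which the quoted fields have their inner quotes doubled and the line feed is kept inside the quotes, and parsed is equal to rows. When writing to a real file, open it with newline="" so that the CR LF record separator is not doubled on Windows.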

However if you want to write your own, follow this pseudocode for each row that you want to write:

for each field in the row:
    if field contains quotechar:
        double all quotechars in field
        field = quotechar + field + quotechar
    else if field contains delimiter, CR, or LF:
        field = quotechar + field + quotechar
    else:
        avoid waste of space and ugly visual impact by NOT doing unneeded quoting
join field strings separated by delimiter
append CR LF 
write the row string using binary mode (so Windows runtime doesn't give you 2xCR)

Note carefully: (1) all of the above is premised on 8-bit characters; (2) I have avoided using the ambiguous term "newline".
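
The pseudocode above could be sketched in Python roughly as follows. The function names (format_field, format_row, write_rows) are made up for this example, and the UTF-8 encoding is an assumption that goes beyond the 8-bit premise stated above; treat it as one possible translation, not a reference implementation.

```python
def format_field(field, delimiter=",", quotechar='"'):
    """Quote one field only when it needs quoting."""
    if quotechar in field:
        # Double every embedded quote character, then wrap the field.
        return quotechar + field.replace(quotechar, quotechar * 2) + quotechar
    if delimiter in field or "\r" in field or "\n" in field:
        return quotechar + field + quotechar
    # Nothing special in the field: leave it unquoted.
    return field


def format_row(fields, delimiter=",", quotechar='"'):
    """Join the formatted fields and terminate the record with CR LF."""
    formatted = (format_field(f, delimiter, quotechar) for f in fields)
    return delimiter.join(formatted) + "\r\n"


def write_rows(path, rows, encoding="utf-8"):
    """Write rows in binary mode so the runtime cannot alter the CR LF."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(format_row(row).encode(encoding))
```

For example, format_field("a|b", delimiter="|") returns "a|b" wrapped in double quotes, while format_field("plain") returns the string unchanged.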

I find CSV is best done with comma separation, and quoting values so commas in values aren't misinterpreted. Quoting quotes is done with double quoting. So the following four values:

one
two
three with "quoted" value
four

become:

one,two,"three with ""quoted"" value",four
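
To check that the doubled quotes are read back as intended, here is a small sketch using Python's standard csv module with its default settings (comma delimiter, double-quote quote character). Python is just the language chosen for the illustration.

```python
import csv

# The example line from above, exactly as it would appear in the file.
line = 'one,two,"three with ""quoted"" value",four'

# csv.reader accepts any iterable of lines; parse the single record.
fields = next(csv.reader([line]))
```

After this, fields contains the four original values, with the third one back to: three with "quoted" value.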

I don't believe it's possible for standard CSV implementations to support newlines in values; particularly not in Excel. Try creating a cell in Excel with newlines (is that even possible?) and saving as CSV to see if that works.

http://en.wikipedia.org/wiki/Comma-separated_values


 