Creating CSV file with special characters in fields

I have looked for similar questions on Stack Overflow but haven't found any. I want to export a table in CSV format so that it can be imported into Excel. Each cell contains text and each row has the same number of columns. The format I have tried is the following:

"d1"|"d2"|"d3"|"d4"

where d1, d2, d3, d4 are the original strings I want to put in each cell. I have the following problems:

  1. | can be contained in the data. Is this really a problem? Maybe not, because I have double quotes around the strings. Maybe I could even use commas and it would not make a difference.
  2. " itself can be contained in the data. Should I escape it in some way? My current solution is to remove leading and trailing double quotes from the original string before putting my own double quotes around it. It seems to work, but I think escaping the internal double quotes would be cleaner. Do you know how to do this?
  3. The data can contain newline characters too. I would like Excel to keep the data together in one cell, and to format the text within that cell according to the newlines. At the moment, this is not the case: Excel interprets newlines as terminating a record and adds extra lines in the imported table.

Do you have any idea how to fix the above issues? Is there some online documentation regarding these specific problems? I have been searching since yesterday but did not find anything.

Excel supports newlines in values. For example, using the Excel user interface, you can get "foo\nbar\nbaz" into a cell by typing Alt-Enter for each line break.

The tricky thing about Excel is that in locales where the comma is used as a decimal point, Excel uses a semicolon as the field delimiter. There is no universal/international format that every Excel installation will read.

I'd be very surprised if there wasn't a package in Java for reading/writing CSV files. Python has one (the csv module) that allows you to specify the delimiter, quote char, record separator, etc. on both input and output.
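As a rough illustration of the Python route mentioned above, the sketch below uses the standard csv module with an explicit delimiter, quote character and record terminator. The sample row and the choice of a semicolon delimiter are assumptions made for the example, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# Hypothetical sample row: one field contains the delimiter, one contains quotes.
rows = [["d1", "has;delimiter", 'has "quotes"', "plain"]]

# StringIO stands in for a file opened with newline="" (as the csv docs advise).
buf = io.StringIO(newline="")
writer = csv.writer(
    buf,
    delimiter=";",               # field delimiter (assumed for this example)
    quotechar='"',               # character wrapped around fields that need it
    quoting=csv.QUOTE_MINIMAL,   # quote only fields that need quoting
    lineterminator="\r\n",       # record separator: CR LF
)
writer.writerows(rows)
text = buf.getvalue()

# Read the text back with the same settings to check the round trip.
parsed = list(csv.reader(io.StringIO(text, newline=""), delimiter=";", quotechar='"'))
print(repr(text))
print(parsed == rows)
```

The same delimiter and quotechar arguments are passed on the reading side, so whatever separator a particular Excel locale expects can be swapped in at a single place.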

However, if you want to write your own, follow this pseudocode for each row that you want to write:

for each field in the row:
    if field contains quotechar:
        double all quotechars in field
        field = quotechar + field + quotechar
    else if field contains delimiter, CR, or LF:
        field = quotechar + field + quotechar
    else:
        avoid waste of space and ugly visual impact by NOT doing unneeded quoting
join field strings separated by delimiter
append CR LF 
write the row string using binary mode (so Windows runtime doesn't give you 2xCR)

Note carefully: (1) all of the above is premised on 8-bit characters; (2) I have avoided using the ambiguous term "newline".
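The pseudocode above could be sketched in Python roughly as follows. The function names (format_field, format_row, write_rows) are made up for this illustration, and the latin-1 default is only an assumption standing in for "8-bit characters"; pick whatever single-byte encoding your consumer expects.

```python
def format_field(field, delimiter=",", quotechar='"'):
    """Quote one field only when it needs it (hypothetical helper)."""
    if quotechar in field:
        # Double every quote character, then wrap the field in quotes.
        return quotechar + field.replace(quotechar, quotechar * 2) + quotechar
    if delimiter in field or "\r" in field or "\n" in field:
        # Delimiters, CR and LF only require the surrounding quotes.
        return quotechar + field + quotechar
    # Nothing special in the field: leave it unquoted.
    return field


def format_row(fields, delimiter=",", quotechar='"', encoding="latin-1"):
    """Join the fields, append CR LF, and encode to bytes (8-bit assumption)."""
    line = delimiter.join(format_field(f, delimiter, quotechar) for f in fields)
    return (line + "\r\n").encode(encoding)


def write_rows(path, rows, delimiter=",", quotechar='"'):
    """Write rows in binary mode so the runtime cannot turn CR LF into CR CR LF."""
    with open(path, "wb") as out:
        for row in rows:
            out.write(format_row(row, delimiter, quotechar))


print(format_row(["one", "two", 'three with "quoted" value', "four"]))
```

Encoding each row to bytes and opening the file with "wb" mirrors the last step of the pseudocode: no text-mode newline translation is applied on Windows.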

I find CSV is best done with comma separation, quoting values so that commas in values aren't misinterpreted. Quotes inside a value are escaped by doubling them. So the following four values:

one
two
three with "quoted" value
four

become:

one,two,"three with ""quoted"" value",four
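One way to sanity-check the doubled-quote convention is to feed that line to an existing CSV parser. The snippet below is a small sketch using Python's csv module with its default settings (comma delimiter, double-quote character); it parses the line and then writes the values back out.

```python
import csv
import io

line = 'one,two,"three with ""quoted"" value",four'

# csv.reader accepts any iterable of lines, so a one-element list is enough.
values = next(csv.reader([line]))
print(values)

# Writing the parsed values back should reproduce the line plus the CR LF terminator.
out = io.StringIO(newline="")
csv.writer(out).writerow(values)
print(repr(out.getvalue()))
```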

I don't believe it's possible for standard CSV implementations to support newlines in values, particularly not in Excel. Try creating a cell in Excel with newlines (is that even possible?) and saving as CSV to see if that works.
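The same kind of experiment can be run against a CSV library instead of Excel. As one data point only, the sketch below checks how Python's csv module treats a value with embedded line breaks; it says nothing about how Excel itself behaves on import, and the sample values are made up.

```python
import csv
import io

value = "foo\nbar\nbaz"  # hypothetical cell content with two line breaks

# Write one record whose middle field contains line breaks.
buf = io.StringIO(newline="")
csv.writer(buf).writerow(["a", value, "c"])
encoded = buf.getvalue()
print(repr(encoded))

# Read it back: the quoted field is expected to come back as a single value.
decoded = next(csv.reader(io.StringIO(encoded, newline="")))
print(decoded)
```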

http://en.wikipedia.org/wiki/Comma-separated_values
