MATLAB / Octave - how to parse a CSV file with numbers and strings that contain commas

I have a CSV file with 20 columns. Some columns hold numeric values and others hold text values, and the text values may or may not contain commas.

CSV content example:

column1, column2, column3, column4
"text value 1", 123, "text, with a comma", 25
"another, comma", 456, "other text", 78

I'm using the textscan function, but its behavior is buggy and erratic. With some arguments it reads all the values into a single column, sometimes it repeats columns, and most of what I've tried leads to the commas being interpreted as column separators, even though the text is enclosed in double quotes. I've tried specifying the 'delimiter' argument and also including literals in the format specification, to no avail.

What's the correct way of invoking textscan to handle a CSV file like the example above? I'm looking for a solution that runs on both MATLAB and Octave (or, if that's not possible, the equivalent solution for each one).

For GNU Octave, using the io package:

pkg load io
c = csv2cell ("jota.csv")

gives

c = 
{
  [1,1] = column1
  [2,1] = text value 1
  [3,1] = another, comma
  [1,2] =  column2
  [2,2] =  123
  [3,2] =  456
  [1,3] =  column3
  [2,3] =  text, with a comma
  [3,3] =  other text
  [1,4] =  column4
  [2,4] =  25
  [3,4] =  78
}
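
If you then need to separate the header row and turn the numeric columns into a matrix, something along these lines might work. It is only a sketch: the cell array is typed in by hand to mirror the output above, so it runs without the io package or the CSV file. It also assumes the numeric fields may arrive either as doubles or as strings with a leading space, which I have not checked across csv2cell versions. The variable names (headers, data, nums, text_cols) are made up for illustration.

```octave
% Hand-typed stand-in for the csv2cell result shown above (assumption:
% fields after the first keep the leading space from the file).
c = {'column1',        ' column2', ' column3',            ' column4'; ...
     'text value 1',   ' 123',     ' text, with a comma', ' 25'; ...
     'another, comma', ' 456',     ' other text',         ' 78'};

headers = strtrim(c(1, :));   % first row holds the column names
data    = c(2:end, :);        % remaining rows hold the values

% Convert columns 2 and 4 to a numeric matrix, accepting either
% doubles or strings in the cells.
num_cols = [2 4];
nums = zeros(size(data, 1), numel(num_cols));
for j = 1:numel(num_cols)
  for i = 1:size(data, 1)
    v = data{i, num_cols(j)};
    if ischar(v)
      v = str2double(v);      % str2double ignores surrounding whitespace
    end
    nums(i, j) = v;
  end
end

% Text columns 1 and 3, with stray leading/trailing spaces removed.
text_cols = strtrim(data(:, [1 3]));
```

The embedded commas survive because the values are already split into cells, so nothing here needs to look at delimiters again.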

By the way, you should explicitly mention whether the solution should run on GNU Octave, MATLAB, or both.

First, read the column headers using the format '%s' four times:

fileID = fopen(filename);
C_text = textscan(fileID,'%s', 4,'Delimiter',',');

Then use the conversion specifier %q to read the text enclosed in double quotation marks ("):

C = textscan(fileID,'%q %d %q %d','Delimiter',',');
fclose(fileID);

(This works for reading your sample data on Octave. It should work on MATLAB, too.)
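
For anyone who wants to try this without a file at hand, here is the same two-step read wrapped into a self-contained sketch. It writes the sample rows from the question to a scratch file first (the tempname-based path is just an assumption for the demo), then pulls the columns out of the textscan result. The variable names after the read (headers, names, counts, notes, values) are made up for illustration.

```octave
% Write the sample data from the question to a temporary file.
filename = [tempname() '.csv'];   % hypothetical scratch file
fid = fopen(filename, 'w');
fprintf(fid, '%s\n', 'column1, column2, column3, column4');
fprintf(fid, '%s\n', '"text value 1", 123, "text, with a comma", 25');
fprintf(fid, '%s\n', '"another, comma", 456, "other text", 78');
fclose(fid);

% Read the header row, then the data rows, exactly as described above.
fileID = fopen(filename);
C_text = textscan(fileID, '%s', 4, 'Delimiter', ',');
C = textscan(fileID, '%q %d %q %d', 'Delimiter', ',');
fclose(fileID);
delete(filename);                 % clean up the scratch file

headers = strtrim(C_text{1});     % cell array with the column names
names   = C{1};                   % text from column1 (quotes removed by %q)
counts  = double(C{2});           % %d returns int32; convert if doubles are wanted
notes   = C{3};                   % text from column3
values  = double(C{4});
```

Each element of C holds one whole column, so names{2} is the first-column text of the second data row and counts(2) is its number.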

Edit: removed redundant fopen.
