
Bash: How to grab columns with a “delimiter” inside the column from a CSV file?

I have a few CSV files that I downloaded from an online database. I am trying to cut them down so that I can insert the portions of the data I need into my SQL database.

The CSV files have comma-separated fields and look like this:

1,Peptidoglycan synthetase ftsI,ftsI,1574687,L42023,P45059,FTSI_HAEIN,"",,,,
3,Histidine decarboxylase,HDC,32109,X54297,P19113,DCHS_HUMAN,,HDC,HDC,HGNC:4855,00817
5,"Glutaminase liver isoform, mitochondrial",GLS2,6650606,AF110330,Q9UI32,GLSL_HUMAN,,GLS2,GLS2,HGNC:29570,05901
6,Coagulation factor XIII A chain,F13A1,182309,M22001,P00488,F13A_HUMAN,1FIE,F13A1,F13A1,HGNC:3531,00604
7,"Nitric oxide synthase, inducible",NOS2,292242,L09210,P35228,NOS2_HUMAN,2NSI,NOS2A,NOS2A,HGNC:7873,01225

And here is the problem. Look at the 3rd and 5th lines: the 2nd column of those two lines contains commas! I usually use awk for something like this, and because of those commas, $2 gets messed up.

So, for example:

awk -F ',' '{print $2}' myfile.csv ## Obviously I will be printing a lot more stuff

If I run that on the sample above, the 3rd and 5th lines come out wrong, because the 2nd column of those two lines contains a comma enclosed in double quotes.
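To make the failure concrete, here is a small Python sketch (not part of the original post) that splits the 3rd sample line two ways: naively on every comma, which is what `awk -F ','` does, and with the standard-library `csv` module, which honours the double quotes.

```python
import csv

# Third line of the sample data: column 2 is quoted and contains a comma.
line = '5,"Glutaminase liver isoform, mitochondrial",GLS2,6650606,AF110330,Q9UI32,GLSL_HUMAN,,GLS2,GLS2,HGNC:29570,05901'

# Naive split, equivalent to awk -F ',': every comma is a separator.
naive = line.split(",")

# CSV-aware split: commas inside double quotes stay part of the field.
parsed = next(csv.reader([line]))

print(naive[1])   # -> "Glutaminase liver isoform  (leading quote kept, text cut at the comma)
print(parsed[1])  # -> Glutaminase liver isoform, mitochondrial
print(len(naive), len(parsed))  # -> 13 12
```

The naive split produces one field too many, so every column after the quoted one is shifted by one, which is exactly why `$2` (and everything after it) is wrong for those lines.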

What can I do to get around this?

EDIT: I'd still like to do this in the shell if possible.

You should use a CSV parser such as Perl's Text::CSV (in a one-liner, so still in the shell, if you want); it will handle all the quoting for you.
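If you would rather not write Perl, the same idea works as a small Python filter that reads CSV on stdin and writes selected columns on stdout, so it can sit in a shell pipeline much like `awk -F, '{print $2}'`. This is a sketch using Python's `csv` module as a stand-in for Text::CSV; the script name `csvcol.py` and the function `select_columns` are hypothetical.

```python
import csv
import sys


def select_columns(src, dst, columns):
    """Write the given 1-based columns of each CSV record in src to dst.

    Columns are numbered like awk's $1, $2, ... Quoted fields that contain
    commas are kept intact, and output fields are re-quoted only when needed.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        # Missing trailing columns become empty strings instead of raising.
        writer.writerow([row[c - 1] if c <= len(row) else "" for c in columns])


# Column numbers are required on the command line; with none, nothing runs.
if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 csvcol.py 2 3 < myfile.csv
    select_columns(sys.stdin, sys.stdout, [int(arg) for arg in sys.argv[1:]])
```

Because the output is written back through `csv.writer`, a field such as `Glutaminase liver isoform, mitochondrial` comes out quoted again, so the result is still valid CSV for a later import step.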

If you prefer Python instead, see the csv module.

An example in bash + Python:

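$ python3 <<'EOF'
import csv

# newline="" lets the csv module handle quoted fields itself
with open("test.csv", newline="") as f:
    reader = csv.reader(f)
    for row in reader:
        # row is a list of fields; a quoted comma stays inside one field
        print(row)
EOF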
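Since the end goal is loading part of each record into an SQL database, the parsed rows can be passed straight to the database driver with placeholders, so commas and quotes never have to be escaped by hand. This sketch uses Python's built-in `sqlite3` as a stand-in for whatever SQL database you actually use; the table `protein` and its columns `id`, `name`, `gene` are hypothetical choices for the first three CSV columns.

```python
import csv
import io
import sqlite3

# Two records from the question's sample; in practice this would be
# open("myfile.csv", newline="").
SAMPLE = (
    '3,Histidine decarboxylase,HDC,32109,X54297,P19113,DCHS_HUMAN,,HDC,HDC,HGNC:4855,00817\n'
    '5,"Glutaminase liver isoform, mitochondrial",GLS2,6650606,AF110330,Q9UI32,GLSL_HUMAN,,GLS2,GLS2,HGNC:29570,05901\n'
)


def load(src, conn):
    # Hypothetical table keeping only CSV columns 1-3.
    conn.execute("CREATE TABLE IF NOT EXISTS protein (id INTEGER, name TEXT, gene TEXT)")
    rows = ((int(r[0]), r[1], r[2]) for r in csv.reader(src))
    # "?" placeholders let the driver handle commas and quotes in the values.
    conn.executemany("INSERT INTO protein VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(io.StringIO(SAMPLE), conn)
for row in conn.execute("SELECT id, name, gene FROM protein ORDER BY id"):
    print(row)
```

Other databases have their own bulk loaders (and their own placeholder syntax), so treat this as an illustration of "parse with a real CSV reader, then insert with parameters" rather than a drop-in script.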
