

Convert csv file to txt file

I'm using perl to convert a comma-separated file to a tab-separated file with this command:

perl -e ' $sep=","; while(<>) { s/\Q$sep\E/\t/g; print $_; } warn "Changed $sep to tab on $. lines\n" ' csvfile.csv > tabfile.tab

However, my file has additional commas inside specific columns, and I don't want those treated as separators. Here's an example of my file:

ADNP, "descript1, descript2", 1
PTB, "descriptA, descriptB", 5

I only want to convert the commas outside of the quotation marks to tabs, like so:

ADNP    descript1, descript2    1
PTB    descriptA, descriptB    5

Is there any way to do this with either perl, python, or bash?

Trivial in Perl, using Text::CSV:

#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;

#configure our read format using the default separator of ","
#allow_whitespace skips the space after each comma, so ', "..."' is still read as a quoted field
my $input_csv = Text::CSV->new( { binary => 1, allow_whitespace => 1 } );
#configure our output format with a tab as separator.
#quote_space => 0 stops fields that merely contain spaces from being wrapped in quotes
my $output_csv = Text::CSV->new( { binary => 1, sep_char => "\t", eol => "\n", quote_space => 0 } );

#open input file
open my $input_fh, '<', "sample.csv" or die $!;
#iterate over the input file, reading comma-separated rows and
#printing them tab-separated (to STDOUT here; any filehandle works)
while ( my $row = $input_csv->getline($input_fh) ) {
    $output_csv->print( \*STDOUT, $row );
}
close $input_fh;

In Python:

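import csv

# newline='' lets the csv module handle line endings itself (Python 3)
with open('input', newline='') as inf:
    # skipinitialspace=True ignores the space after each comma,
    # so ', "descript1, descript2"' is still read as one quoted field
    reader = csv.reader(inf, skipinitialspace=True)
    with open('output', 'w', newline='') as out:
        writer = csv.writer(out, delimiter='\t', lineterminator='\n')
        for row in reader:
            writer.writerow(row)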
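To check the conversion against the sample rows without touching any files, here is a minimal in-memory sketch. `csv_to_tsv` is a hypothetical helper, and `io.StringIO` stands in for the input and output files; the reader and writer settings are the same as in the script above.

```python
import csv
import io


def csv_to_tsv(text):
    """Convert CSV text to tab-separated text with the csv module.

    skipinitialspace=True makes the reader treat a quote that follows
    ", " as the start of a quoted field, which the sample rows need.
    """
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


sample = 'ADNP, "descript1, descript2", 1\nPTB, "descriptA, descriptB", 5\n'
print(csv_to_tsv(sample), end='')
```

Because the writer uses minimal quoting, `descript1, descript2` is written without quotes: it contains no tab, quote, or newline.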

You can do this with a regular expression that only matches commas outside double quotes. In Python it would simply be:

>>> import re
>>> re.split(r'(?!\B"[^"]*),(?![^"]*"\B)', 'ADNP, "descript1, descript2", 1')
['ADNP', ' "descript1, descript2"', ' 1']
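The split keeps the surrounding quotes and the space after each comma. A minimal sketch that cleans those up to produce the tab-separated line from the question follows. `SPLIT_RE` and `line_to_tsv` are hypothetical names, and the sketch assumes fields never contain escaped (`""`) quotes or quotes at a field's edge that should be kept.

```python
import re

# The pattern from the answer above: split on commas that are not
# inside a pair of double quotes (assumes no escaped "" in fields).
SPLIT_RE = re.compile(r'(?!\B"[^"]*),(?![^"]*"\B)')


def line_to_tsv(line):
    """Split one CSV line with the regex, then trim spaces and quotes."""
    fields = SPLIT_RE.split(line.rstrip('\n'))
    # strip() drops the space after each comma; strip('"') drops the
    # enclosing quotes (and would also drop a real quote at a field edge)
    cleaned = [field.strip().strip('"') for field in fields]
    return '\t'.join(cleaned)


print(line_to_tsv('ADNP, "descript1, descript2", 1'))
```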

Building on rll's regex answer, you can turn it into a Perl one-liner just like the one you have now:

perl -ne 'BEGIN{$,="\t";}@a=split(/(?!\B"[^"]*),(?![^"]*"\B)/);print @a' csvfile.csv > tabfile.tab

This'll work:

perl -e '$sep=","; while(<STDIN>) { @data = split(/(\Q$sep\E?\s*"[^"]+"\s*\Q$sep\E?)/); foreach(@data){if(/"/){s/^\Q$sep\E\s*"//;s/"\s*\Q$sep\E$//;}else{s/\Q$sep\E/\t/g;}}print(join("\t",@data));} warn "Changed $sep to tab on $. lines\n"' < csvfile.csv > tabfile.tab

Putting parentheses in the pattern passed to split makes split return the captured separators along with the split elements. That effectively isolates each quoted string (together with its surrounding commas) in its own list element, which can then be treated differently when a quote is detected. You just strip the commas and quotes off the quoted elements and substitute tabs for the commas in the other elements, then join all the elements with tabs (so the quoted strings are tab-joined to the other, already tab-separated strings).
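Python's `re.split` returns capture-group text in the same way as Perl's `split`, so the technique can be tried in Python. Below is a minimal sketch on the first sample row. The patterns mirror the Perl one-liner with `$sep` fixed to a comma, and `plain`, `captured` and `retab` are hypothetical names.

```python
import re

line = 'ADNP, "descript1, descript2", 1'

# Without parentheses, re.split discards whatever the pattern matched,
# so the quoted field disappears along with the commas around it.
plain = re.split(r',?\s*"[^"]+"\s*,?', line)

# With a capture group, the matched text (comma + quoted field + comma)
# is kept as its own list element between the split pieces.
captured = re.split(r'(,?\s*"[^"]+"\s*,?)', line)


def retab(parts):
    """Strip comma/quote wrappers from quoted pieces, tab-convert the rest."""
    out = []
    for part in parts:
        if '"' in part:
            part = re.sub(r'^,\s*"', '', part)   # like s/^\Q$sep\E\s*"//
            part = re.sub(r'"\s*,$', '', part)   # like s/"\s*\Q$sep\E$//
        else:
            part = part.replace(',', '\t')       # like s/\Q$sep\E/\t/g
        out.append(part)
    return '\t'.join(out)


print(plain)
print(captured)
print(retab(captured))
```

As with the Perl version, the space after the last comma survives (`' 1'`), because only the quoted pieces have their surrounding whitespace removed.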

The Text::CSV module is what you're looking for. There are a lot of considerations when parsing CSV files (separators inside quotes, escaped quotes, newlines inside quoted fields), and you really don't want to handle all of them yourself.
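Here are two of those considerations in action. This sketch uses Python's csv module as a stand-in for Text::CSV, on a hypothetical input with an escaped quote (`""`) and a newline inside a quoted field. A proper CSV parser unescapes the quote and keeps the multi-line field in one record, while a line-oriented one-liner would see more lines than records.

```python
import csv
import io

# Hypothetical input: an escaped quote ("") and a newline inside quotes.
text = 'ADNP,"say ""hi"", then leave",1\nPTB,"line one\nline two",5\n'

rows = list(csv.reader(io.StringIO(text)))
for row in rows:
    print(row)

# A line-by-line tool sees the embedded newline as a record break.
print(len(text.splitlines()), 'lines vs', len(rows), 'records')
```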

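Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.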

粤ICP备18138465号  © 2020-2024 STACKOOM.COM