How to convert a CSV file that has both comma and space delimiters to a CSV with only space delimiters

I am trying to split the last column, which contains two comma-separated values, into two separate columns. Please see the last columns of the input and output files to understand the goal.

Below is what my input file looks like:

fILENAME sent_no    word POS lab,Slab
File_1   sentence:1  abc NNP B,NO   
                     fhj PSP O,O    
                     bmm NNP B,NO   
                     vbn PSP O,O    
                     vbn NN  B,NO   
                     vbn NNPC B,NO  
                     .  Sym O,O 
File_1   Sentence:2 vbb NNP B,NO    
                    bbn PSP B,NO    
                    nnm NNP O,O 
                    nnn PSP B,NO    
                    bbn NN  O,O 
                    .   Sym O,O 

And the output file I expect is as below:

Filename sent_num word POS Label Slab
 File_1 sentence:1 abc NNP B     NO
                   fhj PSP O      O
                   bmm NNP B     NO
                   vbn PSP O      O
                   vbn NN B      NO
                   vbn NNPC B    NO
                   .   Sym O      O
 File_1 Sentence:2 vbb NNP B     NO
                   bbn PSP B     NO
                   nnm NNP O      O
                   nnn PSP B     NO
                   bbn NN  O      O
                   .   Sym O      O

Try this:

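import pandas

# assumes try.csv is semicolon-separated and its last column is named 'Label,Slabel'
df = pandas.read_csv('try.csv', sep=';')
# split the combined column into two new columns, then drop the original
df[['Label', 'Slabel']] = df['Label,Slabel'].str.split(',', expand=True)
df.drop(['Label,Slabel'], axis=1, inplace=True)
df.to_csv('try2.csv', sep=';')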

But I see your data looks like a MultiIndex DataFrame, so I added this:

df.set_index(['Filename','Sentence_num'],inplace=True)

And the result:

>>> df
                       Word  POS Label Slabel
Filename Sentence_num                        
File_1   sentence:1     abc  NNP     B     NO
         sentence:1     fhj  PSP     O      O
         sentence:1     bmm  NNP     B     NO
         sentence:1     vbn  PSS     O      O
File_2   sentence:2     vbb  NNP     B     NO
         sentence:2     bbn  PSP     B     NO
         sentence:2     nnm  NNP     O      O
         sentence:2    nnnm  PSP     B     NO
>>> 
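
For reference, the same steps can be tried without any files on disk. This is only a sketch: the in-memory sample below is a hypothetical stand-in for 'try.csv' (semicolon-separated, as assumed above), and it writes the result with a space separator, since that is what the question asks for.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated sample standing in for 'try.csv'.
sample = io.StringIO(
    "Filename;Sentence_num;Word;POS;Label,Slabel\n"
    "File_1;sentence:1;abc;NNP;B,NO\n"
    "File_1;sentence:1;fhj;PSP;O,O\n"
    "File_1;sentence:2;vbb;NNP;B,NO\n"
)

df = pd.read_csv(sample, sep=";")

# Split "B,NO" into two new columns, then drop the combined column.
df[["Label", "Slabel"]] = df["Label,Slabel"].str.split(",", expand=True)
df = df.drop(columns=["Label,Slabel"])

# Write space-delimited text; index=False leaves out the row numbers.
out = io.StringIO()
df.to_csv(out, sep=" ", index=False)
result = out.getvalue()
print(result)
```

Passing a file path instead of the StringIO objects should behave the same way for real files.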

More simply, you can use multiple separators like this:

import pandas as pd
df = pd.read_csv('try.csv', sep=' |,', engine='python')  # regex separator: a single space or a comma
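
A small check of the multi-separator idea, as a sketch on a hypothetical in-memory sample where fields are separated by exactly one space. Because the pattern matches a single space, a run of several spaces (as in the aligned input shown in the question) would be read as several separators and produce empty fields, so this form fits single-spaced data best.

```python
import io
import pandas as pd

# Hypothetical single-spaced sample; the last field still holds "lab,Slab".
sample = io.StringIO(
    "word POS lab,Slab\n"
    "abc NNP B,NO\n"
    "fhj PSP O,O\n"
)

# sep is treated as a regular expression: split on one space or one comma.
df = pd.read_csv(sample, sep=" |,", engine="python")

columns = df.columns.tolist()
first_row = df.iloc[0].tolist()
print(columns)
print(first_row)
```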

You can use pandas to split the comma-separated column into two columns.

Here is an example DataFrame:

import pandas as pd
df = pd.DataFrame([['a,b'], ['c,d']], columns=['Label,Slabel'])

It looks like this:

    Label,Slabel
0   a,b
1   c,d

Then you can split each value into a list and expand each list into a Series, which gives one column per part.

df['Label,Slabel'].str.split(',').apply(pd.Series)

The result:

    0   1
0   a   b
1   c   d
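
The resulting columns are named 0 and 1. As a possible follow-up, the sketch below gives them readable names; it uses `str.split(..., expand=True)`, which returns the parts as a DataFrame directly, and the names `Label` and `Slabel` are only an assumption taken from the combined column name.

```python
import pandas as pd

df = pd.DataFrame([['a,b'], ['c,d']], columns=['Label,Slabel'])

# expand=True returns one column per part instead of a column of lists.
parts = df['Label,Slabel'].str.split(',', expand=True)

# Assumed names, derived from the combined column's header.
parts.columns = ['Label', 'Slabel']
print(parts)
```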

I assume the *.csv file is:

word POS lab,Slab
abc NNP B,NO
fhj PSP O,O
bmm NNP B,NO
vbn PSP O,O
vbn NN B,NO
vbn NNPC B,NO
vbb NNP B,NO
bbn PSP B,NO
nnm NNP O,O
nnn PSP B,NO
bbn NN O,O
. Sym O,O

You can use the csv module to read and write a CSV file with a specific delimiter.

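import csv

# path and new_path are placeholders for the input and output file paths
with open(path, newline='') as csvf, open(new_path, 'w', newline='') as new_csvf:
    rows = csv.reader(csvf, delimiter=' ')
    writer = csv.writer(new_csvf, delimiter=' ')
    for row in rows:
        if not row:  # skip blank lines
            continue
        # replace the last field ("B,NO") with its two parts ("B", "NO")
        row[-1:] = row[-1].split(',')
        writer.writerow(row)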
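
To try the csv approach without touching the file system, here is a sketch that runs on in-memory text. The sample rows are a shortened copy of the assumed file above, and each row's combined last field is replaced by its two parts so the output has only space delimiters.

```python
import csv
import io

# Shortened in-memory copy of the assumed single-spaced file.
src = io.StringIO("word POS lab,Slab\nabc NNP B,NO\nfhj PSP O,O\n")
dst = io.StringIO()

reader = csv.reader(src, delimiter=" ")
writer = csv.writer(dst, delimiter=" ", lineterminator="\n")

for row in reader:
    if not row:  # csv.reader yields [] for blank lines
        continue
    # Swap the last field ("B,NO") for its two parts ("B", "NO").
    row[-1:] = row[-1].split(",")
    writer.writerow(row)

converted = dst.getvalue()
print(converted)
```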
