
Writing Python pandas dataframe row slices to a file

I've got a CSV file with 4 columns. The first column is the case id, which repeats across rows.

========INPUT csv file=============
case_num, serial,binary,review
23,29983, 1, "lorem ipsum ,lorem ipsum"
23,298829, 1, "Hi there"
29, 20020, 0, "hickery dickery dock"
29,298829, 1, "Hello there"
29, 28220, 0, "dickery dock"

I'm trying to filter the rows so that only one row is kept for each unique case id.

import pandas

df = pandas.read_csv("inp.csv")
case_id = df["case_num"].sort_values()
with open("out.csv", "w") as fl:
    for i in case_id.unique():
        # first row for this case id, written as the array's string form
        fl.write(str(df[df["case_num"] == i].iloc[0].values))

Output:

[23 '29983' 1
 'lorem ipsum ,lorem ipsum'] #<type 'numpy.ndarray'>

[29 '20220' 0
 'hickery dickery dock']     #<type 'numpy.ndarray'>

As you can see, each record is written out across several lines, but I want each record on a single line, with its fields separated by commas.

=====DESIRED OUTPUT=======

23, '29983', 1,  'lorem ipsum ,lorem ipsum'
29 ,'20220', 0,  'hickery dickery dock'

To put it simply: if I've read some rows from a dataframe (generated from a CSV file), how do I write the selected subset of rows to an output CSV file in exactly the same format as the input CSV file?
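For reference, the loop in the question can be made to emit one comma-separated line per record by handing each row to the standard csv module instead of writing the NumPy array directly. This is only a sketch: the dataframe below is sample data mirroring the question's input, and an in-memory buffer stands in for out.csv.

```python
import csv
import io

import pandas as pd

# Sample data mirroring the question's input file (an assumption for the demo).
df = pd.DataFrame(
    {
        "case_num": [23, 23, 29, 29, 29],
        "serial": [29983, 298829, 20020, 298829, 28220],
        "binary": [1, 1, 0, 1, 0],
        "review": [
            "lorem ipsum ,lorem ipsum",
            "Hi there",
            "hickery dickery dock",
            "Hello there",
            "dickery dock",
        ],
    }
)

# An in-memory buffer stands in for open("out.csv", "w", newline="").
buf = io.StringIO()
writer = csv.writer(buf)

for case in df["case_num"].unique():
    # First row belonging to this case id.
    row = df[df["case_num"] == case].iloc[0]
    # csv.writer joins the fields with commas and quotes any field
    # that itself contains a comma.
    writer.writerow(row.tolist())

print(buf.getvalue())
```

csv.writer only quotes fields that need it, so the review text containing a comma is quoted and the other is not.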

IIUC you can use drop_duplicates:

print(df)
   case id case_num no                        text
0       23  '29983'  1  'lorem ipsum ,lorem ipsum'
1       23  '29983'  1  'lorem ipsum ,lorem ipsum'
2       23  '29983'  1  'lorem ipsum ,lorem ipsum'
3       23  '29983'  1  'lorem ipsum ,lorem ipsum'
4       29  '20220'  0      'hickery dickery dock'

df = df.drop_duplicates(subset='case id')
print(df)
   case id case_num no                        text
0       23  '29983'  1  'lorem ipsum ,lorem ipsum'
4       29  '20220'  0      'hickery dickery dock'
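Which row survives for each case id depends on the keep argument of drop_duplicates. The sketch below assumes the column is named case_num, as in the question's file, and uses sample data mirroring that input rather than the frame printed above.

```python
import pandas as pd

# Sample data mirroring the question's input file (an assumption for the demo).
df = pd.DataFrame(
    {
        "case_num": [23, 23, 29, 29, 29],
        "serial": [29983, 298829, 20020, 298829, 28220],
        "binary": [1, 1, 0, 1, 0],
        "review": [
            "lorem ipsum ,lorem ipsum",
            "Hi there",
            "hickery dickery dock",
            "Hello there",
            "dickery dock",
        ],
    }
)

# keep="first" is the default: the first row seen for each case_num survives.
first_rows = df.drop_duplicates(subset="case_num", keep="first")

# keep="last" keeps the final row for each case_num instead.
last_rows = df.drop_duplicates(subset="case_num", keep="last")

print(first_rows)
print(last_rows)
```

The surviving rows keep their original index labels, which is why the printed index is not consecutive.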

Write the result to a CSV file with to_csv:

df.to_csv(filename, sep=',', index=False)
case id,case_num,no,text
23,'29983',1,"'lorem ipsum ,lorem ipsum'"
29,'20220',0,'hickery dickery dock'
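Putting the steps together on the question's own input, a minimal end-to-end sketch might look like the following. The CSV text is inlined through io.StringIO so the example runs without inp.csv on disk, and skipinitialspace=True is an addition not present in the question; it is there because the input has blanks after some commas.

```python
import io

import pandas as pd

# The question's input, inlined so the sketch runs without a file on disk.
raw = '''case_num, serial,binary,review
23,29983, 1, "lorem ipsum ,lorem ipsum"
23,298829, 1, "Hi there"
29, 20020, 0, "hickery dickery dock"
29,298829, 1, "Hello there"
29, 28220, 0, "dickery dock"
'''

# skipinitialspace=True drops the blank after each comma, so the quoted
# review field and the " serial" header are parsed cleanly.
df = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

# Keep the first row for each case id.
first_per_case = df.drop_duplicates(subset="case_num")

# With no path argument, to_csv returns the CSV text as a string;
# pass a filename such as "out.csv" to write a file instead.
csv_text = first_per_case.to_csv(index=False)
print(csv_text)
```

Passing header=False to to_csv as well omits the header line, which matches the desired output in the question.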


 