
How to remove duplicate rows from a CSV file when four of the five column values are the same and one column value differs

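I have a CSV file in the format shown below, and I want to use a Python script to transform it as described in the second table. Could you suggest a suitable approach?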

Sheet1 (input file):

Columns:   1     2     3     4    5
row1   :   abc   fff   v1    hhh  jjj
row2   :   abc   fff   v2    hhh  jjj
row3   :   efg   ooo   h1    ppp  www
row4   :   efg   ooo   h2    ppp  www

Sheet2 (output file):

Columns:    1     2      3      4    5
row1   :   abc   fff   v1|v2   hhh  jjj
row2   :   efg   ooo   h1|h2   ppp  www

Could anyone please help me do this?

To read the CSV and get it into the desired shape, you can use pandas:

import pandas as pd

# sep is the delimiter, so change it if it is ',' for instance
# header is set to None as you seem not to have column names,
# which means pandas labels the columns with the integers 0..4
df = pd.read_csv('input_file_name.csv', header=None, sep=r'\s+')

# group on the four columns that match and join the differing one (label 2)
df = df.groupby([0, 1, 3, 4])[2].agg(lambda x: '|'.join(x)).reset_index()
df
#0     1     3    4    2
#abc   fff   hhh  jjj  v1|v2
#efg   ooo   ppp  www  h1|h2
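
The grouped result lists the joined column last, whereas the desired output keeps it in the third position. A minimal sketch of restoring the original order follows. It assumes the file has no header row (so pandas labels the columns 0 to 4) and uses the question's sample rows as in-memory data. The names `sample` and `merged` are illustrative only.

```python
import io
import pandas as pd

# Sample rows from the question, whitespace-separated, held in memory
sample = """abc fff v1 hhh jjj
abc fff v2 hhh jjj
efg ooo h1 ppp www
efg ooo h2 ppp www
"""

df = pd.read_csv(io.StringIO(sample), header=None, sep=r'\s+')

# Group on the four matching columns and join the differing one (label 2);
# sort=False keeps the groups in the order they first appear
merged = df.groupby([0, 1, 3, 4], sort=False)[2].agg('|'.join).reset_index()

# Put the columns back in their original order 0..4
merged = merged[[0, 1, 2, 3, 4]]

rows = merged.values.tolist()
print(rows)
```

To write the result without the index or the integer column labels, `merged.to_csv('output.csv', header=False, index=False)` should work.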

Alternatively, you can use the csv module, although you will find that pandas makes this simpler:

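import csv

# In Python 3, csv files are opened in text mode with newline='' (not 'wb')
with open('myfile.csv', newline='') as infile, open('output.csv', 'w', newline='') as outfile:
    value_place = 2  # index of the column whose values differ
    result = {}
    for line in csv.reader(infile):
        if not line:
            continue  # skip blank lines
        value = line[value_place]
        # the key is every column except the differing one
        key = tuple(x for i, x in enumerate(line) if i != value_place)
        result.setdefault(key, []).append(value)
    writer = csv.writer(outfile)
    for k, v in result.items():
        row = list(k)
        # put the joined values back in their original column position
        row.insert(value_place, '|'.join(v))
        writer.writerow(row)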
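
The same grouping idea can be wrapped in a function so it can be tried on in-memory data before touching any files. This is only a sketch: `merge_rows`, its parameters, and the comma-separated `sample` string are hypothetical names and data introduced here for illustration, not part of the original answer.

```python
import csv
import io

def merge_rows(rows, value_place=2, joiner='|'):
    """Merge rows that agree on every column except the one at value_place."""
    grouped = {}  # dicts keep insertion order, so groups stay in first-seen order
    for row in rows:
        key = tuple(x for i, x in enumerate(row) if i != value_place)
        grouped.setdefault(key, []).append(row[value_place])
    merged = []
    for key, values in grouped.items():
        out = list(key)
        # reinsert the joined values at the original column position
        out.insert(value_place, joiner.join(values))
        merged.append(out)
    return merged

# The question's sample rows, assumed here to be comma-separated
sample = (
    "abc,fff,v1,hhh,jjj\n"
    "abc,fff,v2,hhh,jjj\n"
    "efg,ooo,h1,ppp,www\n"
    "efg,ooo,h2,ppp,www\n"
)
result = merge_rows(csv.reader(io.StringIO(sample)))
for row in result:
    print(','.join(row))
```

Because the key is built from all columns other than `value_place`, rows are merged only when the remaining four columns match exactly.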
