Python script to merge rows based on the first column

Question

I have seen lots of questions and answers on this, but none that I have looked at have solved my problem, so any help would be appreciated.

I have a very large CSV file in which some first-column values appear in more than one row, and I would like a script to match and merge those rows based on the 1st column. (I do not want to use pandas. I am using Python 2.7. There are no CSV headers in the file.)

This is the input:

2144, 2016, 505, 20005, 2007, PP, GPP, DAC, UNSW 
8432, 2015, 505, 20005, 2041, LL, GLO, X2, UNSW
0055, 0.00, 0.00, 2014, 2017
2144, 0.00, 0.00, 2016, 959
8432, 22.9, 0.00, 2015, 2018 
0055, 2014, 505, 20004, 2037, LL, GLO, X2, QAL

Wanted output:

2144, 0.00, 0.00, 2016, 959, 2016, 505, 20005, 2007, PP, GPP, DAC, UNSW  
0055, 0.00, 0.00, 2014, 2017, 2014, 505, 20004, 2037, LL, GLO, X2, QAL   
8432, 22.9, 0.00, 2015, 2018, 2015, 505, 20005, 2041, LL, GLO, X2, UNSW

I have tried:

import csv

reader = csv.reader(open('input.csv'))
result = {}

for row in reader:
    idx = row[0]
    values = row[1:]
    if idx in result:
        result[idx] = [result[idx][i] or v for i, v in enumerate(values)]
    else:
        result[idx] = values

and this to search for duplicates:

with open('1.csv','r') as in_file, open('2.csv','w') as out_file:
    seen = set() # set for fast O(1) amortized lookup
    for line in in_file:
        if line in seen: continue  # skip lines that were already written
        seen.add(line)
        out_file.write(line)

But these haven't helped me. I'm lost.

Any help would be great.

Thanks

Answer 1

Try using a dictionary, with the value of the 1st column as your key. Here's how I would do it:

import csv

with open('myfile.csv') as csvfile:
    reader = list(csv.reader(csvfile, skipinitialspace=True))  # remove the spaces after the commas
    result = {}  # or collections.OrderedDict() if the output order is important
    for row in reader:
        if row[0] in result:
            result[row[0]].extend(row[1:])  # do not include the key again
        else:
            result[row[0]] = row

    # result.values() holds the merged rows, for example:
    for row in result.values():
        print(', '.join(row))
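
The wanted output in the question appears to put the shorter record first for each key, which the snippet above does not guarantee, since it appends rows in the order they are read. A minimal sketch of one way to get that ordering, assuming "shorter record first" really is the rule (the question does not state it) and using a hypothetical helper name, merge_rows_by_first_column:

```python
import csv
from collections import OrderedDict


def merge_rows_by_first_column(rows):
    """Group rows by their first field and join the remaining fields.

    Assumption: for each key, shorter records come before longer ones,
    which is what the wanted output in the question seems to show.
    """
    grouped = OrderedDict()  # remembers the order in which keys first appear
    for row in rows:
        if not row:
            continue  # skip blank lines
        fields = [field.strip() for field in row]  # drop stray trailing spaces
        grouped.setdefault(fields[0], []).append(fields[1:])

    merged = []
    for key, parts in grouped.items():
        parts.sort(key=len)  # stable sort: shorter records first
        merged.append([key] + [field for part in parts for field in part])
    return merged


# The sample input from the question, used here instead of a real file.
sample = [
    "2144, 2016, 505, 20005, 2007, PP, GPP, DAC, UNSW ",
    "8432, 2015, 505, 20005, 2041, LL, GLO, X2, UNSW",
    "0055, 0.00, 0.00, 2014, 2017",
    "2144, 0.00, 0.00, 2016, 959",
    "8432, 22.9, 0.00, 2015, 2018 ",
    "0055, 2014, 505, 20004, 2037, LL, GLO, X2, QAL",
]

for merged_row in merge_rows_by_first_column(
        csv.reader(sample, skipinitialspace=True)):
    print(", ".join(merged_row))
```

Keys come out in the order they first appear in the input (2144, 8432, 0055), which differs from the row order shown in the question. To read from a real file, pass csv.reader(csvfile, skipinitialspace=True) in place of the sample list.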

1 2018-02-13 23:35:38

基于第一列合并行的 Python 脚本

问题描述

1 个解决方案

解决方案1 1 2018-02-13 23:35:38

解决方案1
1 2018-02-13 23:35:38