
Python script to merge rows based on 1st column

I have seen lots of questions and answers on this, but none of the ones I looked at solved my problem, so any help would be appreciated.

I have a very large CSV file in which some first-column values are duplicated, and I would like a script that matches rows on the 1st column and merges them. (I do not want to use pandas. I am using Python 2.7. The file has no CSV header row.)

This is the input:

2144, 2016, 505, 20005, 2007, PP, GPP, DAC, UNSW 
8432, 2015, 505, 20005, 2041, LL, GLO, X2, UNSW
0055, 0.00, 0.00, 2014, 2017
2144, 0.00, 0.00, 2016, 959
8432, 22.9, 0.00, 2015, 2018 
0055, 2014, 505, 20004, 2037, LL, GLO, X2, QAL

Wanted output:

2144, 0.00, 0.00, 2016, 959, 2016, 505, 20005, 2007, PP, GPP, DAC, UNSW  
0055, 0.00, 0.00, 2014, 2017, 2014, 505, 20004, 2037, LL, GLO, X2, QAL   
8432, 22.9, 0.00, 2015, 2018, 2015, 505, 20005, 2041, LL, GLO, X2, UNSW

I have tried:

import csv

reader = csv.reader(open('input.csv'))
result = {}

for row in reader:
    idx = row[0]
    values = row[1:]
    if idx in result:
        result[idx] = [result[idx][i] or v for i, v in enumerate(values)]
    else:
        result[idx] = values

and this to search for duplicates:

with open('1.csv','r') as in_file, open('2.csv','w') as out_file:
    seen = set() # set for fast O(1) amortized lookup
    for line in in_file:
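        if line in seen:
            continue  # skip lines that were already written
        # assumed completion: remember the line and copy it to the output
        seen.add(line)
        out_file.write(line)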

But these haven't helped me. I'm lost.

Any help would be great.

Thanks

Try using a dictionary, with the value of the 1st column as your key. Here's how I would do it:

import csv

with open('myfile.csv') as csvfile:
    reader = list(csv.reader(csvfile, skipinitialspace=True))  # remove the spaces after the commas
    result = {}  # or collections.OrderedDict() if the output order is important
    for row in reader:
        if row[0] in result:
            result[row[0]].extend(row[1:])  # do not include the key again
        else:
            result[row[0]] = row

    # result.values() returns your wanted output, for example :
    for row in result.values():
        print(', '.join(row))
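
Note that the wanted output in the question seems to list the values from the shorter row before those from the longer row, whereas the loop above appends values in the order the rows are read. A minimal sketch of one way to reproduce that layout follows. The function name `merge_rows`, the inline `sample` data, and the "shortest fragment first" rule are my own assumptions drawn from the example, not something the question states explicitly. It avoids pandas and should run on Python 2.7 as well as Python 3.

```python
import csv
from collections import OrderedDict


def merge_rows(rows):
    """Merge rows sharing the same first column, keeping first-seen key order."""
    merged = OrderedDict()
    for row in rows:
        if not row:
            continue  # csv.reader yields [] for blank lines
        fields = [field.strip() for field in row]  # drop spaces around values
        key, values = fields[0], fields[1:]
        merged.setdefault(key, []).append(values)

    result = []
    for key, fragments in merged.items():
        # Assumption: shorter fragments come first, as in the wanted output.
        fragments.sort(key=len)  # stable, so equal lengths keep file order
        combined = [key]
        for fragment in fragments:
            combined.extend(fragment)
        result.append(combined)
    return result


# The input from the question, inlined so the sketch runs on its own.
sample = [
    "2144, 2016, 505, 20005, 2007, PP, GPP, DAC, UNSW ",
    "8432, 2015, 505, 20005, 2041, LL, GLO, X2, UNSW",
    "0055, 0.00, 0.00, 2014, 2017",
    "2144, 0.00, 0.00, 2016, 959",
    "8432, 22.9, 0.00, 2015, 2018 ",
    "0055, 2014, 505, 20004, 2037, LL, GLO, X2, QAL",
]

for merged_row in merge_rows(csv.reader(sample)):
    print(", ".join(merged_row))
```

For a real file, pass `csv.reader(csvfile)` instead of `csv.reader(sample)`, and write the result with `csv.writer(out_file).writerows(...)` if you need a CSV file rather than printed lines. The keys come out in the order they first appear in the input (2144, 8432, 0055), which differs from the row order shown in the question.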


