
How to merge two csv files based on first column (NO HEADERS, NO PANDAS)

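I have two csv files that I need to merge based on the first column (i.e. column 0). I cannot give them headers and I cannot use pandas. Here are the two files: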

StudentsMajorsList.csv

305671,Jones,Bob,Electrical Engineering,
987621,Wong,Chen,Computer Science,
323232,Rubio,Marco,Computer Information Systems,
564321,Awful,Student,Computer Science,Y
769889,Boy,Sili,Computer Information Systems,Y
156421,McGill,Tom,Electrical Engineering,
999999,Genius,Real,Physics,

GPAList.csv

156421,3.4
305671,3.1
323232,3.8
564321,2.2
769889,3.9
987621,3.85
999999,4

I would like the resulting csv file to look like this:

FullRoster.csv

305671,Jones,Bob,Electrical Engineering,3.1
987621,Wong,Chen,Computer Science,3.85
323232,Rubio,Marco,Computer Information Systems,3.8
564321,Awful,Student,Computer Science,Y,2.2
769889,Boy,Sili,Computer Information Systems,Y,3.9
156421,McGill,Tom,Electrical Engineering,3.4
999999,Genius,Real,Physics,4

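What code can I use to achieve this? Keep in mind that pandas is not allowed, and I cannot add headers to the files to make things easier. I have to use them exactly as they are. Thanks!

Edit: I apologize for not including my code. I don't use this site often and did not familiarize myself with the rules before posting. Here is what I have so far, but this code does not work: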

with open('StudentsMajorsList.csv','r') as f2:
    reader = csv.reader(f2)
    dict2 = {row[0]: row[1:] for row in reader}

with open('GPAList.csv','r') as f1:
    reader = csv.reader(f1)
    dict1 = OrderedDict((row[0], row[1:]) for row in reader)

result = OrderedDict()
for d in (dict1, dict2):
    for key, value in dict.items():
        result.setdefault(key, []).extend(value)

with open('FullRoster.csv', 'w') as f:
    w = csv.writer(f)
    for key, value in result.items():
        w.writerow([key] + value)

# usage: merge_csv.py <file1> <file2> <output>
# example: merge_csv.py file1.csv file2.csv file3.csv
import csv
import sys


def merge_csv(file1, file2, output):
    # Index the rows of file2 by their first column, so the two files
    # do not have to list the IDs in the same order.
    with open(file2, 'r', newline='') as f2:
        lookup = {row[0]: row[1:] for row in csv.reader(f2) if row}

    with open(file1, 'r', newline='') as f1, open(output, 'w', newline='') as f3:
        writer = csv.writer(f3)
        for row1 in csv.reader(f1):
            if not row1:
                continue  # skip blank lines
            # Rows with no match in file2 are written unchanged.
            writer.writerow(row1 + lookup.get(row1[0], []))


if __name__ == '__main__':
    merge_csv(sys.argv[1], sys.argv[2], sys.argv[3])
    print('done')

This is what I would do:

import csv

with open('StudentsMajorsList.csv', newline='') as file:
    data1 = list(csv.reader(file))

with open('GPAList.csv', newline='') as file:
    data2 = list(csv.reader(file))

merge3 = []

# csv.reader has already split every line into fields,
# so the rows can be compared directly on their first column.
for student in data1:
    for gpa_row in data2:
        if student[0] == gpa_row[0]:
            # Drop empty fields (left by the trailing comma), then add the GPA.
            merged = [field for field in student if field != '']
            merged.append(gpa_row[1])
            merge3.append(merged)

for row in merge3:
    print(row)

with open('FullRoster.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerows(merge3)

Output:

['305671', 'Jones', 'Bob', 'Electrical Engineering', '3.1']
['987621', 'Wong', 'Chen', 'Computer Science', '3.85']
['323232', 'Rubio', 'Marco', 'Computer Information Systems', '3.8']
['564321', 'Awful', 'Student', 'Computer Science', 'Y', '2.2']
['769889', 'Boy', 'Sili', 'Computer Information Systems', 'Y', '3.9']
['156421', 'McGill', 'Tom', 'Electrical Engineering', '3.4']
['999999', 'Genius', 'Real', 'Physics', '4']

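You need to understand how to use Python's built-in csv library, which helps read a line of CSV values and convert it into a list.

The way to solve this is to first read the GPAList values into a dictionary. This allows easy lookup of any ID value.

For each row in the students CSV, look up the required value in the dictionary, append it to the row just read, and write the row to the output CSV file.

For example: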

import csv

# Read GPAList.csv into a dictionary mapping student ID -> GPA
with open('GPAList.csv', newline='') as f_gpa:
    csv_gpa = csv.reader(f_gpa)
    gpa = dict(csv_gpa)

with open('StudentsMajorsList.csv', newline='') as f_students, open('FullRoster.csv', 'w', newline='') as f_roster:
    csv_students = csv.reader(f_students)
    csv_roster = csv.writer(f_roster)

    # Append the matching GPA to each student row and write it out
    for row in csv_students:
        csv_roster.writerow([*row, gpa[row[0]]])

I suggest adding some print statements to better understand how this works, for example print(gpa).
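To illustrate what such a print statement would show, here is a small sketch that builds the same kind of lookup dictionary from an in-memory copy of the first three lines of GPAList.csv. The use of io.StringIO and the name sample are assumptions made only so the snippet runs without the files on disk; the ID "000000" is a made-up value that does not appear in the data.

```python
import csv
import io

# In-memory stand-in for the first three lines of GPAList.csv (illustration only)
sample = io.StringIO("156421,3.4\n305671,3.1\n323232,3.8\n")

# Each row is a two-item list [id, gpa], so dict() maps id -> gpa
gpa = dict(csv.reader(sample))
print(gpa)

# dict.get returns a default instead of raising KeyError for an unknown ID
print(gpa.get("000000", "N/A"))
```

Note that the IDs and GPAs are all strings; the csv module does not convert values to numbers.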

I would solve this in 4 steps:

  1. Read StudentsMajorsList.csv into data, a dictionary of the form {row[0]: row}
  2. Remove the empty last column from the rows that have one
  3. Read GPAList.csv and update data
  4. Write to FullRoster.csv

import csv

# Step 1: Read StudentsMajorsList.csv into data
with open("StudentsMajorsList.csv") as stream:
    reader = csv.reader(stream)
    data = {row[0]: row for row in reader}

# Step 2: Remove those empty last columns
for row in data.values():
    if row[-1] == "":
        del row[-1]

# Step 3: read GPAList.csv and update data
with open("GPAList.csv") as stream:
    reader = csv.reader(stream)
    for student_id, gpa in reader:
        if student_id in data:
            data[student_id].append(gpa)

# Step 4: Write to FullRoster.csv
with open("FullRoster.csv", "w", newline="") as stream:
    writer = csv.writer(stream)
    writer.writerows(data.values())

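Note: Step 2 is there to make the output match your expected output, but it makes the data inconsistent: some rows will have 5 columns while others have 6. If you want consistent data, remove step 2.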
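If consistent columns matter more than matching the expected output exactly, one option is to keep the empty field and simply append the GPA, so that every row ends up with six columns. The following is a minimal sketch of that idea; the helper name merge_row, its missing parameter, and the in-memory sample rows are hypothetical and introduced here only for illustration.

```python
def merge_row(student_row, gpa_by_id, missing=""):
    """Return the student row plus its GPA, keeping any empty field so widths stay equal."""
    return [*student_row, gpa_by_id.get(student_row[0], missing)]


# Sample rows in the shape csv.reader would yield from StudentsMajorsList.csv
students = [
    ["305671", "Jones", "Bob", "Electrical Engineering", ""],
    ["564321", "Awful", "Student", "Computer Science", "Y"],
]
gpa_by_id = {"305671": "3.1", "564321": "2.2"}

merged = [merge_row(row, gpa_by_id) for row in students]
for row in merged:
    print(row)  # each row has six fields
```

With this approach the first sample row would be written as 305671,Jones,Bob,Electrical Engineering,,3.1, which differs from the expected FullRoster.csv but keeps the GPA in the same column for every student.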
