How to merge two csv files based on first column (NO HEADERS, NO PANDAS)

I have two csv files that I need to merge based on the first column (column 0). I cannot give them headers, and I cannot use pandas. Here are the two files:

StudentsMajorsList.csv

305671,Jones,Bob,Electrical Engineering,
987621,Wong,Chen,Computer Science,
323232,Rubio,Marco,Computer Information Systems,
564321,Awful,Student,Computer Science,Y
769889,Boy,Sili,Computer Information Systems,Y
156421,McGill,Tom,Electrical Engineering,
999999,Genius,Real,Physics,

GPAList.csv

156421,3.4
305671,3.1
323232,3.8
564321,2.2
769889,3.9
987621,3.85
999999,4

I want the resulting csv file to look like this:

FullRoster.csv

305671,Jones,Bob,Electrical Engineering,3.1
987621,Wong,Chen,Computer Science,3.85
323232,Rubio,Marco,Computer Information Systems,3.8
564321,Awful,Student,Computer Science,Y,2.2
769889,Boy,Sili,Computer Information Systems,Y,3.9
156421,McGill,Tom,Electrical Engineering,3.4
999999,Genius,Real,Physics,4

What code can I use to achieve this? Please remember that pandas is not allowed, and I cannot give the files headers to make things easier. I have to use them exactly as they are. Thank you!

EDIT: I apologize, I did not include the code I have. I don't use this site often and I didn't familiarize myself with the rules before posting. Here's what I have so far, but this code does not work:

with open('StudentsMajorsList.csv','r') as f2:
    reader = csv.reader(f2)
    dict2 = {row[0]: row[1:] for row in reader}

with open('GPAList.csv','r') as f1:
    reader = csv.reader(f1)
    dict1 = OrderedDict((row[0], row[1:]) for row in reader)

result = OrderedDict()
for d in (dict1, dict2):
    for key, value in dict.items():
        result.setdefault(key, []).extend(value)

with open('FullRoster.csv', 'w') as f:
    w = csv.writer(f)
    for key, value in result.items():
        w.writerow([key] + value)

# usage: merge_csv.py <file1> <file2> <output>
# example: merge_csv.py file1.csv file2.csv file3.csv
import csv
import sys


def merge_csv(file1, file2, output):
    # Index file2 by its first column so rows can be matched by ID.
    # Pairing rows positionally with zip() only works when both files list
    # the IDs in the same order, which is not the case for the sample data.
    with open(file2, 'r', newline='') as f2:
        extra = {row[0]: row[1:] for row in csv.reader(f2) if row}

    with open(file1, 'r', newline='') as f1, open(output, 'w', newline='') as f3:
        writer = csv.writer(f3)
        for row in csv.reader(f1):
            if not row:
                continue  # skip blank lines
            # Append the matching columns from file2; a row without a match is written unchanged.
            writer.writerow(row + extra.get(row[0], []))


if __name__ == '__main__':
    merge_csv(sys.argv[1], sys.argv[2], sys.argv[3])
    print('done')

This is how I would have done it:

import csv

with open('StudentsMajorsList.csv', newline='') as file:
    data1 = list(csv.reader(file))

with open('GPAList.csv', newline='') as file:
    data2 = list(csv.reader(file))

merge3 = []

# csv.reader has already split each line into fields,
# so no manual split(',') is needed.
for student in data1:
    for gpa_row in data2:
        if student[0] == gpa_row[0]:
            # Drop empty fields (the blank trailing column), then append the GPA.
            merged = [field for field in student if field != '']
            merged.append(gpa_row[1])
            merge3.append(merged)

for row in merge3:
    print(row)

with open('FullRoster.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerows(merge3)

Output:

['305671', 'Jones', 'Bob', 'Electrical Engineering', '3.1']
['987621', 'Wong', 'Chen', 'Computer Science', '3.85']
['323232', 'Rubio', 'Marco', 'Computer Information Systems', '3.8']
['564321', 'Awful', 'Student', 'Computer Science', 'Y', '2.2']
['769889', 'Boy', 'Sili', 'Computer Information Systems', 'Y', '3.9']
['156421', 'McGill', 'Tom', 'Electrical Engineering', '3.4']
['999999', 'Genius', 'Real', 'Physics', '4']

You need to learn about Python's built-in csv library, which reads a line of CSV values and converts it into a list.

The approach to this problem is to first read the GPAList values into a dictionary. This allows any ID value to be looked up easily.

Then, for each row in the student CSV, look up the required value in the dictionary and append it to the row just read in while writing it to the output CSV file.

For example:

import csv

with open('GPAList.csv') as f_gpa:
    csv_gpa = csv.reader(f_gpa)
    gpa = dict(csv_gpa)
    
with open('StudentsMajorsList.csv') as f_students, open('FullRoster.csv', 'w', newline='') as f_roster:
    csv_students = csv.reader(f_students)
    csv_roster = csv.writer(f_roster)
    
    for row in csv_students:
        csv_roster.writerow([*row, gpa[row[0]]])

I suggest you add some print statements to better understand how this works, e.g. print(gpa)
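
To make that concrete, here is a small sketch that runs the same dictionary lookup on in-memory copies of two sample rows, so the intermediate values can be inspected without touching the real files. The use of io.StringIO and the names gpa_text, students_text and merged are illustrative assumptions, not part of the code above; gpa.get() is swapped in for gpa[...] so an unknown ID yields an empty field rather than a KeyError.

```python
import csv
import io

# In-memory stand-ins for the two files (a subset of the sample data).
gpa_text = "156421,3.4\n305671,3.1\n"
students_text = (
    "305671,Jones,Bob,Electrical Engineering,\n"
    "156421,McGill,Tom,Electrical Engineering,\n"
)

# dict() accepts an iterable of two-item rows, so each [id, gpa] row becomes one entry.
gpa = dict(csv.reader(io.StringIO(gpa_text)))
print(gpa)

merged = []
for row in csv.reader(io.StringIO(students_text)):
    # row[0] is the student ID; get() returns '' for an ID missing from the GPA data.
    merged.append([*row, gpa.get(row[0], '')])

print(merged)
```

Note that the empty trailing field of each student row is still present before the appended GPA, which is why the written file has an extra comma compared to the expected output.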

I would approach this problem in 4 steps:

  1. Read StudentsMajorsList.csv -> data ( {row[0]: row} )
  2. Remove those empty last columns in the rows
  3. Read GPAList.csv and update data
  4. Write to FullRoster.csv
import csv

# Step 1: Read StudentsMajorsList.csv into data
with open("StudentsMajorsList.csv") as stream:
    reader = csv.reader(stream)
    data = {row[0]: row for row in reader}

# Step 2: Remove those empty last columns
for row in data.values():
    if row[-1] == "":
        del row[-1]

# Step 3: read GPAList.csv and update data
with open("GPAList.csv") as stream:
    reader = csv.reader(stream)
    for student_id, gpa in reader:
        if student_id in data:
            data[student_id].append(gpa)

# Step 4: Write to FullRoster.csv
with open("FullRoster.csv", "w", newline="") as stream:
    writer = csv.writer(stream)
    writer.writerows(data.values())

Note: Step 2 is there so that the output matches your expected output, but it results in inconsistent data: some rows will have 5 columns while others have 6. If you want consistent data, delete step 2.
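
As a rough sketch of that trade-off, the four steps can be wrapped in a function with a switch for step 2. The function name build_roster, the strip_empty parameter and the temporary demo files are hypothetical and only for illustration; the logic is the same as in the steps above.

```python
import csv
import os
import tempfile


def build_roster(students_path, gpa_path, roster_path, strip_empty=True):
    """Merge the two files on column 0 (hypothetical helper)."""
    # Step 1: index the student rows by ID.
    with open(students_path, newline="") as stream:
        data = {row[0]: row for row in csv.reader(stream) if row}

    # Step 2 (optional): drop an empty last column.
    if strip_empty:
        for row in data.values():
            if row[-1] == "":
                del row[-1]

    # Step 3: append the GPA to the matching student row.
    with open(gpa_path, newline="") as stream:
        for row in csv.reader(stream):
            if len(row) >= 2 and row[0] in data:
                data[row[0]].append(row[1])

    # Step 4: write the merged rows.
    with open(roster_path, "w", newline="") as stream:
        csv.writer(stream).writerows(data.values())


# Demo on two sample rows written to a temporary directory.
tmp = tempfile.mkdtemp()
students = os.path.join(tmp, "StudentsMajorsList.csv")
gpas = os.path.join(tmp, "GPAList.csv")
roster = os.path.join(tmp, "FullRoster.csv")

with open(students, "w", newline="") as f:
    f.write("305671,Jones,Bob,Electrical Engineering,\n"
            "564321,Awful,Student,Computer Science,Y\n")
with open(gpas, "w", newline="") as f:
    f.write("305671,3.1\n564321,2.2\n")

# With step 2 skipped, every row keeps the same number of columns.
build_roster(students, gpas, roster, strip_empty=False)
with open(roster, newline="") as f:
    consistent_rows = list(csv.reader(f))
print(consistent_rows)
```

Calling build_roster with strip_empty=True instead reproduces the layout asked for in the question, at the cost of rows with differing column counts.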
