How to merge different CSV files into a new CSV with one primary key
I have two huge CSV files and want to join them into one new CSV file using Python pandas. The primary key is id_student. Joining the columns works, but when I write the result to a new CSV file, all of the data ends up in the first row, spread across different columns. For example, row 1, column 1 holds the whole id_student column, like this:
0 12345
1 12344
Then the next column in row 1 holds final_result, formatted like this:
0 Pass
1 Pass
But my expected output is:
0 12345 Pass
1 12344 Pass
Is there any way I can fix the output format?
import csv
import pandas

def plotlyGraph(self):
    df = pandas.read_csv('studentAssessment.csv')
    dc = pandas.read_csv('studentInfo.csv')
    res = pandas.merge(df, dc, on=['id_student'], how='outer')
    a = res['id_student']
    b = res['final_result']
    c = res['score']
    d = res['id_assessment']
    e = res['region']
    with open("new.csv", "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow([a, b, c, d, e])
I am assuming your df has 2 columns, id_student and id_assessment, while dc has 2 columns, id_student and final_result. Try this one:
import pandas

df = pandas.read_csv('studentAssessment.csv')
dc = pandas.read_csv('studentInfo.csv')
res = df.merge(dc, on=['id_student'], how='outer')
print(res)
Output:
   id_student  id_assessment final_result
0           0          12345         pass
1           1          12344         pass
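As background on why the code in the question produced a single row: csv.writer.writerow writes exactly one record per call, and every element of the list becomes one cell. When the elements are whole pandas Series, each cell ends up holding the text form of an entire column. The sketch below is only an illustration; the two-row DataFrame is made-up data standing in for the merged result, and it writes to an in-memory buffer instead of new.csv.

```python
import csv
import io

import pandas

# Hypothetical stand-in for the merged frame from the question
res = pandas.DataFrame({"id_student": [12345, 12344],
                        "final_result": ["Pass", "Pass"]})

buf = io.StringIO()
writer = csv.writer(buf)
# A single writerow call produces a single CSV record;
# each Series is converted to its string form and stored in one cell
writer.writerow([res["id_student"], res["final_result"]])

# Read the buffer back to count records and cells
rows = list(csv.reader(io.StringIO(buf.getvalue())))
print(len(rows))     # number of records written
print(len(rows[0]))  # number of cells in that record
print(rows[0][0])    # the whole id_student column, index included, in one cell
```

This matches the symptom described in the question, where the index and values (0 12345, 1 12344) all sit inside the first cell.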
To store it in a CSV file:
res.to_csv("new.csv", index=False)
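If only the five columns picked out in the question are wanted, they can be selected before writing. The following is a minimal sketch under assumptions: the small DataFrames are invented stand-ins for studentAssessment.csv and studentInfo.csv (the real files may have different columns), and the output goes to a temporary directory so nothing existing is overwritten.

```python
import os
import tempfile

import pandas

# Hypothetical miniature versions of the two input files
df = pandas.DataFrame({"id_student": [12345, 12344],
                       "id_assessment": [1752, 1752],
                       "score": [78, 70]})
dc = pandas.DataFrame({"id_student": [12345, 12344],
                       "final_result": ["Pass", "Pass"],
                       "region": ["Scotland", "Wales"]})

res = df.merge(dc, on="id_student", how="outer")

# Keep only the columns the question extracted, in the same order
cols = ["id_student", "final_result", "score", "id_assessment", "region"]

out_path = os.path.join(tempfile.mkdtemp(), "new.csv")
# index=False leaves out the 0, 1, ... row index column
res[cols].to_csv(out_path, index=False)

# Read the file back to confirm one row per student
check = pandas.read_csv(out_path)
print(check)
```

Each student now occupies its own row, with one value per column, which is the layout the question asked for.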