Python: merge two CSV files
I have two CSV files.
File1.csv
Frame_Nr; Data1; Data2; Labeled
0 0 1 1
1 0 0 1
2 1 1 1
3 0 0 0
4 0 0 0
5 1 0 1
6 0 0 0
7 0 0 0
11 0 1 1
12 1 1 1
File2.csv
Frame_Nr; Data1; Data2; Labeled
0 0 0 0
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
10 0 0 0
I want the output to look something like this. It should merge file2.csv with file1.csv: wherever file1.csv has a row for a frame number, that row should replace the one from file2.csv; otherwise the data from file2.csv should be kept.
Expected output.csv
Frame_Nr; Data1; Data2; Labeled
0 0 1 1
1 0 0 1
2 1 1 1
3 0 0 0
4 0 0 0
5 1 0 1
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
10 0 0 0
11 0 1 1
12 1 1 1
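My code:
import csv
import os
# file2: read every row, including the header
f = open('file2', 'r')
reader = csv.reader(f, delimiter=';')
reader = list(reader)
# file1: skip the header row, then read the rest
f1 = open('file1', 'r')
reader1 = csv.reader(f1, delimiter=';')
next(reader1)
reader1 = list(reader1)
# Nested loop: compares every row of file1 with every row of file2,
# so it prints one line per pair of rows instead of one line per frame
for line1 in reader1:
    for line in reader:
        if line1[0] != line[0]:
            print(line1)
        else:
            print(line)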
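For comparison, here is a minimal sketch of the intended merge using only the csv module, assuming the files really are semicolon-delimited as the header suggests. Each row goes into a dict keyed by its Frame_Nr, file2.csv first and then file1.csv, so file1's rows overwrite file2's wherever the frame numbers match. The names read_rows and merge_frames and the inline sample strings are illustrative only; they are not part of the question.

```python
import csv
import io

# Illustrative copies of the question's data, assumed to be ';'-delimited.
FILE1 = (
    "Frame_Nr; Data1; Data2; Labeled\n"
    "0;0;1;1\n1;0;0;1\n2;1;1;1\n3;0;0;0\n4;0;0;0\n"
    "5;1;0;1\n6;0;0;0\n7;0;0;0\n11;0;1;1\n12;1;1;1\n"
)
FILE2 = "Frame_Nr; Data1; Data2; Labeled\n" + "".join(f"{i};0;0;0\n" for i in range(11))


def read_rows(stream):
    """Return (header, rows); skipinitialspace drops the blank after each ';'."""
    reader = csv.reader(stream, delimiter=";", skipinitialspace=True)
    header = next(reader)
    return header, [row for row in reader if row]  # ignore blank lines


def merge_frames(primary, fallback):
    """Merge two CSV streams on Frame_Nr; rows from `primary` win."""
    header, fallback_rows = read_rows(fallback)
    _, primary_rows = read_rows(primary)
    merged = {}
    for row in fallback_rows + primary_rows:  # later rows overwrite earlier ones
        merged[int(row[0])] = row
    # Sort numerically so frame 10 comes after 9, not after 1.
    return header, [merged[key] for key in sorted(merged)]


header, rows = merge_frames(io.StringIO(FILE1), io.StringIO(FILE2))
out = io.StringIO()
writer = csv.writer(out, delimiter=";", lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
print(out.getvalue(), end="")
```

With real files, pass open('file1.csv', newline='') and open('file2.csv', newline='') instead of the io.StringIO objects.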
Pandas has two very handy functions, pd.concat and drop_duplicates, that let you avoid the nested for loop and make the process more efficient:
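import pandas as pd
# Adjust the read options to whatever makes your CSVs load
df1 = pd.read_csv('file1.csv', sep=';', skipinitialspace=True)
df2 = pd.read_csv('file2.csv', sep=';', skipinitialspace=True)
# file1 rows come first, so its version of each Frame_Nr is the one kept
df = pd.concat([df1, df2]).drop_duplicates('Frame_Nr')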
Optionally, if you want the resulting DataFrame sorted by Frame_Nr, chain a .sort_values('Frame_Nr') to the last line.
To explain the code snippet: pd.concat concatenates both DataFrames so that you first have all rows from file 1 and after that all rows from file 2. The drop_duplicates after that removes all rows with duplicate values in Frame_Nr, keeping the first occurrence (the default, keep='first'). Since file1 was the first file in the concatenation, all lines from that file are kept, and lines from file2 are only retained if they have a frame number that was not in file1. Optionally, the sort_values will sort the DataFrame by the frame number column.
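The steps above can be checked on the question's own data. This is a minimal sketch, assuming semicolon-delimited files and reading inline copies through io.StringIO instead of the real file paths; the final to_csv call is not part of the answer and just shows one way to produce the requested output.csv layout.

```python
import io

import pandas as pd

# Inline copies of the question's files, assumed to be ';'-delimited.
FILE1 = (
    "Frame_Nr; Data1; Data2; Labeled\n"
    "0;0;1;1\n1;0;0;1\n2;1;1;1\n3;0;0;0\n4;0;0;0\n"
    "5;1;0;1\n6;0;0;0\n7;0;0;0\n11;0;1;1\n12;1;1;1\n"
)
FILE2 = "Frame_Nr; Data1; Data2; Labeled\n" + "".join(f"{i};0;0;0\n" for i in range(11))

# skipinitialspace strips the space after each ';' in the header.
df1 = pd.read_csv(io.StringIO(FILE1), sep=";", skipinitialspace=True)
df2 = pd.read_csv(io.StringIO(FILE2), sep=";", skipinitialspace=True)

df = (
    pd.concat([df1, df2])          # file1 rows first, then file2 rows
    .drop_duplicates("Frame_Nr")   # keep='first': file1 wins on shared frames
    .sort_values("Frame_Nr")
    .reset_index(drop=True)        # clean 0..n-1 index after concat
)

# With no path argument, to_csv returns the CSV text as a string.
output = df.to_csv(sep=";", index=False)
print(output)
```

To write the file directly, pass a path instead: df.to_csv('output.csv', sep=';', index=False).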
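If the files are whitespace-separated, as they appear in the question:
import pandas as pd
# sep=r"\s+" splits on runs of whitespace, like delim_whitespace=True
df1 = pd.read_csv("file1.csv", sep=r"\s+")
df2 = pd.read_csv("file2.csv", sep=r"\s+")
# Splitting the header on whitespace keeps the ';' in the name, hence "Frame_Nr;"
df = pd.concat([df1, df2]).drop_duplicates("Frame_Nr;").sort_values("Frame_Nr;")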
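When the header is split on whitespace, every column label except the last keeps its trailing semicolon ('Frame_Nr;', 'Data1;', 'Data2;'). A minimal sketch, assuming the files are whitespace-separated exactly as shown in the question, strips those semicolons right after loading so the plain name 'Frame_Nr' can be used. It uses sep=r"\s+" in place of delim_whitespace=True, which pandas 2.2 deprecated; both split on runs of whitespace. The helper name load is illustrative only.

```python
import io

import pandas as pd

# Inline copies of the question's files: whitespace-separated body,
# with a ';' after each header name except the last.
FILE1 = (
    "Frame_Nr; Data1; Data2; Labeled\n"
    "0 0 1 1\n1 0 0 1\n2 1 1 1\n3 0 0 0\n4 0 0 0\n"
    "5 1 0 1\n6 0 0 0\n7 0 0 0\n11 0 1 1\n12 1 1 1\n"
)
FILE2 = "Frame_Nr; Data1; Data2; Labeled\n" + "".join(f"{i} 0 0 0\n" for i in range(11))


def load(source):
    """Read a whitespace-separated file and drop the ';' from its header."""
    frame = pd.read_csv(source, sep=r"\s+")
    frame.columns = frame.columns.str.rstrip(";")
    return frame


df = (
    pd.concat([load(io.StringIO(FILE1)), load(io.StringIO(FILE2))])
    .drop_duplicates("Frame_Nr")  # file1 rows come first, so they win
    .sort_values("Frame_Nr")
    .reset_index(drop=True)
)
print(df.to_string(index=False))
```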