Python: merge two CSV files

I have 2 CSV files.

File1.csv

    Frame_Nr; Data1; Data2; Labeled
    0          0       1        1
    1          0       0        1
    2          1       1        1
    3          0       0        0
    4          0       0        0
    5          1       0        1
    6          0       0        0
    7          0       0        0
   11          0       1        1
   12          1       1        1

File2.csv

    Frame_Nr; Data1; Data2; Labeled
    0          0       0        0
    1          0       0        0
    2          0       0        0
    3          0       0        0
    4          0       0        0
    5          0       0        0
    6          0       0        0
    7          0       0        0
    8          0       0        0
    9          0       0        0
   10          0       0        0

I want the output to look something like this. file2.csv should be merged with file1.csv: where a frame number appears in both files, the row from file1.csv should replace the one from file2.csv; otherwise the row from file2.csv is kept.

Expected output.csv

    Frame_Nr; Data1; Data2; Labeled
    0          0       1        1
    1          0       0        1
    2          1       1        1
    3          0       0        0
    4          0       0        0
    5          1       0        1
    6          0       0        0
    7          0       0        0
    8          0       0        0
    9          0       0        0
   10          0       0        0
   11          0       1        1
   12          1       1        1

My code:

import csv
import os

f = open('file2', 'r')
reader = csv.reader(f, delimiter=';')   
reader = list(reader)
f1 = open('file1', 'r')
reader1 = csv.reader(f1, delimiter=';')
next(reader1)
reader1 = list(reader1)


for line1 in reader1:
    for line in reader:
        if line1[0] != line[0]:
            print(line1)
        else:
            print(line)

Pandas has two very nice functions, pd.concat and drop_duplicates, that help you avoid the nested for loop and make the process more efficient:

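import pandas as pd

# Pass whatever parsing options your CSVs need; sep=';' with skipinitialspace=True
# is an assumption based on the header shown in the question.
df1 = pd.read_csv('file1.csv', sep=';', skipinitialspace=True)
df2 = pd.read_csv('file2.csv', sep=';', skipinitialspace=True)
df = pd.concat([df1, df2]).drop_duplicates('Frame_Nr')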

Optionally, if you want the resulting DataFrame sorted by Frame_Nr, chain a .sort_values('Frame_Nr') to the last line.

To explain the code snippet: pd.concat concatenates both DataFrames, so you first have all rows from file1 and after them all rows from file2. drop_duplicates then removes every row whose Frame_Nr value has already been seen, keeping the first occurrence. Since file1 was the first file in the concatenation, all of its rows are kept, and rows from file2 are retained only if their frame number does not appear in file1. Optionally, sort_values sorts the DataFrame by the frame number column.
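To see the keep-first behaviour in isolation, here is a minimal sketch that skips file parsing altogether. The two small DataFrames are hypothetical stand-ins for a few frames of file1.csv and file2.csv, and the trailing reset_index is only there to tidy the row labels.

```python
import pandas as pd

# Hypothetical in-memory stand-ins for a few frames of file1.csv and file2.csv.
df1 = pd.DataFrame({
    "Frame_Nr": [0, 1, 11],
    "Data1": [0, 0, 0],
    "Data2": [1, 0, 1],
    "Labeled": [1, 1, 1],
})
df2 = pd.DataFrame({
    "Frame_Nr": [0, 1, 8],
    "Data1": [0, 0, 0],
    "Data2": [0, 0, 0],
    "Labeled": [0, 0, 0],
})

# df1 is concatenated first, so for frames 0 and 1 its rows are the
# first occurrences and survive drop_duplicates (keep="first" is the default).
merged = (
    pd.concat([df1, df2])
    .drop_duplicates(subset="Frame_Nr")
    .sort_values("Frame_Nr")
    .reset_index(drop=True)
)
print(merged)
```

Frame 8 exists only in df2, so it is the one row taken from the second frame; swapping the order inside pd.concat would make df2 win on the shared frames instead.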

import pandas as pd

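# sep=r"\s+" splits on runs of whitespace (it replaces delim_whitespace=True,
# which is deprecated since pandas 2.2).
df1 = pd.read_csv("file1.csv", sep=r"\s+")
df2 = pd.read_csv("file2.csv", sep=r"\s+")

# With whitespace splitting, the semicolon in the header stays part of the
# column name, hence "Frame_Nr;" rather than "Frame_Nr".
df = pd.concat([df1, df2]).drop_duplicates("Frame_Nr;").sort_values("Frame_Nr;")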
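For comparison with the nested loop in the question, the same merge can be sketched with only the csv module by keying rows on the frame number in a dict. This is a rough sketch, not a drop-in replacement: merge_rows and read_rows are hypothetical helper names, and the sample text assumes strictly semicolon-delimited files, which may not match your real data.

```python
import csv
import io


def read_rows(text):
    """Parse semicolon-delimited text into (header, rows), stripping padding."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    rows = [[cell.strip() for cell in row] for row in reader if row]
    return rows[0], rows[1:]


def merge_rows(primary_rows, fallback_rows, key_index=0):
    """Combine both row lists, preferring primary_rows when keys clash."""
    merged = {}
    # Load the fallback rows first, then let primary rows overwrite matching keys.
    for row in fallback_rows:
        merged[int(row[key_index])] = row
    for row in primary_rows:
        merged[int(row[key_index])] = row
    # Sort by the numeric key so the output is ordered by frame number.
    return [merged[key] for key in sorted(merged)]


# Hypothetical samples standing in for file1.csv and file2.csv.
file1_text = "Frame_Nr;Data1;Data2;Labeled\n0;0;1;1\n1;0;0;1\n11;0;1;1\n"
file2_text = "Frame_Nr;Data1;Data2;Labeled\n0;0;0;0\n1;0;0;0\n8;0;0;0\n"

header, rows1 = read_rows(file1_text)
_, rows2 = read_rows(file2_text)
result = merge_rows(rows1, rows2)

# Write the merged rows back out in the same semicolon-delimited layout.
out = io.StringIO()
writer = csv.writer(out, delimiter=";", lineterminator="\n")
writer.writerow(header)
writer.writerows(result)
print(out.getvalue())
```

The dict lookup replaces the inner loop, so each file is read once; to work on real files, open them with open(..., newline="") and pass the file objects to csv.reader and csv.writer instead of io.StringIO.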
