"Python Pandas - 删除重复项"

Question

I have a use case where I read 2 CSV files and then drop duplicates based on a column value. My code is as below:

import pandas as pd

input_path = "data1.csv"
df_v1 = pd.read_csv(input_path)
print(len(df_v1))

input_path2 = "data2.csv"
df_v2 = pd.read_csv(input_path2)
print(len(df_v2))

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
result = pd.concat([df_v1, df_v2], ignore_index=True)
result.drop_duplicates(subset=['Time'], keep='first', inplace=True)

result.to_csv('output.csv', encoding='utf-8', index=False)

Answer 1

Try this:

res = pd.concat([df_v1, df_v2], ignore_index=True).drop_duplicates(subset=['Time'], keep='first')
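
A chained call of this kind only works when drop_duplicates returns the de-duplicated frame, which it does not do when inplace=True is passed: in that case the method modifies the frame and returns None. The sketch below illustrates the difference. The two small frames and their column values are made up for illustration and stand in for the CSV files from the question.

```python
import pandas as pd

# Hypothetical sample frames standing in for the two CSV files.
df1 = pd.DataFrame({"Time": ["10:00", "10:01"], "value": [1, 2]})
df2 = pd.DataFrame({"Time": ["10:01", "10:02"], "value": [3, 4]})

combined = pd.concat([df1, df2], ignore_index=True)

# Without inplace, drop_duplicates returns a new, de-duplicated DataFrame.
deduped = combined.drop_duplicates(subset=["Time"], keep="first")

# With inplace=True, the frame itself is modified and the call returns None,
# so assigning the result of a chained call would leave you with None.
returned = combined.drop_duplicates(subset=["Time"], keep="first", inplace=True)

print(len(deduped))
print(returned)
```

So either assign the return value and leave inplace out, or call the method with inplace=True on an existing variable and ignore the return value.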

Answer 2

Wild guess without seeing the actual data, but one possibility is that the values in the Time column differ by some small amount, so rows that look like duplicates are not exactly equal. Below, the time is converted to a timestamp rounded to 1 second:

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
result = pd.concat([df_v1, df_v2], ignore_index=True)
result['Time'] = pd.to_datetime(result['Time']).dt.round('1s')
result.drop_duplicates(subset = ['Time'], keep = 'first', inplace = True)
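
To make the guess concrete, here is a minimal sketch with invented data, assuming the Time column holds timestamps with sub-second parts. The frames df_a and df_b and their values are hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical data: the first two timestamps differ only by a fraction of a second.
df_a = pd.DataFrame({"Time": ["2022-02-06 11:39:33.100"], "value": [1]})
df_b = pd.DataFrame(
    {"Time": ["2022-02-06 11:39:33.400", "2022-02-06 11:39:35.000"], "value": [2, 3]}
)

result = pd.concat([df_a, df_b], ignore_index=True)

# On the raw strings every value is distinct, so nothing is dropped.
rows_before_rounding = len(result.drop_duplicates(subset=["Time"], keep="first"))

# After parsing and rounding to the nearest second, the first two rows collide.
result["Time"] = pd.to_datetime(result["Time"]).dt.round("1s")
deduped = result.drop_duplicates(subset=["Time"], keep="first")

print(rows_before_rounding, len(deduped))
```

Note that round goes to the nearest second, so two values such as 33.4 and 33.6 would still end up in different seconds. If that matters for your data, dt.floor('1s') or a coarser resolution may be a better fit.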

Answer 1: 0, 2022-02-06 11:39:33
Answer 2: 0, 2022-02-06 12:04:56