Python Pandas - Drop Duplicates

I have a use case where I read 2 CSV files and then drop duplicates based on a column value. My code is as below:

import pandas as pd

input_path = "data1.csv"
df_v1 = pd.read_csv(input_path)
print(len(df_v1))

input_path2 = "data2.csv"
df_v2 = pd.read_csv(input_path2)
print(len(df_v2))

# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent.
result = pd.concat([df_v1, df_v2], ignore_index=True)
result.drop_duplicates(subset=['Time'], keep='first', inplace=True)

result.to_csv('output.csv', encoding='utf-8', index=False)

Try this:

# No inplace=True here: with inplace=True, drop_duplicates returns None and res would be None.
res = pd.concat([df_v1, df_v2], ignore_index=True).drop_duplicates(subset=['Time'], keep='first')
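
As a rough illustration of the concat-then-deduplicate pattern, here is a minimal sketch on two small in-memory frames. The sample values are invented stand-ins for the CSV contents, which are not shown in the question. It also shows why the return value matters: `drop_duplicates` returns a new DataFrame by default, but returns `None` when called with `inplace=True`.

```python
import pandas as pd

# Hypothetical sample data standing in for data1.csv and data2.csv.
df_v1 = pd.DataFrame({"Time": ["10:00:00", "10:00:01"], "Value": [1, 2]})
df_v2 = pd.DataFrame({"Time": ["10:00:01", "10:00:02"], "Value": [20, 30]})

combined = pd.concat([df_v1, df_v2], ignore_index=True)

# Without inplace, drop_duplicates returns a new, deduplicated DataFrame.
# keep="first" retains the row from df_v1 for the shared Time value.
deduped = combined.drop_duplicates(subset=["Time"], keep="first")

# With inplace=True, the frame is modified and the call returns None,
# so assigning the result of a chained call would lose the data.
returned = combined.copy().drop_duplicates(subset=["Time"], keep="first", inplace=True)

print(deduped)
print(returned)
```

So either assign the result of a chained call without `inplace`, or call it with `inplace=True` on an existing variable and ignore the return value.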

This is a wild guess without seeing the actual data, but one possibility is that the values in the time column differ by some small amount, so rows that look like duplicates are not exactly equal. Below, the time is converted to a timestamp rounded to 1 second before dropping duplicates:

# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent.
result = pd.concat([df_v1, df_v2], ignore_index=True)
result['Time'] = pd.to_datetime(result['Time']).dt.round('1s')
result.drop_duplicates(subset=['Time'], keep='first', inplace=True)
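
To make the rounding idea concrete, here is a small sketch with made-up timestamps (the real data is not shown, so the format and the 200 ms offset are assumptions). Two rows differ only in their sub-second part, so an exact comparison keeps both, while rounding to one second lets `drop_duplicates` treat them as the same time.

```python
import pandas as pd

# Hypothetical sample data; the second frame repeats 10:00:01 with a 200 ms offset.
df_v1 = pd.DataFrame({
    "Time": ["2021-01-01 10:00:00.000", "2021-01-01 10:00:01.000"],
    "Value": [1, 2],
})
df_v2 = pd.DataFrame({
    "Time": ["2021-01-01 10:00:01.200", "2021-01-01 10:00:02.000"],
    "Value": [20, 30],
})

result = pd.concat([df_v1, df_v2], ignore_index=True)

# Exact comparison: all four Time strings are distinct, so nothing is dropped.
exact = result.drop_duplicates(subset=["Time"], keep="first")

# Parse to timestamps and round to the nearest second before deduplicating.
result["Time"] = pd.to_datetime(result["Time"]).dt.round("1s")
rounded = result.drop_duplicates(subset=["Time"], keep="first")

print(len(exact), len(rounded))
```

Rounding can also merge rows that are genuinely distinct but fall within the same second, so the tolerance should match what counts as a duplicate in the data at hand.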

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address.
