
Python Pandas - Drop Duplicates

I have a use case where I read two CSV files and then drop duplicates based on a column value. My code is below:

import pandas as pd

input_path = "data1.csv"
df_v1 = pd.read_csv(input_path)
print(len(df_v1))

input_path2 = "data2.csv"
df_v2 = pd.read_csv(input_path2)
print(len(df_v2))

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
result = pd.concat([df_v1, df_v2], ignore_index=True)
result.drop_duplicates(subset=['Time'], keep='first', inplace=True)

result.to_csv('output.csv', encoding='utf-8', index=False)

Try this:

res = pd.concat([df_v1, df_v2], ignore_index=True).drop_duplicates(subset=['Time'], keep='first')
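One thing to watch with a chained call like this: `drop_duplicates(..., inplace=True)` modifies the frame it is called on and returns `None`, so assigning its result loses the data. A minimal sketch, using two small made-up frames (`df_a`, `df_b` and their values are hypothetical, not the asker's data):

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files
df_a = pd.DataFrame({'Time': ['10:00', '10:01'], 'Value': [1, 2]})
df_b = pd.DataFrame({'Time': ['10:01', '10:02'], 'Value': [20, 3]})

combined = pd.concat([df_a, df_b], ignore_index=True)

# With inplace=True the method mutates its caller and returns None
returned = combined.copy().drop_duplicates(subset=['Time'], keep='first', inplace=True)
print(returned)

# Without inplace, a new de-duplicated DataFrame is returned
res = combined.drop_duplicates(subset=['Time'], keep='first')
print(res)
```

Here `keep='first'` retains the row from `df_a` for the shared `Time` value, because `df_a` comes first in the concatenation.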

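This is a guess without seeing the actual data, but one possibility is that the values in the Time column differ by some small amount, so rows that look like duplicates are not exact matches. Below, the time is converted to a timestamp and rounded to the nearest second before duplicates are dropped: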

result = pd.concat([df_v1, df_v2], ignore_index=True)
result['Time'] = pd.to_datetime(result['Time']).dt.round('1s')
result.drop_duplicates(subset = ['Time'], keep = 'first', inplace = True)
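To see the effect of the rounding, here is a small self-contained sketch. The in-memory CSV text and its values are invented for illustration (the real data1.csv and data2.csv are not shown in the question), and the timestamps are assumed to differ only by fractions of a second:

```python
import io
import pandas as pd

# Hypothetical sample data standing in for data1.csv and data2.csv
csv1 = io.StringIO("Time,Value\n2021-01-01 00:00:00.100,1\n2021-01-01 00:00:05.000,2\n")
csv2 = io.StringIO("Time,Value\n2021-01-01 00:00:00.300,10\n2021-01-01 00:00:09.000,3\n")

df_v1 = pd.read_csv(csv1)
df_v2 = pd.read_csv(csv2)

result = pd.concat([df_v1, df_v2], ignore_index=True)

# As raw strings, the two near-identical timestamps count as different values
raw_dupes = result.duplicated(subset=['Time']).sum()
print("duplicates before rounding:", raw_dupes)

# Parse to datetime and round to the nearest second, then de-duplicate
result['Time'] = pd.to_datetime(result['Time']).dt.round('1s')
deduped = result.drop_duplicates(subset=['Time'], keep='first').reset_index(drop=True)
print(deduped)
```

Whether one second is the right tolerance depends on the data; rounding too coarsely would merge rows that are genuinely distinct.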
