Python Pandas - Drop Duplicates

Question

I read two CSV files, combine them, and then drop duplicate rows based on the value of one column. My code is below:

import pandas as pd

input_path = "data1.csv"
df_v1 = pd.read_csv(input_path)
print(len(df_v1))

input_path2 = "data2.csv"
df_v2 = pd.read_csv(input_path2)
print(len(df_v2))

result = df_v1.append(df_v2, ignore_index=True)
result.drop_duplicates(subset = ['Time'], keep = 'first', inplace = True)

result.to_csv('output.csv', encoding='utf-8', index=False)

Answer 1

Try this:

res = pd.concat([df_v1, df_v2], ignore_index=True).drop_duplicates(subset=['Time'], keep='first')
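For illustration, here is a minimal sketch of the concat-then-deduplicate pattern on small in-memory frames. The sample values and the `Value` column are made up and only stand in for data1.csv and data2.csv. It also shows that `drop_duplicates` returns `None` when called with `inplace=True`, so the result of such a call should not be assigned to a variable.

```python
import pandas as pd

# Hypothetical sample data standing in for data1.csv and data2.csv
df_v1 = pd.DataFrame({"Time": ["10:00:00", "10:00:01"], "Value": [1, 2]})
df_v2 = pd.DataFrame({"Time": ["10:00:01", "10:00:02"], "Value": [20, 30]})

# Stack the two frames and renumber the index from 0
combined = pd.concat([df_v1, df_v2], ignore_index=True)

# Without inplace=True, drop_duplicates returns a new, de-duplicated DataFrame
deduped = combined.drop_duplicates(subset=["Time"], keep="first")

# With inplace=True, the frame is modified directly and the call returns None
returned = combined.drop_duplicates(subset=["Time"], keep="first", inplace=True)
```

With `keep="first"`, the row from `df_v1` is the one that survives when both frames contain the same `Time`.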

Answer 2

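This is a guess without seeing the actual data, but one possibility is that the values in the Time column differ by a small amount (for example, a fraction of a second), so they are not treated as duplicates. The code below converts Time to a timestamp and rounds it to the nearest second before dropping duplicates: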

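# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
result = pd.concat([df_v1, df_v2], ignore_index=True)
# Parse Time as datetimes and round to whole seconds
result['Time'] = pd.to_datetime(result['Time']).dt.round('1s')
result.drop_duplicates(subset=['Time'], keep='first', inplace=True)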
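To make the idea concrete, here is a small sketch with made-up timestamps (the values and the `Value` column are assumptions, not the asker's data). Two rows differ only by a fraction of a second, so they count as distinct until the column is rounded.

```python
import pandas as pd

# Hypothetical sample data: the first two rows differ by 0.3 seconds
result = pd.DataFrame({
    "Time": [
        "2022-01-01 10:00:00.100",
        "2022-01-01 10:00:00.400",
        "2022-01-01 10:00:02.000",
    ],
    "Value": [1, 2, 3],
})

# Compared as raw strings, all three values are distinct, so nothing is dropped
as_strings = result.drop_duplicates(subset=["Time"], keep="first")

# After parsing and rounding to whole seconds, the first two rows collide
result["Time"] = pd.to_datetime(result["Time"]).dt.round("1s")
rounded = result.drop_duplicates(subset=["Time"], keep="first")
```

Rounding can still separate two values that sit on either side of a half-second boundary (for example .4 and .6), so `dt.floor('1s')` or a coarser interval may be worth trying, depending on the data.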

solution1
0 2022-02-06 11:39:33

solution2
0 2022-02-06 12:04:56