I have a main dataframe that I want to update periodically from an update frame. The main frame has a column that determines which column of the update frame to take the new value from. Currently, I do it as follows:
import pandas as pd
import numpy as np
##### Test data
# Not unique Name but still index
df_main = pd.DataFrame({
    "Name": ["a", "b", "c", "b", "d"],
    "Flip": [True, True, False, False, True],
    "Value": [1.0, 2.0, 3.0, 2.5, 4.0]
}, columns=["Name", "Flip", "Value"])
df_main.set_index('Name', inplace=True)
#        Flip  Value
# Name
# a      True    1.0
# b      True    2.0
# c     False    3.0
# b     False    2.5
# d      True    4.0
# Unique index
df_update_data = pd.DataFrame({
    "Name": ["a", "b", "c", "d", "f"],
    "Value_True": [1.1, 2.1, 3.1, 4.1, 5.1],
    "Value_False": [1.2, 2.2, 3.2, 4.2, 5.2]
}, columns=["Name", "Value_True", "Value_False"])
df_update_data.set_index('Name', inplace=True)
#       Value_True  Value_False
# Name
# a            1.1          1.2
# b            2.1          2.2
# c            3.1          3.2
# d            4.1          4.2
# f            5.1          5.2
df_main = df_main.join(df_update_data, how='inner')
df_main["Value"] = np.where(df_main['Flip'].values, df_main['Value_True'].values, df_main['Value_False'].values)
df_main = df_main.drop(['Value_True', 'Value_False'], axis=1)
print(df_main)
#        Flip  Value
# Name
# a      True    1.1
# b      True    2.1
# b     False    2.2
# c     False    3.2
# d      True    4.1
This runs quite often, and I actually have Name_{1,2,3}, Flop_{1,2,3} and Value_{1,2,3}, so I do the join, update and drop three times. I'm trying to make it as efficient as possible because run time matters here. Is this the best way to do it? I found no real speed improvement from using merge rather than join.
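Note that your result is sorted on the index, so my solution starts by sorting df_main explicitly (on the index).
I think creating an intermediate DataFrame is unavoidable. But you can then compute the new values and write them straight into the Value column of df_main.
I also noticed that how='left' (the default for join) works a bit faster. It is acceptable in your case because every Name in df_main also occurs in df_update_data; with a left join, rows whose Name is missing from the update frame would get NaN instead of being dropped as they are with how='inner'.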
So the code can be:
df_main.sort_index(inplace=True)
wrk = df_main.join(df_update_data)
df_main.Value = np.where(wrk.Flip, wrk.Value_True, wrk.Value_False)
This at least avoids dropping the two helper columns.
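If you would rather not depend on the row order of the joined frame at all, a variation on the same idea is to look the update rows up with reindex instead of join. The following is only a sketch under a few assumptions: update_values is a hypothetical helper name, the update frame has a unique index (as in your test data), and rows whose Name is missing from the update frame should keep their old Value, which is my guess at the desired behaviour rather than something stated in the question. I have not benchmarked it against join.

```python
import numpy as np
import pandas as pd

# Test data from the question (Name is a non-unique index in df_main).
df_main = pd.DataFrame({
    "Name": ["a", "b", "c", "b", "d"],
    "Flip": [True, True, False, False, True],
    "Value": [1.0, 2.0, 3.0, 2.5, 4.0],
}).set_index("Name")

df_update_data = pd.DataFrame({
    "Name": ["a", "b", "c", "d", "f"],
    "Value_True": [1.1, 2.1, 3.1, 4.1, 5.1],
    "Value_False": [1.2, 2.2, 3.2, 4.2, 5.2],
}).set_index("Name")


def update_values(main, update):
    """Hypothetical helper: return a copy of main with Value refreshed."""
    # One update row per row of main, in main's own order.
    # reindex accepts duplicate target labels as long as update.index is unique.
    upd = update.reindex(main.index)
    new_value = np.where(
        main["Flip"].to_numpy(),
        upd["Value_True"].to_numpy(),
        upd["Value_False"].to_numpy(),
    )
    # Assumption: names absent from the update frame keep their old Value.
    matched = main.index.isin(update.index)
    out = main.copy()
    out["Value"] = np.where(matched, new_value, main["Value"].to_numpy())
    return out


result = update_values(df_main, df_update_data)
print(result["Value"].tolist())  # [1.1, 2.1, 3.2, 2.2, 4.1]
```

Because the lookup is aligned to df_main row by row, no sorting is needed and the original row order (a, b, c, b, d) is kept. If you do want unmatched rows dropped, as with how='inner', you could filter the result with the same matched mask.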