I have 3 dataframes, each with shape (8004, 29) and the following schema (example values):
id var0 var1 var2 var3 var4 ... var29
5171 10.0 2.8 0.0 5.0 1.0 ... 9.4
5171 40.9 2.5 3.4 4.5 1.3 ... 7.7
5171 60.7 3.1 5.2 6.6 3.4 ... 1.0
...
5171 0.5 1.3 5.1 0.5 0.2 ... 0.4
4567 1.5 2.0 1.0 4.5 0.1 ... 0.4
4567 4.4 2.0 1.3 6.4 0.1 ... 3.3
4567 6.3 3.0 1.5 7.6 1.6 ... 1.6
...
4567 0.7 1.4 1.4 0.3 4.2 ... 1.7
...
9584 0.3 2.6 0.0 5.2 1.6 ... 9.7
9584 0.5 1.2 8.3 3.4 1.3 ... 1.7
9584 0.7 3.0 5.6 6.6 3.0 ... 1.0
...
9584 0.7 1.3 0.1 0.0 2.0 ... 1.7
where each id has 58 rows and there are 138 unique ids.
I am only interested in the last column of these dataframes, var29. What I need to do is the following comparison:
if df1['var29'] > (df2['var29'] + df3['var29']) or
df1['var29'] < (df2['var29'] - df3['var29'])
and generate a new dataframe as a result:
id result
5171 True
5171 True
5171 False
...
5171 False
4567 True
4567 True
4567 True
...
4567 False
...
9584 True
9584 False
9584 False
...
9584 True
I tried to loop over each index and use a lambda to generate the result dataframe as follows, but it failed:
idxs = unique(df1.index).tolist()
results = pd.DataFrame(index=df1.index)
for idx in idxs:
    results['result'] = df1.loc[idx]['var29'].apply(lambda x: True if (
        (df2['var29'].loc[idx] - df3['var29'].loc[idx]) > x or (
        df2['var29'].loc[idx] + df3['var29'].loc[idx]) < x) else False)
Can someone help me to generate it?
Here's a way to do it: map the var29 columns of df2 and df3 onto a copy of df1 by id, so the values being compared belong to the same id:
# copy df1 and bring in var29 from df2 and df3, matched on id
new_df = df1.copy()
new_df['df2'] = new_df['id'].map(df2.set_index('id')['var29'])
new_df['df3'] = new_df['id'].map(df3.set_index('id')['var29'])
# True where df1's var29 is above df2 + df3 or below df2 - df3
cond = (new_df['var29'] > (new_df['df2'] + new_df['df3'])) | (new_df['var29'] < (new_df['df2'] - new_df['df3']))
# cond is already a boolean Series, so it can be assigned directly
new_df['result'] = cond
# keep only the id and result columns
new_df = new_df[['id', 'result']]
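As a side note, the two-sided test "a > b + c or a < b - c" says that a lies outside the band b ± c, which can also be written as a single absolute-difference comparison, abs(a - b) > c. The following is a minimal sketch with made-up numbers; the helper name `outside_band` is hypothetical and not part of pandas.

```python
import pandas as pd

def outside_band(a, b, c):
    """Return True where a lies outside the band [b - c, b + c]."""
    return (a - b).abs() > c

# Made-up values standing in for df1['var29'], df2['var29'], df3['var29'].
a = pd.Series([9.4, 7.7, 1.0])
b = pd.Series([5.0, 7.0, 1.0])
c = pd.Series([1.0, 1.0, 0.5])

# The original two-sided form and the absolute-difference form.
two_sided = (a > b + c) | (a < b - c)
band = outside_band(a, b, c)
```

Both forms should give the same boolean Series; which one to use is mostly a matter of readability.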
Sample Data
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'id': list(range(10)), 'var29': np.random.randn(10)})
df2 = pd.DataFrame({'id': list(range(10)), 'var29': np.random.randn(10)})
df3 = pd.DataFrame({'id': list(range(10)), 'var29': np.random.randn(10)})
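One caveat: the sample data has one row per id, whereas in the question each id repeats 58 times. As far as I know, Series.map with a Series as the lookup needs that Series to have a unique index, so the map step may raise an error on repeated ids. If the three dataframes list their rows in the same order (an assumption, not something the question states), the comparison can be done by position instead. A minimal sketch under that assumption, with hypothetical small frames and id as a regular column:

```python
import numpy as np
import pandas as pd

# Hypothetical frames: 3 ids, 4 rows each, same row order in all three.
rng = np.random.default_rng(0)
ids = np.repeat([5171, 4567, 9584], 4)
df1 = pd.DataFrame({'id': ids, 'var29': rng.normal(size=ids.size)})
df2 = pd.DataFrame({'id': ids, 'var29': rng.normal(size=ids.size)})
df3 = pd.DataFrame({'id': ids, 'var29': rng.normal(size=ids.size)})

# Guard the assumption that the rows line up one-to-one.
assert (df1['id'].to_numpy() == df2['id'].to_numpy()).all()
assert (df1['id'].to_numpy() == df3['id'].to_numpy()).all()

# Compare by position; .to_numpy() side-steps pandas index alignment.
a = df1['var29'].to_numpy()
b = df2['var29'].to_numpy()
c = df3['var29'].to_numpy()
result = pd.DataFrame({
    'id': df1['id'].to_numpy(),
    'result': (a > b + c) | (a < b - c),
})
```

If the row order differs between the frames, they would first need a shared key per row (for example id plus a within-id counter) before comparing.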