Python Dataframes: Merging two dataframes according to a condition (Pandas)
Suppose I have these two DataFrames:
DATAFRAME 1
onset offset
0 1 200
1 201 400
2 401 600
3 601 800
4 801 1000
5 1001 1200
6 1201 1400
7 1401 1600
8 1601 1800
9 1801 2000
10 2001 2200
11 2201 2400
12 2401 2600
13 2601 2800
14 2801 3000
15 3001 3200
16 3201 3400
17 3401 3600
18 3601 3800
19 3801 4000
20 4001 4200
21 4201 4400
22 4401 4600
23 4601 4800
24 4801 5000
25 5001 5200
26 5201 5400
27 5401 5600
28 5601 5800
29 5801 6000
DATAFRAME 2
onset rhythm_name rhythm_code offset
0 1 NSR 100 2760
1 2761 JUNCTIONAL 4000 3938
2 3939 NSR 100 6000
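My goal is to merge the two DataFrames on their onset-offset intervals, adding the matching rhythm name and rhythm code to each row, to get something like this: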
onset offset rhythm_name rhythm_code
0 1 200 NSR 100
1 201 400 NSR 100
2 401 600 NSR 100
3 601 800 NSR 100
4 801 1000 NSR 100
5 1001 1200 NSR 100
6 1201 1400 NSR 100
7 1401 1600 NSR 100
8 1601 1800 NSR 100
9 1801 2000 NSR 100
10 2001 2200 NSR 100
11 2201 2400 NSR 100
12 2401 2600 NSR 100
13 2601 2800 Null Null
14 2801 3000 JUNCTIONAL 4000
15 3001 3200 JUNCTIONAL 4000
16 3201 3400 JUNCTIONAL 4000
17 3401 3600 JUNCTIONAL 4000
18 3601 3800 JUNCTIONAL 4000
19 3801 4000 Null Null
20 4001 4200 NSR 100
21 4201 4400 NSR 100
22 4401 4600 NSR 100
23 4601 4800 NSR 100
24 4801 5000 NSR 100
25 5001 5200 NSR 100
26 5201 5400 NSR 100
27 5401 5600 NSR 100
28 5601 5800 NSR 100
29 5801 6000 NSR 100
What can I use to do this? I could not find a way to solve it. I tried something like:
df1["rhythm_name"] = df2[(df1['onset'] >= df2['onset']) & (df1['offset'] <= df2['offset'])]
I get:
ValueError: Can only compare identically-labeled Series objects
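The error comes from the comparison itself rather than from the assignment: `>=` between two Series works element by element and pandas refuses to compare Series whose index labels differ (here 30 rows against 3). A minimal sketch that reproduces this, using two small made-up Series `a` and `b` as stand-ins for the real columns:

```python
import pandas as pd

# Hypothetical stand-ins for df1['onset'] (longer) and df2['onset'] (shorter).
a = pd.Series([1, 201, 401, 601])
b = pd.Series([1, 2761])

try:
    a >= b  # element-wise comparison needs identical index labels
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

So the task is not a row-by-row comparison at all; each row of the first frame has to be matched against the interval in the second frame that contains it.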
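I wrote a script to reproduce the problem:
import pandas as pd

# df1: 30 consecutive windows of 200 samples (1-200, 201-400, ...)
df1 = pd.DataFrame()
onsets = []
for i in range(0, 30):
    onset = i * 200 + 1
    onsets.append(onset)
df1['onset'] = onsets
df1['offset'] = df1["onset"] + 200 - 1

# df2: rhythm annotations (wrapped in pd.DataFrame so it can be merged)
df2 = pd.DataFrame({'onset': [1, 2761, 3939],
                    'offset': [2760, 3938, 6000],
                    'rhythm_name': ["NSR", "JUNCTIONAL", "NSR"],
                    'rhythm_code': [100, 4000, 100]})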
You can use pd.merge_asof and then mask the result with the second condition:
dfm = pd.merge_asof(df1, df2, on='onset', direction='backward', suffixes=('','_y'))
dfm[['rhythm_name', 'rhythm_code']] = (dfm[['rhythm_name', 'rhythm_code']]
                                       .where(dfm['offset'] <= dfm['offset_y']))
dfm.drop('offset_y', axis=1)
Output:
onset offset rhythm_name rhythm_code
0 1 200 NSR 100.0
1 201 400 NSR 100.0
2 401 600 NSR 100.0
3 601 800 NSR 100.0
4 801 1000 NSR 100.0
5 1001 1200 NSR 100.0
6 1201 1400 NSR 100.0
7 1401 1600 NSR 100.0
8 1601 1800 NSR 100.0
9 1801 2000 NSR 100.0
10 2001 2200 NSR 100.0
11 2201 2400 NSR 100.0
12 2401 2600 NSR 100.0
13 2601 2800 NaN NaN
14 2801 3000 JUNCTIONAL 4000.0
15 3001 3200 JUNCTIONAL 4000.0
16 3201 3400 JUNCTIONAL 4000.0
17 3401 3600 JUNCTIONAL 4000.0
18 3601 3800 JUNCTIONAL 4000.0
19 3801 4000 NaN NaN
20 4001 4200 NSR 100.0
21 4201 4400 NSR 100.0
22 4401 4600 NSR 100.0
23 4601 4800 NSR 100.0
24 4801 5000 NSR 100.0
25 5001 5200 NSR 100.0
26 5201 5400 NSR 100.0
27 5401 5600 NSR 100.0
28 5601 5800 NSR 100.0
29 5801 6000 NSR 100.0
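For anyone who wants to run this end to end, here is a self-contained sketch of the same two steps. It rebuilds the question's data (treating `df2` as a DataFrame, which is an assumption about the asker's setup) and lists the windows that end up unlabelled; the names `inside` and `unmatched` are only for illustration.

```python
import pandas as pd

# Rebuild the question's data (df2 as a DataFrame is an assumption).
df1 = pd.DataFrame({'onset': [i * 200 + 1 for i in range(30)]})
df1['offset'] = df1['onset'] + 199
df2 = pd.DataFrame({'onset': [1, 2761, 3939],
                    'offset': [2760, 3938, 6000],
                    'rhythm_name': ['NSR', 'JUNCTIONAL', 'NSR'],
                    'rhythm_code': [100, 4000, 100]})

# Step 1: for each df1 row, take the last df2 row whose onset <= df1 onset.
dfm = pd.merge_asof(df1, df2, on='onset', direction='backward',
                    suffixes=('', '_y'))

# Step 2: keep the match only if the window also ends inside that interval.
inside = dfm['offset'] <= dfm['offset_y']
dfm[['rhythm_name', 'rhythm_code']] = (
    dfm[['rhythm_name', 'rhythm_code']].where(inside))
dfm = dfm.drop(columns='offset_y')

# Windows that straddle a rhythm boundary are left unlabelled.
unmatched = dfm.index[dfm['rhythm_name'].isna()].tolist()
print(unmatched)  # [13, 19]
```

Note that `merge_asof` expects both frames to be sorted on the `on` key, which holds for this data.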
If your data is not too big, you can use broadcasting:
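import numpy as np

# Compare every df1 row against every df2 row: shape (len(df1), len(df2))
cond1 = df1.onset.values[:, None] >= df2.onset.values
cond2 = df1.offset.values[:, None] <= df2.offset.values
mask = cond1 & cond2
# Position of the matching df2 row, or NaN when no interval contains the window
idx = np.where(mask.any(axis=1), mask.argmax(axis=1), np.nan)
for col in ['rhythm_name', 'rhythm_code']:
    df1[col] = df2[col].reindex(idx).values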
Output:
onset offset rhythm_name rhythm_code
0 1 200 NSR 100.0
1 201 400 NSR 100.0
2 401 600 NSR 100.0
3 601 800 NSR 100.0
4 801 1000 NSR 100.0
5 1001 1200 NSR 100.0
6 1201 1400 NSR 100.0
7 1401 1600 NSR 100.0
8 1601 1800 NSR 100.0
9 1801 2000 NSR 100.0
10 2001 2200 NSR 100.0
11 2201 2400 NSR 100.0
12 2401 2600 NSR 100.0
13 2601 2800 NaN NaN
14 2801 3000 JUNCTIONAL 4000.0
15 3001 3200 JUNCTIONAL 4000.0
16 3201 3400 JUNCTIONAL 4000.0
17 3401 3600 JUNCTIONAL 4000.0
18 3601 3800 JUNCTIONAL 4000.0
19 3801 4000 NaN NaN
20 4001 4200 NSR 100.0
21 4201 4400 NSR 100.0
22 4401 4600 NSR 100.0
23 4601 4800 NSR 100.0
24 4801 5000 NSR 100.0
25 5001 5200 NSR 100.0
26 5201 5400 NSR 100.0
27 5401 5600 NSR 100.0
28 5601 5800 NSR 100.0
29 5801 6000 NSR 100.0
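To see what the broadcasting step is doing, here is a small sketch on made-up toy data (three windows, two intervals; none of these arrays come from the question). It uses `-1` rather than NaN as the "no match" marker purely to keep the index array integer-typed, which differs from the code above.

```python
import numpy as np

# Hypothetical toy data: three windows and two labelled intervals.
win_onset = np.array([1, 201, 401])
win_offset = np.array([200, 400, 600])
int_onset = np.array([1, 301])
int_offset = np.array([300, 600])

# [:, None] turns shape (3,) into (3, 1); comparing against shape (2,)
# broadcasts to a (3, 2) grid: one row per window, one column per interval.
starts_after = win_onset[:, None] >= int_onset
ends_before = win_offset[:, None] <= int_offset
mask = starts_after & ends_before

# First matching interval per window, or -1 when no interval contains it.
idx = np.where(mask.any(axis=1), mask.argmax(axis=1), -1)
print(mask.shape, idx.tolist())  # (3, 2) [0, -1, 1]
```

The mask holds one boolean per (window, interval) pair, so its size grows as len(df1) × len(df2); that is why this approach only suits data that is not too big.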
Option 2: another (better) way, using merge_asof:
(pd.merge_asof(df1, df2, on='onset', direction='backward', suffixes=['', '_y'])
   .query('offset<=offset_y')
   .reindex(df1.index)
   .drop('offset_y', axis=1)
   .fillna(df1)
)
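You get the same output (though `onset` and `offset` may come back as floats, because the reindex step introduces NaN before they are filled in from df1).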
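A quick way to convince yourself that the mask variant and the query/reindex variant agree is to build both from a fresh copy of the question's data and compare them. This is only a sketch: the names `merged`, `a` and `b` are made up here, and `check_dtype=False` is used on the assumption that the two variants may differ in integer versus float dtypes.

```python
import pandas as pd

# Fresh copy of the question's data (df2 as a DataFrame is an assumption).
df1 = pd.DataFrame({'onset': [i * 200 + 1 for i in range(30)]})
df1['offset'] = df1['onset'] + 199
df2 = pd.DataFrame({'onset': [1, 2761, 3939],
                    'offset': [2760, 3938, 6000],
                    'rhythm_name': ['NSR', 'JUNCTIONAL', 'NSR'],
                    'rhythm_code': [100, 4000, 100]})

merged = pd.merge_asof(df1, df2, on='onset', direction='backward',
                       suffixes=('', '_y'))

# Variant A: blank out labels where the window overruns the interval.
keep = merged['offset'] <= merged['offset_y']
a = merged.drop(columns='offset_y').copy()
a[['rhythm_name', 'rhythm_code']] = a[['rhythm_name', 'rhythm_code']].where(keep)

# Variant B: drop the overrunning rows, then restore them from df1.
b = (merged.query('offset <= offset_y')
           .reindex(df1.index)
           .drop(columns='offset_y')
           .fillna(df1))

# Same values; dtypes are not compared because B may upcast ints to float.
pd.testing.assert_frame_equal(a, b, check_dtype=False)
```

Both variants start from a `df1` that does not yet have `rhythm_name` or `rhythm_code` columns; if an earlier snippet has already added them, the merge will produce suffixed duplicates instead.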