How to populate columns of a dataframe using a subset of another dataframe?
I have two dataframes like this:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
    'key': list('AAABBCCAAC'),
    'prop1': list('xyzuuyxzzz'),
    'prop2': list('mnbnbbnnnn')
})
df2 = pd.DataFrame({
    'key': list('ABBCAA'),
    'prop1': [np.nan] * 6,
    'prop2': [np.nan] * 6,
    'keep_me': ['stuff'] * 6
})
df1:
key prop1 prop2
0 A x m
1 A y n
2 A z b
3 B u n
4 B u b
5 C y b
6 C x n
7 A z n
8 A z n
9 C z n

df2:
key prop1 prop2 keep_me
0 A NaN NaN stuff
1 B NaN NaN stuff
2 B NaN NaN stuff
3 C NaN NaN stuff
4 A NaN NaN stuff
5 A NaN NaN stuff
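I now want to populate the prop1 and prop2 columns of df2 using the values from df1. For each key, df1 has at least as many rows as df2 (in the example above: 5 A vs. 3 A, 2 B vs. 2 B, and 3 C vs. 1 C). For each key, I want to fill df2 with the first n rows for that key from df1, where n is the number of rows with that key in df2.
So my expected result for df2 is: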
key prop1 prop2 keep_me
0 A x m stuff
1 B u n stuff
2 B u b stuff
3 C y b stuff
4 A y n stuff
5 A z b stuff
Since key is not unique, I cannot simply build a dictionary and then use .map.
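To make the problem with a plain dictionary concrete, here is a minimal sketch using the prop1 column only. The names lookup and mapped are introduced just for this illustration. A dict built from non-unique keys keeps only the last value per key, so every row with the same key would receive the same value:

```python
import pandas as pd

df1 = pd.DataFrame({
    'key': list('AAABBCCAAC'),
    'prop1': list('xyzuuyxzzz'),
})

# Building a dict from non-unique keys keeps only the last value seen per key.
lookup = dict(zip(df1['key'], df1['prop1']))
print(lookup)  # {'A': 'z', 'B': 'u', 'C': 'z'}

# Mapping the keys of df2 through it gives every 'A' row the same value,
# which is not the "first n rows per key" behaviour wanted here.
mapped = pd.Series(list('ABBCAA')).map(lookup)
print(mapped.tolist())  # ['z', 'u', 'u', 'z', 'z', 'z']
```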
I was hoping something along these lines would work:
pd.concat([df2.set_index('key'), df1.set_index('key')], axis=1, join='inner')
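But that fails with
ValueError: Shape of passed values is (5, 22), indices imply (5, 10)
because, I assume, the index contains non-unique values.
How can I get the desired output?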
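Because the key values are duplicated, one possible solution is to create a new counter column in both DataFrames with GroupBy.cumcount. The missing values in df2 can then be replaced with DataFrame.fillna, which aligns on the MultiIndex created from the key and g columns: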
df1['g'] = df1.groupby('key').cumcount()
df2['g'] = df2.groupby('key').cumcount()
print (df1)
key prop1 prop2 g
0 A x m 0
1 A y n 1
2 A z b 2
3 B u n 0
4 B u b 1
5 C y b 0
6 C x n 1
7 A z n 3
8 A z n 4
9 C z n 2
print (df2)
key prop1 prop2 keep_me g
0 A NaN NaN stuff 0
1 B NaN NaN stuff 0
2 B NaN NaN stuff 1
3 C NaN NaN stuff 0
4 A NaN NaN stuff 1
5 A NaN NaN stuff 2
df = (df2.set_index(['key','g'])
         .fillna(df1.set_index(['key','g']))
         .reset_index(level=1, drop=True)
         .reset_index())
print (df)
key prop1 prop2 keep_me
0 A x m stuff
1 B u n stuff
2 B u b stuff
3 C y b stuff
4 A y n stuff
5 A z b stuff
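The same counter idea can probably also be written as a left merge on the (key, g) pair instead of fillna. This is a sketch of an alternative, not part of the answer above; left, right and out are names chosen only for this example, and the empty prop1/prop2 columns of df2 are dropped before merging so that the merged columns do not clash:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'key': list('AAABBCCAAC'),
    'prop1': list('xyzuuyxzzz'),
    'prop2': list('mnbnbbnnnn')
})
df2 = pd.DataFrame({
    'key': list('ABBCAA'),
    'prop1': [np.nan] * 6,
    'prop2': [np.nan] * 6,
    'keep_me': ['stuff'] * 6
})

# Number the rows within each key (0, 1, 2, ...) in order of appearance.
right = df1.assign(g=df1.groupby('key').cumcount())
# Drop the all-NaN prop columns from df2 so the merge brings in df1's values.
left = df2[['key', 'keep_me']].assign(g=df2.groupby('key').cumcount())

# A left merge keeps the row order of df2; unmatched df1 rows are ignored.
out = left.merge(right, on=['key', 'g'], how='left').drop(columns='g')
out = out[['key', 'prop1', 'prop2', 'keep_me']]
print(out)
```

Unlike the in-place version, this leaves df1 and df2 without the helper column g, since assign returns copies.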
Another solution is to first build a dict from df1 and then pop elements from it to fill the NAs in df2:
d = df1.groupby(by='key').apply(lambda x: x.values.tolist()).to_dict()
df2[['key','prop1','prop2']] = pd.DataFrame(df2.key.apply(lambda x: d[x].pop(0)).tolist())
key prop1 prop2 keep_me
0 A x m stuff
1 B u n stuff
2 B u b stuff
3 C y b stuff
4 A y n stuff
5 A z b stuff
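A variant of the same pop-from-a-dict idea, sketched under the assumption that only prop1 and prop2 need to be filled. It selects the value columns explicitly, so it does not rely on the grouping column being part of each group's values, and it leaves the key column of df2 untouched. The names queues and rows are made up for this example:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'key': list('AAABBCCAAC'),
    'prop1': list('xyzuuyxzzz'),
    'prop2': list('mnbnbbnnnn')
})
df2 = pd.DataFrame({
    'key': list('ABBCAA'),
    'prop1': [np.nan] * 6,
    'prop2': [np.nan] * 6,
    'keep_me': ['stuff'] * 6
})

# One list of [prop1, prop2] rows per key, in df1's original row order.
queues = {key: grp[['prop1', 'prop2']].values.tolist()
          for key, grp in df1.groupby('key')}

# For each row of df2, take the next unused df1 row for that key.
# pop(0) raises IndexError if df2 has more rows for a key than df1.
rows = [queues[key].pop(0) for key in df2['key']]

df2[['prop1', 'prop2']] = pd.DataFrame(
    rows, index=df2.index, columns=['prop1', 'prop2'])
print(df2)
```

Note that popping consumes the lists in queues, so the dict has to be rebuilt before the fill can be repeated.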