Map values in dataframe based on condition using a nested dictionary
I have the following dictionary:
dict_map = {
'Anti' : {'Drug':('A','B','C')},
'Undef': {'Drug':'D','Name':'Type X'},
'Vit ' : {'Name': 'Vitamin C'},
'Placebo Effect' : {'Name':'Placebo', 'Batch':'XYZ'},
}
and this DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'ID': ['AB01', 'AB02', 'AB03', 'AB04', 'AB05','AB06'],
        'Drug': ["A","B","A",np.nan,"D","D"],
        'Name': ['Placebo', 'Vitamin C', np.nan, 'Placebo', '', 'Type X'],
        'Batch' : ['ABC',np.nan,np.nan,'XYZ',np.nan,np.nan],
    }
)
I have to create a new column that is populated from the data in the columns named in this list:
cols_to_map = ["Drug", "Name", "Batch"]
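The final result should look like this:
Note that the first 3 rows of the "Result" column are filled with "Anti", even though "Vitamin C" and "Placebo" appear in the "Name" column. This is because "Anti" comes first in the dictionary. How can I achieve this in Python? dict_map can be restructured in any way needed to get this result. I am not a Python professional and would really appreciate some help.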
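First reshape the nested dictionary so that every value, including each individual element of a tuple, gets its own entry:
from collections import defaultdict

# Invert dict_map into {column: {value: label}}
d = defaultdict(dict)
for k, v in dict_map.items():
    for k1, v1 in v.items():
        if isinstance(v1, tuple):
            # A tuple lists several values that share the same label
            for x in v1:
                d[k1][x] = k
        else:
            d[k1][v1] = k
print(d)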
defaultdict(<class 'dict'>, {'Drug': {'A': 'Anti', 'B': 'Anti',
'C': 'Anti', 'D': 'Undef'},
'Name': {'Type X': 'Undef', 'Vitamin C': 'Vit ',
'Placebo': 'PPL'}})
df = pd.DataFrame(
    {
        'ID': ['AB01', 'AB02', 'AB03', 'AB04', 'AB05','AB06'],
        'Drug': ["A","B","A",np.nan,"D","D"],
        'Name': ['Placebo', 'Vitamin C', np.nan, 'Placebo', '', 'Type X']
    }
)
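Then map each column through the dictionary, processing the columns in the order given by cols_to_map:
cols_to_map = ["Drug", "Name"]
df['Result'] = np.nan
for col in cols_to_map:
    # Earlier columns win: combine_first only fills rows that are still missing
    df['Result'] = df['Result'].combine_first(df[col].map(d[col]))
print(df)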
     ID Drug       Name Result
0  AB01    A    Placebo   Anti
1  AB02    B  Vitamin C   Anti
2  AB03    A        NaN   Anti
3  AB04  NaN    Placebo    PPL
4  AB05    D             Undef
5  AB06    D     Type X  Undef
# Reversed priority: "Name" is now checked before "Drug"
cols_to_map = ["Name", "Drug"]
df['Result'] = np.nan
for col in cols_to_map:
    df['Result'] = df['Result'].combine_first(df[col].map(d[col]))
print(df)
     ID Drug       Name Result
0  AB01    A    Placebo    PPL
1  AB02    B  Vitamin C    Vit
2  AB03    A        NaN   Anti
3  AB04  NaN    Placebo    PPL
4  AB05    D             Undef
5  AB06    D     Type X  Undef
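The reshaping and mapping steps above can be wrapped into one self-contained sketch. The helper names build_lookup and map_by_column_priority are hypothetical, introduced only for illustration. The example uses dict_map exactly as written in the question, so the label for "Placebo" comes out as 'Placebo Effect' rather than the 'PPL' shown in the outputs above.

```python
from collections import defaultdict

import numpy as np
import pandas as pd


def build_lookup(dict_map):
    """Invert {label: {column: value-or-tuple}} into {column: {value: label}}."""
    lookup = defaultdict(dict)
    for label, conditions in dict_map.items():
        for col, values in conditions.items():
            # Treat a bare string as a one-element tuple.
            if not isinstance(values, tuple):
                values = (values,)
            for value in values:
                lookup[col][value] = label
    return lookup


def map_by_column_priority(df, dict_map, cols):
    """Return the label from the first column in `cols` that has a match."""
    lookup = build_lookup(dict_map)
    result = pd.Series(np.nan, index=df.index, dtype=object)
    for col in cols:
        # combine_first only fills rows that are still missing.
        result = result.combine_first(df[col].map(lookup[col]))
    return result


dict_map = {
    'Anti': {'Drug': ('A', 'B', 'C')},
    'Undef': {'Drug': 'D', 'Name': 'Type X'},
    'Vit ': {'Name': 'Vitamin C'},
    'Placebo Effect': {'Name': 'Placebo', 'Batch': 'XYZ'},
}

df = pd.DataFrame(
    {
        'ID': ['AB01', 'AB02', 'AB03', 'AB04', 'AB05', 'AB06'],
        'Drug': ['A', 'B', 'A', np.nan, 'D', 'D'],
        'Name': ['Placebo', 'Vitamin C', np.nan, 'Placebo', '', 'Type X'],
        'Batch': ['ABC', np.nan, np.nan, 'XYZ', np.nan, np.nan],
    }
)

by_drug_first = map_by_column_priority(df, dict_map, ['Drug', 'Name', 'Batch'])
by_name_first = map_by_column_priority(df, dict_map, ['Name', 'Drug'])
print(by_drug_first.tolist())
print(by_name_first.tolist())
```

Returning a Series instead of writing to df['Result'] inside the loop leaves the input frame untouched, which makes it easy to compare different column orders side by side.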
EDIT:
df['Result1'] = df['Drug'].map(d['Drug'])
df['Result2'] = df['Name'].map(d['Name'])
print (df)
     ID Drug       Name Result1 Result2
0  AB01    A    Placebo    Anti     PPL
1  AB02    B  Vitamin C    Anti     Vit
2  AB03    A        NaN    Anti     NaN
3  AB04  NaN    Placebo     NaN     PPL
4  AB05    D              Undef     NaN
5  AB06    D     Type X   Undef   Undef
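The per-column results above can also be collapsed into a single column without a loop. The following is a minimal sketch, assuming the lookup is the reshaped dictionary printed earlier (hard-coded here so the snippet runs on its own). The names lookup, mapped, and first_match are illustrative only.

```python
import numpy as np
import pandas as pd

# Per-column lookups, copied from the printed defaultdict above.
lookup = {
    'Drug': {'A': 'Anti', 'B': 'Anti', 'C': 'Anti', 'D': 'Undef'},
    'Name': {'Type X': 'Undef', 'Vitamin C': 'Vit ', 'Placebo': 'PPL'},
}

df = pd.DataFrame(
    {
        'ID': ['AB01', 'AB02', 'AB03', 'AB04', 'AB05', 'AB06'],
        'Drug': ['A', 'B', 'A', np.nan, 'D', 'D'],
        'Name': ['Placebo', 'Vitamin C', np.nan, 'Placebo', '', 'Type X'],
    }
)

cols_to_map = ['Drug', 'Name']

# One mapped column per source column, laid out in priority order.
mapped = pd.DataFrame({col: df[col].map(lookup[col]) for col in cols_to_map})

# Back-fill along each row so the leftmost column holds the first non-missing label.
first_match = mapped.bfill(axis=1).iloc[:, 0]
print(first_match.tolist())
```

Keeping the intermediate mapped frame around can help when debugging, since it shows which column produced each label.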
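Since the relationship between the dict and the expected result is fairly complex, I would write a function and apply it to your DataFrame. This saves us from having to restructure the dictionary:
def get_result(row):
    result = np.nan
    for k, v in dict_map.items():
        # Remember the label whose values contain this row's Name
        if row['Name'] in v.values():
            result = k
        # A Drug match returns immediately (skipped when Name is an empty string)
        if row['Name'] and isinstance(row['Drug'], str) and 'Drug' in v and row['Drug'] in v['Drug']:
            return k
    return result

df['Result'] = df.apply(get_result, axis=1)
print(df)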
Output:
     ID Drug       Name Result
0  AB01    A    Placebo   Anti
1  AB02    B  Vitamin C   Anti
2  AB03    A        NaN   Anti
3  AB04  NaN    Placebo    PPL
4  AB05    D               NaN
5  AB06    D     Type X  Undef
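If the intended rule is that the first key of dict_map wins, in insertion order, whenever any of its listed columns matches, a vectorised variant might look like the sketch below. This rule is an assumption read from the question, not a restatement of get_result: the row with Drug "D" and an empty Name is labelled 'Undef' here, whereas the function above leaves it as NaN. The name label_by_dict_order is hypothetical.

```python
import numpy as np
import pandas as pd


def label_by_dict_order(df, dict_map, cols):
    """Assign each row the first dict_map key (in insertion order) that matches."""
    result = pd.Series(np.nan, index=df.index, dtype=object)
    for label, conditions in dict_map.items():
        matched = pd.Series(False, index=df.index)
        for col, values in conditions.items():
            if col not in cols:
                continue
            # Treat a bare string as a one-element tuple.
            if not isinstance(values, tuple):
                values = (values,)
            matched = matched | df[col].isin(list(values))
        # Only fill rows that no earlier label has claimed.
        result = result.mask(matched & result.isna(), label)
    return result


dict_map = {
    'Anti': {'Drug': ('A', 'B', 'C')},
    'Undef': {'Drug': 'D', 'Name': 'Type X'},
    'Vit ': {'Name': 'Vitamin C'},
    'Placebo Effect': {'Name': 'Placebo', 'Batch': 'XYZ'},
}

df = pd.DataFrame(
    {
        'ID': ['AB01', 'AB02', 'AB03', 'AB04', 'AB05', 'AB06'],
        'Drug': ['A', 'B', 'A', np.nan, 'D', 'D'],
        'Name': ['Placebo', 'Vitamin C', np.nan, 'Placebo', '', 'Type X'],
        'Batch': ['ABC', np.nan, np.nan, 'XYZ', np.nan, np.nan],
    }
)

labels = label_by_dict_order(df, dict_map, ['Drug', 'Name', 'Batch'])
print(labels.tolist())
```

Because the outer loop follows the dictionary rather than the column list, reordering cols has no effect on the result; only the order of keys in dict_map matters.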