pandas vlookup based on conditions
I have two dataframes as shown below:
df1:
Cell NodeName conc Delta
S1C1 B4MU1241 B4MU1241;S1C1 0.2
S2C1 B4MU1241 B4MU1241;S2C1 0.2
S3C1 B4MU1241 B4MU1241;S3C1 1
S4C1 B4MU1241 B4MU1241;S4C1 11.1
S1C1 B4MU1702 B4MU1702;S1C1 0.2
S1C2 B4MU1702 B4MU1702;S1C2 0.2
S2C1 B4MU1702 B4MU1702;S2C1 0.1
S2C2 B4MU1702 B4MU1702;S2C2 0
S3C1 B4MU1702 B4MU1702;S3C1 0.1
S3C2 B4MU1702 B4MU1702;S3C2 0.2
S4C1 B4MU1702 B4MU1702;S4C1 0.1
S4C2 B4MU1702 B4MU1702;S4C2 0.1
df2:
Cell NodeName conc Temparature-DUW Delta
S1C1; B4MU1241 B4MU1241;S1C1 60C
S2C1; B4MU1241 B4MU1241;S2C1 60C
S3C1; B4MU1241 B4MU1241;S3C1 60C
S4C1; B4MU1241 B4MU1241;S4C1 60C
S1C1;S1C2; B4MU1702 B4MU1702;S1C1;S1C2 56C
S2C1;S2C2; B4MU1702 B4MU1702;S2C1;S2C2 56C
S3C1;S3C2; B4MU1702 B4MU1702;S3C1;S3C2 56C
S4C1;S4C2; B4MU1702 B4MU1702;S4C1;S4C2 56C
Now I want to fill the "Delta" column in df2 so that the output is:
Cell NodeName conc Temparature-DUW Delta
S1C1; B4MU1241 B4MU1241;S1C1 60C 0.2
S2C1; B4MU1241 B4MU1241;S2C1 60C 0.2
S3C1; B4MU1241 B4MU1241;S3C1 60C 1
S4C1; B4MU1241 B4MU1241;S4C1 60C 11.1
S1C1;S1C2; B4MU1702 B4MU1702;S1C1;S1C2 56C 0.2, 0.2
S2C1;S2C2; B4MU1702 B4MU1702;S2C1;S2C2 56C 0.1,0
S3C1;S3C2; B4MU1702 B4MU1702;S3C1;S3C2 56C 0.1,0.2
S4C1;S4C2; B4MU1702 B4MU1702;S4C1;S4C2 56C 0.1,0.1
I have tried something like this:
df1.loc[df1.apply(lambda row: row.conc in [df2.conc.values], axis=1),
df1['Delta']] = df1['Delta']+df2['Delta']
It gives me this error:
ValueError: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', 'occurred at index 0')
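A likely source of that message can be reproduced in isolation. The sketch below uses a small NumPy object array as a stand-in for df2.conc.values (the array contents and the names conc_values, raised and found are illustrative, not from the question). Wrapping the array in a list makes `in` compare the string against the whole array at once, and the resulting element-wise boolean array cannot be reduced to a single True/False.

```python
import numpy as np

# Stand-in for df2.conc.values (a pandas string column is an object array).
conc_values = np.array(["B4MU1241;S1C1", "B4MU1241;S2C1"], dtype=object)

# `x in [array]` evaluates `array == x`, which is an element-wise boolean
# array; turning that array into one bool raises the ValueError.
try:
    "B4MU1241;S1C1" in [conc_values]
    raised = False
except ValueError:
    raised = True

# Testing membership against the values themselves yields a single bool.
found = "B4MU1241;S1C1" in conc_values.tolist()

print(raised, found)
```

This only illustrates the membership test; it does not address the rest of the attempted assignment.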
You can create a mapping from conc to Delta via set_index and then apply a custom function with pd.Series.apply. This isn't efficient, but neither is holding comma-separated strings that represent numeric data. Note that f-strings require Python 3.6+; you can use str.format instead if necessary.
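# Map each "NodeName;Cell" key in df1 to its Delta value
d = df1.set_index('conc')['Delta'].to_dict()

def get_vals(x):
    # First piece is the node name, the remaining pieces are cell names
    pre, *post = x.split(';')
    return ', '.join([str(d[f'{pre};{suffix}']) for suffix in post])

df2['Delta'] = df2['conc'].apply(get_vals)
print(df2[['conc', 'Delta']])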
conc Delta
0 B4MU1241;S1C1 0.2
1 B4MU1241;S2C1 0.2
2 B4MU1241;S3C1 1.0
3 B4MU1241;S4C1 11.1
4 B4MU1702;S1C1;S1C2 0.2, 0.2
5 B4MU1702;S2C1;S2C2 0.1, 0.0
6 B4MU1702;S3C1;S3C2 0.1, 0.2
7 B4MU1702;S4C1;S4C2 0.1, 0.1
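For Python versions before 3.6, the str.format alternative mentioned in the note could look roughly like the following. This is a minimal sketch on a three-row subset of the question's data, rebuilt inline so it runs on its own; only the key-building line differs from the f-string version.

```python
import pandas as pd

# Small subset of the question's data, rebuilt here for a standalone run.
df1 = pd.DataFrame({
    "conc": ["B4MU1702;S1C1", "B4MU1702;S1C2", "B4MU1241;S3C1"],
    "Delta": [0.2, 0.2, 1.0],
})
df2 = pd.DataFrame({"conc": ["B4MU1241;S3C1", "B4MU1702;S1C1;S1C2"]})

d = df1.set_index("conc")["Delta"].to_dict()

def get_vals(x):
    # First piece is the node name, the remaining pieces are cell names.
    pre, *post = x.split(";")
    # str.format builds the same "node;cell" key as the f-string did.
    return ", ".join(str(d["{};{}".format(pre, suffix)]) for suffix in post)

df2["Delta"] = df2["conc"].apply(get_vals)
print(df2)
```

Note that a key missing from d raises KeyError here, just as in the f-string version.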
Here's another approach, which stores each row's values as a list of floats instead of a comma-separated string:
mapping = df1[['conc', 'Delta']].set_index('conc')['Delta'].to_dict()
df2['Delta'] = df2['conc'].apply(lambda x: [mapping[';'.join((x.split(';')[0], i))] for i in x.split(';')[1:]])
df2
# Cell NodeName Temparature-DUW conc Delta
#0 S1C1; B4MU1241 60C B4MU1241;S1C1 [0.2]
#1 S2C1; B4MU1241 60C B4MU1241;S2C1 [0.2]
#2 S3C1; B4MU1241 60C B4MU1241;S3C1 [1.0]
#3 S4C1; B4MU1241 60C B4MU1241;S4C1 [11.1]
#4 S1C1;S1C2; B4MU1702 56C B4MU1702;S1C1;S1C2 [0.2, 0.2]
#5 S2C1;S2C2; B4MU1702 56C B4MU1702;S2C1;S2C2 [0.1, 0.0]
#6 S3C1;S3C2; B4MU1702 56C B4MU1702;S3C1;S3C2 [0.1, 0.2]
#7 S4C1;S4C2; B4MU1702 56C B4MU1702;S4C1;S4C2 [0.1, 0.1]
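A different technique from the two answers above, in case a per-row Python function is undesirable: split the Cell column, use DataFrame.explode (pandas 0.25+) to get one row per cell, look up each key, and group the results back by original row. This is only a sketch under stated assumptions: it builds the key as NodeName + ";" + cell, which matches the conc values shown in the question, it uses a small inline subset of the data, and the names exploded, keys and deltas are illustrative. Whether it is faster on real data has not been measured.

```python
import pandas as pd

# Small subset of the question's data, rebuilt here for a standalone run.
df1 = pd.DataFrame({
    "conc": ["B4MU1241;S1C1", "B4MU1702;S2C1", "B4MU1702;S2C2"],
    "Delta": [0.2, 0.1, 0.0],
})
df2 = pd.DataFrame({
    "NodeName": ["B4MU1241", "B4MU1702"],
    "Cell": ["S1C1;", "S2C1;S2C2;"],
})

mapping = df1.set_index("conc")["Delta"].to_dict()

# One row per cell: drop the trailing ';', split on ';', then explode.
# explode repeats the original row label, which is used to regroup later.
exploded = df2.assign(
    cell=df2["Cell"].str.rstrip(";").str.split(";")
).explode("cell")

# Build "NodeName;cell" keys and look them up (missing keys become NaN).
keys = exploded["NodeName"] + ";" + exploded["cell"]
deltas = keys.map(mapping)

# Collect the looked-up values back into one list per original row.
df2["Delta"] = deltas.groupby(level=0).agg(list)
print(df2)
```

Unlike the dictionary lookups above, Series.map leaves NaN for keys that are absent from df1 rather than raising KeyError.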