Conditional replace value in pandas dataframe with calculated value
I am struggling with the following. I have a dataframe of concentration values, some of which are below the detection limit (in this example, <100 or <200).
import pandas as pd

df2 = pd.DataFrame({"site": ['site1', 'site2', 'site3', 'site4'],
                    "concentration": [12000, 2000, '<100', '<200']})
In order to plot the values, I'd like to replace each value below the detection limit with 0.5 x the detection limit, so <100 becomes 50 and <200 becomes 100. The code should then add a column TPH< to indicate which sites are below the detection limit.
Any help is much appreciated.
Create a mask to find elements with <, index with loc, and update:
m = df2.concentration.astype(str).str.contains('<')
df2.loc[m, 'concentration'] = \
pd.to_numeric(df2.loc[m, 'concentration'].str.lstrip('<'), errors='coerce') / 2
df2
concentration site
0 12000 site1
1 2000 site2
2 50 site3
3 100 site4
Furthermore, m records the rows that are below the detection limit.
m
0 False
1 False
2 True
3 True
Name: concentration, dtype: bool
Assign it to df2:
df2['TPH<'] = m
df2
concentration site TPH<
0 12000 site1 False
1 2000 site2 False
2 50 site3 True
3 100 site4 True
Keep in mind that concentration is an object column. I'd recommend converting it to numeric:
df2.concentration = df2.concentration.astype(float)
Or,
df2.concentration = pd.to_numeric(df2.concentration, errors='coerce')
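For reference, the steps above can be combined into one pass that leaves the original dataframe untouched. This is only a minimal sketch: the helper name halve_below_detection_limit and its parameters are hypothetical, and it assumes every below-limit entry is written as '<' followed by a number. It uses Series.where rather than assigning through loc, so the result is a float column from the start.

```python
import pandas as pd


def halve_below_detection_limit(df, column="concentration", flag_column="TPH<"):
    """Return a copy of df with '<N' entries replaced by N / 2 and flagged.

    The function name and its parameters are hypothetical, for illustration only.
    """
    out = df.copy()
    as_text = out[column].astype(str).str.strip()
    # True where the value was reported as below the detection limit, e.g. '<100'.
    below = as_text.str.startswith("<")
    # Strip the '<' marker and parse as numbers; unparseable entries become NaN.
    values = pd.to_numeric(as_text.str.lstrip("<"), errors="coerce").astype(float)
    # Keep measured values as they are; use half the detection limit elsewhere.
    out[column] = values.where(~below, values / 2)
    out[flag_column] = below
    return out


df2 = pd.DataFrame({
    "site": ["site1", "site2", "site3", "site4"],
    "concentration": [12000, 2000, "<100", "<200"],
})
result = halve_below_detection_limit(df2)
print(result)
```

Because the helper works on a copy, df2 keeps its raw '<100' and '<200' strings, which may be useful if the original reporting needs to be preserved alongside the plotted values.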