
Conditional replace value in pandas dataframe with calculated value

I am struggling with the following. I have a dataframe of concentration values, some of which are below the detection limit (in this example <100 or <200).

import pandas as pd

df2 = pd.DataFrame({"site":['site1','site2','site3','site4'],
                    "concentration":[12000,2000,'<100','<200']})

In order to plot the values, I'd like to replace each value below the detection limit with 0.5 x the detection limit. So <100 becomes 50; <200 becomes 100. The code should then add a column TPH< to indicate which sites are below the detection limit.

Any help is much appreciated.

Create a mask to find the elements containing <, index with loc, and update those rows:

m = df2.concentration.astype(str).str.contains('<')
df2.loc[m, 'concentration'] = \
      pd.to_numeric(df2.loc[m, 'concentration'].str.lstrip('<'), errors='coerce') / 2

df2

  concentration   site
0         12000  site1
1          2000  site2
2            50  site3
3           100  site4
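
The astype(str) call in the mask matters because the column mixes integers and strings. As a small sketch (the names raw, safe and alt are only illustrative), calling .str.contains directly on such a column gives NaN for the non-string entries, so the result is not a clean boolean mask:

```python
import pandas as pd

s = pd.Series([12000, 2000, '<100', '<200'], name='concentration')

# Without astype(str), the integer entries produce NaN instead of False.
raw = s.str.contains('<')

# Casting to str first gives a clean boolean mask.
safe = s.astype(str).str.contains('<')

# Alternative: skip the cast and fill the missing results with na=False.
alt = s.str.contains('<', na=False)

print(raw.tolist())
print(safe.tolist())
print(alt.tolist())
```

Either safe or alt can be passed to loc; the answer above uses the astype(str) form.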

Furthermore, m records which rows are below the detection limit.

m

0    False
1    False
2     True
3     True
Name: concentration, dtype: bool

Assign it to df2:

df2['TPH<'] = m
df2

  concentration   site   TPH<
0         12000  site1  False
1          2000  site2  False
2            50  site3   True
3           100  site4   True

Keep in mind that concentration is still an object column. I'd recommend converting it to numeric:

df2.concentration = df2.concentration.astype(float)

Or,

df2.concentration = pd.to_numeric(df2.concentration, errors='coerce')
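
For reference, the steps above can be combined into one pass that leaves the original frame untouched. This is only a sketch: the helper name halve_below_limit and its parameters are hypothetical, and it assumes every below-limit entry is written as '<' followed by a number.

```python
import pandas as pd

def halve_below_limit(df, col='concentration', flag_col='TPH<'):
    """Hypothetical helper: replace '<N' entries with N/2 and flag them."""
    out = df.copy()
    as_text = out[col].astype(str)
    # True for rows recorded as below the detection limit.
    below = as_text.str.startswith('<')
    # Strip the '<' marker and parse every entry as a number.
    values = pd.to_numeric(as_text.str.lstrip('<'), errors='coerce')
    # Keep ordinary values as they are; halve only the below-limit rows.
    out[col] = values.where(~below, values / 2)
    out[flag_col] = below
    return out

df2 = pd.DataFrame({"site": ['site1', 'site2', 'site3', 'site4'],
                    "concentration": [12000, 2000, '<100', '<200']})
result = halve_below_limit(df2)
print(result)
```

Because the values are parsed before being assigned back, the concentration column comes out numeric and no separate conversion step is needed.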
