
Conditional replace value in pandas dataframe with calculated value

I am struggling with the following. I have a dataframe of concentration values, some of which are below the detection limit (in this example, <100 or <200).

import pandas as pd

df2 = pd.DataFrame({"site": ['site1', 'site2', 'site3', 'site4'],
                    "concentration": [12000, 2000, '<100', '<200']})

In order to plot the values, I'd like to replace each below-limit value with 0.5 x the detection limit, so <100 becomes 50 and <200 becomes 100. The code should then add a column TPH< to indicate which sites are below the detection limit.

Any help is much appreciated

Create a mask to find the elements containing <, index with loc, and update:

m = df2.concentration.astype(str).str.contains('<')
df2.loc[m, 'concentration'] = \
      pd.to_numeric(df2.loc[m, 'concentration'].str.lstrip('<'), errors='coerce') / 2

df2

  concentration   site
0         12000  site1
1          2000  site2
2            50  site3
3           100  site4

Furthermore, m records rows under the detection limit.

m

0    False
1    False
2     True
3     True
Name: concentration, dtype: bool

Assign it to df2:

df2['TPH<'] = m
df2

  concentration   site   TPH<
0         12000  site1  False
1          2000  site2  False
2            50  site3   True
3           100  site4   True

Keep in mind that concentration is still an object column. I'd recommend converting it to numeric:

df2.concentration = df2.concentration.astype(float)

Or,

df2.concentration = pd.to_numeric(df2.concentration, errors='coerce')
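The two conversions differ in how they treat entries that cannot be parsed. A minimal sketch of that difference, using a hypothetical Series s with an 'n/a' entry that is not part of the original data:

```python
import pandas as pd

# Hypothetical column with one entry ('n/a') that is not a number.
s = pd.Series(["50", "100", "n/a"], dtype=object)

# errors='coerce' turns anything unparseable into NaN instead of failing.
coerced = pd.to_numeric(s, errors="coerce")

# astype(float) raises on the same input.
try:
    s.astype(float)
    raised = False
except ValueError:
    raised = True

print(coerced)
print("astype(float) raised:", raised)
```

If every value is known to be numeric after the replacement step, either form should work; errors='coerce' is the more forgiving choice when stray text may remain in the column.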


 