I am struggling with the following. I have a dataframe which has concentration values, which can be below detection limit (in this example <100 or <200)
import pandas as pd
df2 = pd.DataFrame({"site": ['site1', 'site2', 'site3', 'site4'],
                    "concentration": [12000, 2000, '<100', '<200']})
In order to plot the values, I'd like to replace each value below the detection limit with 0.5 x the detection limit, so <100 becomes 50 and <200 becomes 100. The code should then add a column TPH< to indicate which sites are below the detection limit.
Any help is much appreciated.
Create a mask to find elements with <, index with loc, and update:
m = df2.concentration.astype(str).str.contains('<')
df2.loc[m, 'concentration'] = \
pd.to_numeric(df2.loc[m, 'concentration'].str.lstrip('<'), errors='coerce') / 2
df2
  concentration   site
0         12000  site1
1          2000  site2
2            50  site3
3           100  site4
Furthermore, m records the rows under the detection limit.
m
0 False
1 False
2 True
3 True
Name: concentration, dtype: bool
Assign it to df2:
df2['TPH<'] = m
df2
  concentration   site   TPH<
0         12000  site1  False
1          2000  site2  False
2            50  site3   True
3           100  site4   True
Keep in mind concentration is an object column. I'd recommend a conversion to numeric:
df2.concentration = df2.concentration.astype(float)
Or,
df2.concentration = pd.to_numeric(df2.concentration, errors='coerce')