
How to remove only numbers from a string in Pandas columns

I'm an environmental geologist and I'm just learning Python/Pandas. I have a dataframe of analytical data in Pandas similar to the example below:

[Image: starting dataframe]

I only want to remove the plain numbers from Total_dl, leaving the detection limits (values prefixed with <). This would be the final dataframe I'm looking for:

[Image: the final dataframe I'm looking for]

Since the column contains strings, I'm not sure how to parse it. Any help would be appreciated.

Thanks

The following should do the trick:

import numpy as np


# Assumes Total_dl is numeric; blanks out values below 1.
mask = df.Total_dl < 1.
df.loc[mask, 'Total_dl'] = np.nan

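If Total_dl is of type string you can try the following:

import numpy as np


# True for detection limits such as '<0.021'
mask = df.Total_dl.str.startswith('<')
# Replace everything that is not a detection limit with NaN
df.loc[~mask, 'Total_dl'] = np.nan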
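For anyone who wants to try the string-based approach end to end, here is a minimal, self-contained sketch. The sample rows are hypothetical (the original screenshots are not available), and it assumes every entry in Total_dl is already a string.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data modelled on the question's columns.
df = pd.DataFrame({
    "SampleID": ["A-1-0'", "A-1-0.5'", "A-2-0'"],
    "Total_dl": ["2.5", "<0.021", "13"],
})

# True where the entry is a detection limit such as "<0.021".
is_limit = df["Total_dl"].str.startswith("<")

# Blank out everything that is not a detection limit.
df.loc[~is_limit, "Total_dl"] = np.nan

print(df)
```

If the column mixes real floats with strings, .str.startswith returns NaN for the non-string entries, so the column would probably need to be converted with astype(str) first.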

One way to do it, though I'm not sure how good a solution it is:

import numpy as np
df['Total_dl'] = df['Total_dl'].apply(lambda o: o if '<' in str(o) else np.nan)
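A vectorized variant of the same idea may be worth considering. The sketch below swaps apply for Series.where, which is not what the answer above uses. The sample data is hypothetical and deliberately mixes floats with strings to show why the astype(str) step is there.

```python
import pandas as pd

# Hypothetical data: plain numbers stored as floats/ints next to a string limit.
df = pd.DataFrame({
    "SampleID": ["A-1-0'", "A-1-0.5'", "A-2-0'"],
    "Total_dl": [2.5, "<0.021", 13],
})

# Cast to str so the .str accessor works on every entry, then look for a literal "<".
keep = df["Total_dl"].astype(str).str.contains("<", regex=False)

# where() keeps values where the condition is True and puts NaN everywhere else.
df["Total_dl"] = df["Total_dl"].where(keep)

print(df)
```

On a small table either version is fine; the difference should only matter for large columns.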

The same thing using a named function instead:

>>> df
   SampleID Total_dl
0    A-1-0'      2.5
1  A-1-0.5'   <0.021
>>> df.dtypes
SampleID    object
Total_dl    object
dtype: object
>>> def foo(o):
...     if '<' in str(o):
...         return o
...     else:
...         return np.nan
...         
>>> df['Total_dl'] = df['Total_dl'].apply(foo)
>>> df
   SampleID Total_dl
0    A-1-0'      NaN
1  A-1-0.5'   <0.021
>>> 

Say your data frame is called df, then this will do the trick:

import numpy as np
# Boolean mask: True for rows that do NOT contain "<" (i.e. plain numbers)
nan_condition = ~df["Total_dl"].str.contains("<", na=False)
df.loc[nan_condition, "Total_dl"] = np.nan
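Instead of testing for a comparison character, another reading of "remove only numbers" is to drop whatever parses as a number and keep everything else. The sketch below uses pd.to_numeric with errors="coerce" for that. The sample rows are hypothetical, including the "ND" flag, and note that this keeps any non-numeric text, not only detection limits.

```python
import numpy as np
import pandas as pd

# Hypothetical data; "ND" is an invented non-numeric flag for illustration.
df = pd.DataFrame({
    "SampleID": ["A-1-0'", "A-1-0.5'", "A-2-0'", "A-2-0.5'"],
    "Total_dl": ["2.5", "<0.021", "13", "ND"],
})

# errors="coerce" turns entries that are not plain numbers into NaN,
# so notna() is True exactly for the plain numbers.
is_number = pd.to_numeric(df["Total_dl"], errors="coerce").notna()

# Remove the plain numbers; "<0.021" and "ND" are left untouched.
df.loc[is_number, "Total_dl"] = np.nan

print(df)
```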

You can use this:


data = data.loc[data[column] > x]
