How to identify float/numbers in String data using pandas

I have a dataframe as shown below:

import numpy as np
import pandas as pd
df = pd.DataFrame({'val': ['test','depat','23.1','25.0','31',np.nan]})

I would like to create two new columns, val_num and val_string.

In val_num, I would like to store the numeric values.

In val_string, I would like to store the string values.

So, I tried the following:

df['val_num'] = pd.to_numeric(df['val'],errors='coerce')
df['val_string'] = (df[pd.to_numeric(df['val'],errors='coerce').isna()])

Though the above works fine, is there an elegant function, similar to to_numeric, for identifying string objects (something like to_string)?

"is there any elegant function like to_numeric for identifying string objects using to_string"

No, it does not exist yet.

If the values are of mixed types (actual numbers stored alongside strings), you can test each value with isinstance:

df = pd.DataFrame({'val': ['test','depat',23.1,25.0,31,np.nan]})

df['num'] = df.loc[df['val'].apply(lambda x: isinstance(x, (float, int))), 'val']
df['str'] = df.loc[df['val'].apply(lambda x: isinstance(x, str)), 'val']
print (df)
     val   num    str
0   test   NaN   test
1  depat   NaN  depat
2   23.1  23.1    NaN
3   25.0  25.0    NaN
4     31    31    NaN
5    NaN   NaN    NaN
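
One caveat with the isinstance test: in Python, bool is a subclass of int, so True and False would be classified as numbers. The following is a minimal sketch of a stricter check, assuming booleans should not count as numeric. The helper name is_number is hypothetical and not part of pandas, and the True value in the sample data was added only to show the difference.

```python
import numbers

import numpy as np
import pandas as pd


def is_number(x):
    """Return True for real numbers (int, float, NumPy scalars) but not bool."""
    return isinstance(x, numbers.Real) and not isinstance(x, bool)


# Mixed-type column: real numbers, strings, a boolean and a missing value
df = pd.DataFrame({'val': ['test', 'depat', 23.1, 25.0, 31, True, np.nan]})

num_mask = df['val'].apply(is_number)
str_mask = df['val'].apply(lambda x: isinstance(x, str))

# Series.where keeps values where the mask is True and puts NaN elsewhere
df['num'] = df['val'].where(num_mask)
df['str'] = df['val'].where(str_mask)
print(df)
```

Series.where is used here instead of df.loc[mask, 'val']; both leave NaN in the rows that do not match.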

Unfortunately, in real-life data all values are usually strings, so the isinstance test no longer separates them and your solution is needed: first convert to numeric, then process:

df = pd.DataFrame({'val': ['test','depat','23.1','25.0','31',np.nan]})

df['num'] = df.loc[df['val'].apply(lambda x: isinstance(x, float)), 'val']
df['str'] = df.loc[df['val'].apply(lambda x: isinstance(x, str)), 'val']
print (df)
     val  num    str
0   test  NaN   test
1  depat  NaN  depat
2   23.1  NaN   23.1
3   25.0  NaN   25.0
4     31  NaN     31
5    NaN  NaN    NaN
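
The isinstance test finds no numbers here because every non-missing element is a Python str; only the NaN is a float. A quick way to confirm this is to look at the type of each element, sketched below (the variable name type_names is just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'val': ['test', 'depat', '23.1', '25.0', '31', np.nan]})

# Name of the Python type of each element in the column
type_names = df['val'].map(lambda x: type(x).__name__)
print(type_names.value_counts())
```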

df['num'] = pd.to_numeric(df['val'],errors='coerce')
df['vstring'] = df.loc[df['num'].isna(), 'val']
print (df)
     val   num vstring
0   test   NaN    test
1  depat   NaN   depat
2   23.1  23.1     NaN
3   25.0  25.0     NaN
4     31  31.0     NaN
5    NaN   NaN     NaN
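
If this is needed in several places, the two steps can be wrapped in a small helper. This is only a sketch: the function name split_numeric_and_string is hypothetical, and the output column names val_num and val_string are taken from the question. It assumes that "string" means any non-missing value that pd.to_numeric cannot parse.

```python
import numpy as np
import pandas as pd


def split_numeric_and_string(s):
    """Split a Series of strings into a numeric column and a text column."""
    num = pd.to_numeric(s, errors='coerce')
    # Text = present in the original but not parseable as a number
    is_text = num.isna() & s.notna()
    return pd.DataFrame({'val_num': num, 'val_string': s.where(is_text)})


df = pd.DataFrame({'val': ['test', 'depat', '23.1', '25.0', '31', np.nan]})
df = df.join(split_numeric_and_string(df['val']))
print(df)
```

Note that pd.to_numeric also accepts notations such as '1e3', so such values end up in the numeric column.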
