pandas: extract or split char from number string

I have a DataFrame selected from a SQL table that looks like this:

   id shares_float
0   1      621.76M
1   2      329.51M

In other words,

[(1, '621.76M'), (2, '329.51M')]

I want to split shares_float so that a trailing 'B' multiplies the number by 1,000,000,000 and a trailing 'M' multiplies it by 1,000,000. If the trailing character is neither of these, or there is no trailing character, the value should just be converted to a number and assigned.

The outcome should be a float type:

   ticker_id  shares_float     float_value
0          1       621.76M    621760000.00
1          2         3.51B   3510000000.00

I am new to pandas. Is there a way to do this in pandas, or should I convert the data to a list, do the manipulation in a loop, and then convert it back to a pandas DataFrame?

Note added: The answer works great! Thank you. BTW, how does the function work?

You could use a conversion dictionary. Also, I am sure you didn't mean 624540000:

In [9]:

D={'M':'*1e6', 'B':'*1e9'}
df['float_value']=df.shares_float.apply(lambda x: eval(x[:-1]+D[x[-1]]))
In [10]:

print(df)
   ticker_id shares_float  float_value
0          1      621.76M   621760000
1          2        3.51B  3510000000

[2 rows x 3 columns]
In [11]:

df.dtypes
Out[11]:
ticker_id         int64
shares_float     object
float_value     float64
dtype: object
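
As for how the one-liner works: x[:-1] is everything except the last character (the digits), x[-1] is the last character (the suffix), and the dictionary turns that suffix into a string such as '*1e6'. The two pieces are joined into an expression like '621.76*1e6', which eval then computes. Note that it raises a KeyError for values with no 'M' or 'B' suffix, and eval will run whatever text it is given. The following is a minimal sketch of the same idea without eval; the names MULTIPLIERS and to_number are hypothetical, not part of the answer above.

```python
# Multipliers keyed by the trailing suffix character.
MULTIPLIERS = {"M": 1e6, "B": 1e9}


def to_number(text):
    """Convert strings such as '621.76M', '3.51B' or '42' to a float."""
    text = text.strip()
    suffix = text[-1:]  # last character, or '' for an empty string
    if suffix in MULTIPLIERS:
        # Known suffix: parse the leading digits and scale them.
        return float(text[:-1]) * MULTIPLIERS[suffix]
    # No known suffix: treat the whole string as a plain number.
    return float(text)


values = [to_number(s) for s in ["621.76M", "3.51B", "42"]]
```

With a DataFrame, the same helper could be applied as df['float_value'] = df['shares_float'].apply(to_number).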

You can use string methods to extract the pattern. For example, in order to cover all cases, starting with:

>>> df
   id shares_float
0   1            5
1   2           6M
2   3           7B

[3 rows x 2 columns]

The numeric value and unit can be extracted by:

>>> sh = df.shares_float.str.extract(r'(?P<val>[0-9.]*)(?P<unit>[MB]{0,1})')
>>> sh
  val unit
0   5
1   6    M
2   7    B

[3 rows x 2 columns]

And then:

>>> import numpy as np
>>> unit_map = {'':1, 'M':1e6, 'B':1e9}
>>> df['float_value'] = sh.val.astype(np.float64) * sh.unit.map(unit_map)
>>> df
   id shares_float  float_value
0   1            5            5
1   2           6M      6000000
2   3           7B   7000000000

[3 rows x 3 columns]
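
The steps above can be collected into one self-contained script. This is a sketch under a few assumptions: a recent pandas version, the same three sample rows, and a slightly stricter pattern (anchored with ^ and $, and requiring at least one digit) that is a variation on the answer's regex rather than the original.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "shares_float": ["5", "6M", "7B"]})

# Named groups become the column names of the extracted frame.
# The unit group matches an empty string when there is no suffix.
parts = df["shares_float"].str.extract(r"^(?P<val>[0-9.]+)(?P<unit>[MB]?)$")

# The empty-string key covers values without a suffix.
unit_map = {"": 1.0, "M": 1e6, "B": 1e9}

# Parse the numeric part and scale it by the mapped multiplier.
df["float_value"] = parts["val"].astype(float) * parts["unit"].map(unit_map)
```

Because everything is done with column-wise operations, no Python-level loop over rows is needed. Values that do not match the pattern are expected to come out as NaN rather than raising an error.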

