pandas: extract or split char from number string
I have a dataframe, selected from a SQL table, that looks like this:
id shares_float
0 1 621.76M
1 2 329.51M
In other words,
[(1, '621.76M'), (2, '329.51M')]
I want to split shares_float so that a trailing 'B' multiplies the number by 1,000,000,000 and a trailing 'M' multiplies it by 1,000,000. If the value has neither, or has no trailing character at all, it should simply be converted to a number and assigned.
The outcome should be a float type:
ticker_id shares_float float_value
0 1 621.76M 621760000.00
1 2 3.51B 3510000000.00
I am new to pandas. Is there a way to do it in pandas, or should I convert the data to a list, do my manipulation in a loop, and then convert it back to a pandas DataFrame?
Note added: The answer works great! Thank you. BTW, how does the function work?
You could use a conversion dictionary. Also, I am sure you didn't mean 624540000:
In [9]:
D={'M':'*1e6', 'B':'*1e9'}
df['float_value']=df.shares_float.apply(lambda x: eval(x[:-1]+D[x[-1]]))
In [10]:
print(df)
ticker_id shares_float float_value
0 1 621.76M 621760000
1 2 3.51B 3510000000
[2 rows x 3 columns]
In [11]:
df.dtypes
Out[11]:
ticker_id int64
shares_float object
float_value float64
dtype: object
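On the follow-up question of how the function works: the lambda splits each string into the numeric part `x[:-1]` and the last character `x[-1]`, looks the suffix up in `D`, and passes the resulting expression (for example `621.76*1e6`) to `eval`. As written it raises a `KeyError` for a value with no suffix. Below is a minimal sketch of the same dictionary idea without `eval`, assuming only the suffixes 'M' and 'B' occur. The names `MULTIPLIERS` and `to_number` are hypothetical and not part of the answer above.

```python
# Multipliers for the recognised suffixes (assumption: only 'M' and 'B' occur).
MULTIPLIERS = {'M': 1e6, 'B': 1e9}

def to_number(text):
    """Convert strings such as '621.76M', '3.51B' or '5' to a float."""
    text = text.strip()
    suffix = text[-1:]  # last character, or '' for an empty string
    if suffix in MULTIPLIERS:
        # Drop the suffix and scale the remaining number.
        return float(text[:-1]) * MULTIPLIERS[suffix]
    # No recognised suffix: convert the whole string as-is.
    return float(text)

values = [to_number(s) for s in ['621.76M', '3.51B', '5']]
```

With the dataframe from the question, this could be applied as `df['float_value'] = df.shares_float.apply(to_number)`.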
You can use string methods to extract the pattern. For example, in order to cover all cases, starting with:
>>> df
id shares_float
0 1 5
1 2 6M
2 3 7B
[3 rows x 2 columns]
The numeric value and unit can be extracted with:
>>> sh = df.shares_float.str.extract(r'(?P<val>[0-9.]*)(?P<unit>[MB]{0,1})')
>>> sh
val unit
0 5
1 6 M
2 7 B
[3 rows x 2 columns]
and then:
>>> import numpy as np
>>> unit_map = {'':1, 'M':1e6, 'B':1e9}
>>> df['float_value'] = sh.val.astype(np.float64) * sh.unit.map(unit_map)
>>> df
id shares_float float_value
0 1 5 5
1 2 6M 6000000
2 3 7B 7000000000
[3 rows x 3 columns]
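The two steps above can be combined into one self-contained sketch. The sample frame is rebuilt by hand here. Anchoring the pattern with `^` and `$` and requiring at least one digit are added assumptions, so that an unexpected string yields NaN rather than a partial match.

```python
import pandas as pd

# Sample data mirroring the three cases: no suffix, 'M', and 'B'.
df = pd.DataFrame({'id': [1, 2, 3], 'shares_float': ['5', '6M', '7B']})

# The named groups become the columns 'val' and 'unit' of the extracted frame.
parts = df['shares_float'].str.extract(r'^(?P<val>[0-9.]+)(?P<unit>[MB]?)$')

# An empty unit means "no suffix", so it maps to a multiplier of 1.
unit_map = {'': 1.0, 'M': 1e6, 'B': 1e9}
df['float_value'] = parts['val'].astype(float) * parts['unit'].map(unit_map)
```

Because the multiplication is done on whole columns, no explicit Python loop over the rows is needed.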