Converting string of numbers and letters to int/float in pandas dataframe
I feel like there has to be a quick solution to my problem. I hacked out a poorly implemented solution using multiple list comprehensions, which is not ideal whatsoever. Maybe someone could help out here.
I have a set of values which are strings (e.g. 3.2B, 1.5M, 1.1T), where naturally the last character denotes the magnitude: M = million, B = billion, T = trillion. Within the set there are also NaN/'none' values, which should remain untouched. I wish to convert these to floats or ints, so in the given example: (3200000000, 1500000, 1100000000000).
TIA
You could create a function and applymap it to every entry in the dataframe:
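import pandas as pd

powers = {'B': 10 ** 9, 'M': 10 ** 6, 'T': 10 ** 12}
# add some more to powers as necessary

def f(s):
    try:
        power = s[-1]
        # float() rather than int(), so that values like '3.2' parse
        return float(s[:-1]) * powers[power]
    except (TypeError, KeyError, ValueError, IndexError):
        # NaN, 'none', empty strings and unknown suffixes are left untouched
        return s

df.applymap(f)  # on pandas >= 2.1, df.map(f) is the non-deprecated spelling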
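A self-contained sketch of the same per-cell approach, assuming the only suffixes are M, B and T and that anything that cannot be parsed (NaN, None, 'none') should be returned unchanged. The name parse_suffixed is just illustrative. It uses float() so that mantissas like 3.2 parse, and Series.map for a single column; on a whole DataFrame, applymap does the same job (pandas 2.1 renamed it DataFrame.map).

```python
import numpy as np
import pandas as pd

# Multipliers for the suffixes in the question; extend as needed.
POWERS = {'M': 10 ** 6, 'B': 10 ** 9, 'T': 10 ** 12}


def parse_suffixed(s):
    """Turn '3.2B' into 3.2e9; return anything else unchanged."""
    try:
        return float(s[:-1]) * POWERS[s[-1]]
    except (TypeError, KeyError, ValueError, IndexError):
        # TypeError: NaN/None are not subscriptable
        # ValueError: the part before the suffix is not a number (e.g. 'none')
        # KeyError: the last character is not a known suffix
        # IndexError: empty string
        return s


s = pd.Series(['3.2B', '1.5M', '1.1T', np.nan, 'none'])
converted = s.map(parse_suffixed)
print(converted.tolist())
```

Because unparsed entries are kept as-is, the result has object dtype; if a numeric column matters more than keeping 'none' verbatim, pd.to_numeric(converted, errors='coerce') turns the leftovers into NaN.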
Setup
Borrowing @MaxU's pd.DataFrame:
import pandas as pd
df = pd.DataFrame({'col': ['123.456', '78M', '0.5B']})
Solution
Replace the suffixes with scientific-notation exponents (so 78M becomes 78E6), then use astype(float):
d = dict(M='E6', B='E9', T='E12')
df.replace(d, regex=True).astype(float)
col
0 1.234560e+02
1 7.800000e+07
2 5.000000e+08
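A sketch of the same replace-then-convert idea applied to the values from the question. Two changes are assumptions rather than part of the answer above: the patterns are anchored with $ so a letter only counts as a suffix at the end of the string, and pd.to_numeric(errors='coerce') is used instead of astype(float), because astype(float) raises on a string like 'none'. The trade-off is that 'none' becomes NaN rather than staying untouched.

```python
import numpy as np
import pandas as pd

# Map each suffix, anchored to the end of the string, to an exponent.
exponents = {r'M$': 'E6', r'B$': 'E9', r'T$': 'E12'}

s = pd.Series(['123.456', '78M', '0.5B', '1.1T', np.nan, 'none'])

# '78M' -> '78E6' etc.; NaN is not a string and is left alone.
as_sci = s.replace(exponents, regex=True)

# astype(float) would raise on 'none'; errors='coerce' turns it into NaN.
numbers = pd.to_numeric(as_sci, errors='coerce')
print(numbers)
```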
Demo:
In [58]: df
Out[58]:
col
0 123.456
1 78M
2 0.5B
In [59]: d = {'B': 10**9, 'M': 10**6}
In [60]: df['new'] = \
...: df['col'].str.extract(r'(?P<val>[\d.]+)\s*?(?P<mult>\D*)', expand=True) \
...: .replace('','1') \
...: .replace(d, regex=True) \
...: .astype(float) \
...: .eval('val * mult')
...:
In [61]: df
Out[61]:
col new
0 123.456 1.234560e+02
1 78M 7.800000e+07
2 0.5B 5.000000e+08
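A variation on the extract approach, sketched under the assumption that every valid entry is a plain decimal number followed by at most one of M, B or T. Anchoring the pattern with ^...$ and mapping the captured suffix through a dict (multipliers is a hypothetical name) avoids the replace('','1') and regex-replace steps, and adds T. Entries that do not match, such as NaN or 'none', come out as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical multiplier table; '' means "no suffix".
multipliers = {'': 1, 'M': 10 ** 6, 'B': 10 ** 9, 'T': 10 ** 12}

s = pd.Series(['123.456', '78M', '0.5B', '1.1T', np.nan, 'none'])

# Split each entry into a numeric part and an optional one-letter suffix.
# Entries that do not match (NaN, 'none') yield NaN in both columns.
parts = s.str.extract(r'^(?P<val>[\d.]+)(?P<mult>[MBT]?)$')

# Look the suffix up directly instead of regex-replacing it.
result = parts['val'].astype(float) * parts['mult'].map(multipliers)
print(result)
```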