
Convert percent string to float in pandas read_csv

Is there a way to convert values like '34%' directly to int or float when using read_csv in pandas? I want '34%' to be read directly as 0.34.

  1. Using this in read_csv did not work:

    read_csv(..., dtype={'col':np.float})

  2. After loading the CSV as 'df', this also did not work; it failed with the error "invalid literal for float(): 34%":

    df['col'] = df['col'].astype(float)

  3. I ended up using this, which works but is long-winded:

    df['col'] = df['col'].apply(lambda x: np.nan if x in ['-'] else x[:-1]).astype(float)/100

You can define a custom function to convert your percents to floats and pass it to read_csv through the converters parameter:

In [149]:
import io
import pandas as pd
# dummy data
temp1 = """index col
113 34%
122 50%
123 32%
301 12%"""
# custom function taken from https://stackoverflow.com/questions/12432663/what-is-a-clean-way-to-convert-a-string-percent-to-a-float
def p2f(x):
    return float(x.strip('%'))/100
# pass to the converters param as a dict mapping column name to function
df = pd.read_csv(io.StringIO(temp1), sep=r'\s+', index_col=[0], converters={'col': p2f})
df
Out[149]:
        col
index      
113    0.34
122    0.50
123    0.32
301    0.12
In [150]:
# check that dtypes really are floats
df.dtypes
Out[150]:
col    float64
dtype: object

My percent-to-float code is courtesy of ashwini's answer: What is a clean way to convert a string percent to a float?
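The question's data also contains '-' placeholders, which p2f would reject with a ValueError. A minimal sketch of a more tolerant converter follows; the name percent_to_float and the choice to treat '-' and empty fields as missing are assumptions for illustration, not part of the original answer.

```python
import io

import numpy as np
import pandas as pd


def percent_to_float(text):
    """Convert '34%' to 0.34; treat '-' or an empty field as missing (assumed convention)."""
    text = text.strip()
    if text in ("", "-"):
        return np.nan
    return float(text.rstrip("%")) / 100


# Hypothetical sample data with one '-' placeholder
csv_text = "index,col\n113,34%\n122,-\n123,32%\n"
df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    converters={"col": percent_to_float},
)
print(df)
print(df.dtypes)
```

Because the converter receives each field as a raw string, the missing-value handling has to live inside the function rather than in na_values.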

You were very close with your df attempt. Try changing:

df['col'] = df['col'].astype(float)

to:

df['col'] = df['col'].str.rstrip('%').astype('float') / 100.0
#                     ^ use str funcs to elim '%'     ^ divide by 100
# could also be:     .str[:-1].astype(...

Pandas exposes Python's string methods on a column through the .str accessor. Just precede the string function you want with .str and see if it does what you need. (This includes string slicing, too, of course.)

Above we use .str.rstrip() to get rid of the trailing percent sign, convert the result to float, and then divide the whole column by 100.0 to convert from percentage to actual value. For example, 45% is equivalent to 0.45.

Although .str.rstrip('%') could also just be .str[:-1], I prefer to explicitly remove the '%' rather than blindly removing the last character, just in case a value has no trailing '%'.
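To make the "just in case" concrete, here is a small sketch comparing the two approaches on made-up data in which one value lacks the percent sign. The sample Series is an assumption for illustration only.

```python
import pandas as pd

# Hypothetical data: the second value is missing its percent sign
s = pd.Series(["34%", "50", "12%"])

# rstrip removes only trailing '%' characters, so "50" is left intact
stripped = s.str.rstrip("%").astype(float) / 100

# slicing always drops the last character, so "50" silently becomes "5"
sliced = s.str[:-1].astype(float) / 100

print(stripped.tolist())
print(sliced.tolist())
```

With slicing, the value "50" turns into 0.05 without raising any error, whereas rstrip yields 0.5. Whether a value without '%' should be divided by 100 at all depends on your data, but at least rstrip does not corrupt the digits.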

Q: How do I get a pandas DataFrame / Series of floats from percent strings?

A:

dfp = df[col].str.rstrip('%').astype(float) / 100

Explanation: Use the .str accessor to strip any trailing '%' from each string, convert the result to float, and divide by 100.

Variation of @Gary02127's answer.
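If the column has already been loaded and contains non-numeric placeholders such as the '-' mentioned in the question, .astype(float) will raise. One possible variation, sketched here under the assumption that unparseable values should become NaN, swaps in pd.to_numeric with errors="coerce":

```python
import pandas as pd

# Hypothetical column with a '-' placeholder, as described in the question
df = pd.DataFrame({"col": ["34%", "-", "12%"]})

# Strip the trailing '%', coerce anything unparseable to NaN, then scale
df["col"] = pd.to_numeric(df["col"].str.rstrip("%"), errors="coerce") / 100

print(df)
```

Note that errors="coerce" hides every parsing failure, not just '-', so it is worth checking the count of NaN values afterwards if the input may contain other surprises.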
