How to read csv formatted numeric data into Pandas
I have a csv file with two formatted columns that currently read in as objects:
One contains percentage values, which read in as strings like '0.01%'. The % is always at the end.
The other contains currency values, which read in as strings like '$1234.5'.
I have tried using the split function to remove the % or $ inside the dataframe, then calling float on the result of the split. This prints the correct result but does not assign the value. It also gives a type error saying that float does not have a split function, even though I do the split before the float.
Try this:
import pandas as pd
df = pd.read_csv('data.csv')
"""
The example df looks like this:
col1 col2
0 3.04% $100.25
1 0.15% $1250
2 0.22% $322
3 1.30% $956
4 0.49% $621
"""
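# Keep the text before the trailing '%' (column 0 of the split result)
df['col1'] = df['col1'].str.split('%', expand=True)[0]
# Keep the text after the leading '$' (column 1); n must be passed by keyword in recent pandas
df['col2'] = df['col2'].str.split('$', n=1, expand=True)[1]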
df[['col1', 'col2']] = df[['col1', 'col2']].apply(pd.to_numeric)
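Since the question is about reading the file, another option is to clean the values while the csv is being parsed, using the converters argument of pd.read_csv. The following is only a minimal sketch: the inline csv text stands in for data.csv, and the helper names parse_percent and parse_currency are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for data.csv
csv_text = """col1,col2
3.04%,$100.25
0.15%,$1250
0.22%,$322
"""

def parse_percent(text):
    # '3.04%' -> 3.04 (the number as written, not divided by 100)
    return float(text.rstrip('%'))

def parse_currency(text):
    # '$100.25' -> 100.25
    return float(text.lstrip('$'))

# Each converter receives the raw cell text for its column
df = pd.read_csv(
    io.StringIO(csv_text),
    converters={'col1': parse_percent, 'col2': parse_currency},
)
print(df)
```

Note that a converter is called with the raw string of every cell, so an empty cell would reach float('') and raise; this sketch assumes both columns are fully populated.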
You are probably looking for the apply method. For example:
df['first_col'] = df['first_col'].apply(lambda x: float(x.strip('%')))
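One possible cause of the "float does not have split" error in the question is missing values: pandas stores an empty cell as NaN, which is a float, so calling a string method on it inside apply fails. That is an assumption about the asker's data, not something the question confirms. A sketch that tolerates missing values uses the vectorized .str accessor instead; the sample frame below is invented for illustration.

```python
import numpy as np
import pandas as pd

# Invented sample with a missing value in each column
df = pd.DataFrame({
    'col1': ['3.04%', np.nan, '0.22%'],
    'col2': ['$100.25', '$1250', np.nan],
})

# The .str accessor leaves missing values as NaN instead of raising
df['col1'] = pd.to_numeric(df['col1'].str.rstrip('%'))
df['col2'] = pd.to_numeric(df['col2'].str.lstrip('$'))
print(df.dtypes)
```

Assigning the result back to the column, as done here, is what makes the change stick; calling the string methods without assignment only prints a converted copy.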