How to specify dtype when using pandas.read_csv to load data from csv files?
I have some text files with the following format:
000423|东阿阿胶| 300|1|0.15000| |
000425|徐工机械| 600|1|0.15000| |
000503|海虹控股| 400|1|0.15000| |
000522|白云山A| |2| | 1982.080|
000527|美的电器| 900|1|0.15000| |
000528|柳 工| 300|1|0.15000| |
When I use read_csv to load them into a DataFrame, it doesn't infer the correct dtype for some columns. For example, the first column is parsed as int rather than unicode str (so the leading zeros are lost), and the third column is parsed as unicode str rather than int because of one missing value. Is there a way to preset the dtype of the DataFrame, as numpy.genfromtxt allows?
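The symptom can be reproduced with a minimal sketch, assuming a recent pandas on Python 3 (io.StringIO stands in for the Python 2 StringIO module). SAMPLE and NAMES are illustrative names: the rows are the sample shown above, and the column names are borrowed from the asker's read_csv call.

```python
from io import StringIO

import pandas as pd

# Sample rows from the question. Every line ends with a trailing '|',
# so each row has a 7th, empty field (mapped to "price" here).
SAMPLE = """000423|东阿阿胶| 300|1|0.15000| |
000425|徐工机械| 600|1|0.15000| |
000503|海虹控股| 400|1|0.15000| |
000522|白云山A| |2| | 1982.080|
000527|美的电器| 900|1|0.15000| |
000528|柳 工| 300|1|0.15000| |
"""
NAMES = ["ticker", "name", "vol", "sign", "ratio", "cash", "price"]

# No dtype argument: pandas infers each column's type from its contents.
df = pd.read_csv(StringIO(SAMPLE), sep="|", header=None, names=NAMES)

print(df.dtypes)
# "000423" looks like an integer, so it is parsed as 423 and the zeros are lost.
print(df["ticker"].head(2).tolist())
```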
Update: I used read_csv like this, which caused the problem:
data = pandas.read_csv(StringIO(etf_info), sep='|', skiprows=14, index_col=0,
                       skip_footer=1, names=['ticker', 'name', 'vol', 'sign',
                                             'ratio', 'cash', 'price'], encoding='gbk')
To solve both the dtype and the encoding problems, I first need to use unicode() and numpy.genfromtxt:
etf_info = unicode(urllib2.urlopen(etf_url).read(), 'gbk')
nd_data = np.genfromtxt(StringIO(etf_info), delimiter='|',
                        skiprows=14, skip_footer=1, dtype=ETF_DTYPE)
data = pandas.DataFrame(nd_data, index=nd_data['ticker'],
                        columns=['name', 'vol', 'sign',
                                 'ratio', 'cash', 'price'])
It would be nice if read_csv could add dtype and usecols settings. Sorry for my greed. ^_^
Simply put: no, not yet. More work (read: more active developers) is needed in this particular area. If you could post how you're using read_csv, it might help. I suspect that the whitespace between the bars may be the problem.
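A minimal sketch of the whitespace problem, assuming current pandas: the blank cells contain a single space, which is not among read_csv's default na_values, so columns such as vol and cash cannot be parsed as numbers. Passing skipinitialspace=True drops the space after each '|', so a blank cell becomes an empty field and is read as NaN. SAMPLE and NAMES are hypothetical names holding a few of the asker's rows.

```python
from io import StringIO

import pandas as pd

# A few rows of the question's data (variable names are illustrative).
SAMPLE = """000423|东阿阿胶| 300|1|0.15000| |
000522|白云山A| |2| | 1982.080|
000527|美的电器| 900|1|0.15000| |
"""
NAMES = ["ticker", "name", "vol", "sign", "ratio", "cash", "price"]

# Default parsing: a lone " " is not a recognised missing-value marker.
raw = pd.read_csv(StringIO(SAMPLE), sep="|", header=None, names=NAMES)

# skipinitialspace=True skips spaces right after each delimiter, so " " becomes
# an empty field (read as NaN) and " 300" becomes "300".
clean = pd.read_csv(StringIO(SAMPLE), sep="|", header=None, names=NAMES,
                    skipinitialspace=True)

print(raw[["vol", "cash"]].dtypes)
print(clean[["vol", "cash"]].dtypes)
```

Note that vol comes out as float64 rather than int in the cleaned frame, because NaN cannot be stored in a plain NumPy integer column.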
EDIT: this is now obsolete. This behavior is covered natively by read_csv.
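Both options wished for in the question are now read_csv parameters. A minimal sketch, assuming current pandas; SAMPLE and NAMES are hypothetical names holding the question's data. usecols keeps only the listed columns, and dtype={"ticker": str} keeps the ticker as text.

```python
from io import StringIO

import pandas as pd

SAMPLE = """000423|东阿阿胶| 300|1|0.15000| |
000425|徐工机械| 600|1|0.15000| |
000522|白云山A| |2| | 1982.080|
"""
NAMES = ["ticker", "name", "vol", "sign", "ratio", "cash", "price"]

df = pd.read_csv(
    StringIO(SAMPLE),
    sep="|",
    header=None,
    names=NAMES,
    usecols=["ticker", "name", "sign"],  # read only these three columns
    dtype={"ticker": str},               # keep "000423" as text, not 423
)
print(df)
```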
You can now pass dtype to read_csv.
PS: Kudos to Wes McKinney for answering; it feels quite awkward to contradict the "past Wes".
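A fuller sketch of the same idea, assuming pandas 1.x or later: a dict passed as dtype fixes each column's type up front, much like the dtype argument of numpy.genfromtxt. RAW and NAMES are hypothetical; RAW simulates the GBK-encoded bytes the asker downloads, so encoding="gbk" can replace the separate unicode() step. "Int64" is pandas' nullable integer type, which lets vol stay integer even though one value is missing.

```python
from io import BytesIO

import pandas as pd

NAMES = ["ticker", "name", "vol", "sign", "ratio", "cash", "price"]

# Hypothetical stand-in for the GBK-encoded bytes fetched in the question.
RAW = """000423|东阿阿胶| 300|1|0.15000| |
000425|徐工机械| 600|1|0.15000| |
000522|白云山A| |2| | 1982.080|
000528|柳 工| 300|1|0.15000| |
""".encode("gbk")

df = pd.read_csv(
    BytesIO(RAW),
    sep="|",
    header=None,
    names=NAMES,
    encoding="gbk",          # decode the bytes inside read_csv
    skipinitialspace=True,   # treat " " cells as empty, i.e. missing
    dtype={
        "ticker": str,       # keep leading zeros
        "name": str,
        "vol": "Int64",      # nullable integer: the missing value becomes <NA>
        "sign": int,
        "ratio": float,
        "cash": float,
        "price": float,      # always empty in this sample, so all NaN
    },
).set_index("ticker")

print(df.dtypes)
```

The index is set with set_index after parsing rather than with index_col, which keeps the dtype handling of the ticker column easy to see.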