
read_table pandas python numeric error

I am doing a basic pd.read_table of a .txt file. The first column is a list of CUSIPs. The CUSIP "65248E10" is being read as the number 65248E10 = 652480000000000 (the E10 is treated as scientific notation).

I have been going through the pandas documentation, but I can't figure out how to force the column to stay as a string. http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.parsers.read_table.html#pandas.io.parsers.read_table

Also, even if I set header=0, pandas seems to use the first row as the headers, so that row 0 is the second row, and so on. If my text file has no column names, how can I get the headers to default to NULL (or 1, 2, 3, etc.)?

Thanks for the help. I am new to pandas/Python.

If we have a data file which looks like

65248E10 11
55555E55 22

then we can read it in with something like

>>> import pandas as pd
>>> pd.read_table("cusip.txt", header=None, delimiter=" ", converters={0: str})
          0   1
0  65248E10  11
1  55555E55  22

where we use header=None to tell it that there aren't any headers, delimiter=" " to tell it that the fields are separated by a space (adjust to match your data format), and converters={0: str} to tell it that after reading the first column in as a string, we want to turn it into a string (i.e., in this case do nothing to it) rather than process it further. Instead of converters={0: str}, spelling out a dtype for every column, for example dtype={0: str, 1: int}, would have worked too, but this way we can still let pandas figure out what the other columns are.
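To see the effect of the converter in isolation, here is a minimal sketch that reads the same two sample rows from an in-memory string instead of cusip.txt. The use of io.StringIO and of sep=" " (equivalent to delimiter=" ") are assumptions made to keep the example self-contained.

```python
import io

import pandas as pd

# Same two-line sample as above, held in memory so no file is needed.
data = "65248E10 11\n55555E55 22\n"

# Default parsing: the first column looks like scientific notation,
# so pandas converts it to floating point.
raw = pd.read_table(io.StringIO(data), header=None, sep=" ")

# With a converter on column 0, the original text is kept as-is.
fixed = pd.read_table(
    io.StringIO(data), header=None, sep=" ", converters={0: str}
)

print(raw[0].tolist())    # floats
print(fixed[0].tolist())  # the original CUSIP strings
```

The second column has no converter, so pandas still infers its type on its own.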

The problem with using header=0 is that 0 here doesn't mean "no header"; it means use row number 0 (the first row) as the headers.
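The difference between the two settings can be checked directly. This is a small sketch on the same sample data, again read from an in-memory string (an assumption for the sake of a self-contained example):

```python
import io

import pandas as pd

data = "65248E10 11\n55555E55 22\n"

# header=0: the first line is consumed as column names,
# leaving only one data row.
with_header = pd.read_table(io.StringIO(data), header=0, sep=" ")

# header=None: every line is data, and the columns are numbered 0, 1, ...
no_header = pd.read_table(io.StringIO(data), header=None, sep=" ")

print(list(with_header.columns), len(with_header))
print(list(no_header.columns), len(no_header))
```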

To stop your column from being read as a number, use the converters parameter and specify str as the converter for the column containing your CUSIPs.

For the header, as documented on the page you linked to, header is the number of the row which is to be considered the header; it is not a boolean saying "do I have a header or not". Setting it to zero means to use row zero (i.e., the first row) as the header. The documentation explicitly says:

Specify None if there is no header row.
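If you would rather have your own labels than the default 0, 1, ..., one option is to combine header=None with the names parameter. The sketch below assumes the labels "cusip" and "value", which are made up for illustration and do not come from the question.

```python
import io

import pandas as pd

data = "65248E10 11\n55555E55 22\n"

# "cusip" and "value" are hypothetical column names.
df = pd.read_table(
    io.StringIO(data),
    header=None,               # the file itself has no header row
    sep=" ",
    names=["cusip", "value"],  # labels to use instead of 0, 1
    converters={"cusip": str}, # keep the CUSIP column as text
)

print(df)
```

Converters can be keyed by column label as well as by position, so the CUSIP column can be referred to by its new name.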
