
Python Pandas does not read the first row of csv file

I have a problem reading a CSV (or txt) file with pandas. Because numpy's loadtxt function takes too much time, I decided to use pandas read_csv instead.

I want to make a numpy array from a txt file that has four space-separated columns and a very large number of rows (e.g. 256^3; in the example below it is 4^3 = 64).

The problem is that, for reasons I don't understand, pandas's read_csv always seems to skip the first line (first row) of the csv (txt) file, leaving one fewer row of data.

Here is the code.

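from __future__ import division, print_function
import numpy as np
import pandas as pd

# Grid dimensions; the array has one row per grid point
ngridx = 4
ngridy = 4
ngridz = 4
size = ngridx*ngridy*ngridz

# Columns 0-2 hold the (x, y, z) grid indices, column 3 holds random values
f = np.zeros((size,4))
a = np.arange(size)
f[:, 0] = np.floor_divide(a, ngridy*ngridz)
f[:, 1] = np.fmod(np.floor_divide(a, ngridz), ngridy)
f[:, 2] = np.fmod(a, ngridz)
f[:, 3] = np.random.rand(size)
print(f[0])

# Write the array to a space-separated text file (no header line)
np.savetxt('Testarray.txt',f,fmt='%6.16f')

# Read it back; the first line is consumed as column names here
g = pd.read_csv('Testarray.txt',delimiter=' ').values
print(g[0])
print(len(g[:,3]))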

The f[0] and g[0] displayed in the output should match, but they don't, which indicates that pandas is skipping the first line of Testarray.txt. Also, the loaded array g is shorter than the array f.

I need help.

Thanks in advance.

By default, pd.read_csv uses header=0 (when the names parameter is not specified), which means the first (i.e. 0th-indexed) line is interpreted as column names.

If your data has no header, then use

pd.read_csv(..., header=None)

For example,

import io
import sys
import pandas as pd
if sys.version_info.major == 3:
    # Python3
    StringIO = io.StringIO 
else:
    # Python2
    StringIO = io.BytesIO

text = '''\
1 2 3
4 5 6
'''

print(pd.read_csv(StringIO(text), sep=' '))

Without the header argument, the first line, 1 2 3, sets the column names:

   1  2  3
0  4  5  6

With header=None, the first line is treated as data:

print(pd.read_csv(StringIO(text), sep=' ', header=None))

prints

   0  1  2
0  1  2  3
1  4  5  6
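
The names parameter mentioned above offers another way to keep the first line as data. The following is a minimal sketch, assuming Python 3 and a reasonably recent pandas; the column labels 'x', 'y' and 'z' are hypothetical and chosen only for illustration. When names is passed and header is left unset, pandas does not take column names from the first line.

```python
import io

import pandas as pd

text = "1 2 3\n4 5 6\n"

# 'x', 'y', 'z' are made-up labels for this illustration.
# Because names is given and header is not, the first line stays in the data.
df = pd.read_csv(io.StringIO(text), sep=' ', names=['x', 'y', 'z'])

print(df)
print(df.shape)
```

This may be preferable to header=None when meaningful column labels are wanted instead of the default integer labels.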

If your file has no header row, you need to tell pandas so by passing header=None when calling pd.read_csv().
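
Applied to the setup in the question, the fix might look like the sketch below. It assumes Python 3, rebuilds the same 4 x 4 x 4 grid with plain integer arithmetic, and writes to a temporary directory (an assumption made here so the example does not touch the working directory). It then checks that every row survives the round trip.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Same layout as in the question: three grid-index columns plus random values
ngrid = 4
size = ngrid ** 3
a = np.arange(size)
f = np.zeros((size, 4))
f[:, 0] = a // (ngrid * ngrid)
f[:, 1] = (a // ngrid) % ngrid
f[:, 2] = a % ngrid
f[:, 3] = np.random.rand(size)

# Temporary location is an assumption of this sketch, not part of the question
path = os.path.join(tempfile.mkdtemp(), 'Testarray.txt')
np.savetxt(path, f, fmt='%6.16f')

# header=None tells pandas that the first line is data, not column names
g = pd.read_csv(path, sep=' ', header=None).values

print(g.shape)            # row count now matches f
print(np.allclose(f, g))  # contents agree up to the written precision
```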

