
Python Pandas read_csv skip rows but keep header

I'm having trouble figuring out how to skip n rows in a csv file but keep the header, which is the first row.

What I want to do is iterate through the file but keep the header from the first row. With an integer skiprows, the first row after the skipped rows becomes the header. What is the best way of doing this?

data = pd.read_csv('test.csv', sep='|', header=0, skiprows=10, nrows=10)

You can pass a list of row numbers to skiprows instead of an integer.

By giving the function the integer 10, you're just skipping the first 10 lines, header included.

To keep row 0 as the header and then skip rows 1 through 9 (everything up to, but not including, row 10), you can write:

pd.read_csv('test.csv', sep='|', skiprows=range(1, 10))
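
A quick way to check this behaviour is to run it against a small in-memory file. The sketch below assumes a made-up pipe-separated file (the column names `id` and `value` are purely illustrative, not from the question) and adds `nrows=10` as in the original call:

```python
import io
import pandas as pd

# Hypothetical stand-in for test.csv: a header line plus 20 data rows.
lines = ["id|value"] + [f"{i}|{i * i}" for i in range(1, 21)]
csv_text = "\n".join(lines)

# Row 0 stays as the header; rows 1-9 are skipped; the next 10 rows are read.
df = pd.read_csv(io.StringIO(csv_text), sep="|", skiprows=range(1, 10), nrows=10)

print(list(df.columns))   # column names come from row 0
print(df["id"].tolist())  # data starts at file row 10
```

Because the header row is not in the skip list, it is still the first non-skipped line and is used for the column names.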

Other ways to skip rows using read_csv

The two main ways to control which rows read_csv uses are the header and skiprows parameters.

Suppose we have the following CSV file with one column:

a
b
c
d
e
f

In each of the examples below, this file is f = io.StringIO("\n".join("abcdef")).

  • Read all lines as values (no header, column names default to integers):

     >>> pd.read_csv(f, header=None)
        0
     0  a
     1  b
     2  c
     3  d
     4  e
     5  f

  • Use a particular row as the header (skip all lines before that):

     >>> pd.read_csv(f, header=3)
        d
     0  e
     1  f

  • Use multiple rows as the header, creating a MultiIndex (skip all lines before the last specified header line):

     >>> pd.read_csv(f, header=[2, 4])
        c
        e
     0  f

  • Skip N rows from the start of the file (the first row that's not skipped is the header):

     >>> pd.read_csv(f, skiprows=3)
        d
     0  e
     1  f

  • Skip one or more rows by giving the row indices (the first row that's not skipped is the header):

     >>> pd.read_csv(f, skiprows=[2, 4])
        a
     0  b
     1  d
     2  f
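
When trying these calls one after another, note that an `io.StringIO` buffer is left at its end once it has been read, so each call needs a fresh buffer (or `f.seek(0)`). A minimal sketch, using a hypothetical helper `make_file` that is not part of the original answer:

```python
import io
import pandas as pd

def make_file():
    # Fresh in-memory copy of the six-line example file (a..f).
    return io.StringIO("\n".join("abcdef"))

# header=3 and skiprows=3 both end up using line 3 ("d") as the header.
by_header = pd.read_csv(make_file(), header=3)
by_skiprows = pd.read_csv(make_file(), skiprows=3)

# Skipping lines 2 and 4 leaves line 0 ("a") as the header.
by_list = pd.read_csv(make_file(), skiprows=[2, 4])

print(by_header.equals(by_skiprows))
print(list(by_list.columns), by_list["a"].tolist())
```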

Great answers already. I feel the need to add the generalized form here. Consider this scenario:

Say your xls/csv has junk in the top 2 rows (rows #0 and #1). Row #2 (the 3rd row) is the real header, and you want to load 10 rows starting from row #50 (i.e. the 51st row). Here's the snippet:

pd.read_csv('test.csv', header=2, skiprows=range(3, 50), nrows=10)
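
To see which rows that call actually returns, here is a small sketch against an invented comma-separated file in which every data line records its own file row number. The file contents and the column names `row` and `value` are assumptions made for illustration:

```python
import io
import pandas as pd

# Hypothetical file: 2 junk lines, the real header on line 2, then data on lines 3..99.
lines = ["junk,junk", "more,junk", "row,value"] + [f"{n},{n * 2}" for n in range(3, 100)]
buf = io.StringIO("\n".join(lines))

# Lines before the header are discarded, lines 3-49 are skipped, 10 rows are read.
df = pd.read_csv(buf, header=2, skiprows=range(3, 50), nrows=10)

print(list(df.columns))
print(df["row"].tolist())
```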

To expand on @AlexRiley's answer, the skiprows argument takes a list of numbers that determines which rows to skip. So:

pd.read_csv('test.csv', sep='|', skiprows=range(1, 10))

is the same as:

pd.read_csv('test.csv', sep='|', skiprows=[1,2,3,4,5,6,7,8,9])

The best way to go about ignoring specific rows would be to create your ignore list (either manually or with a function like range that produces a sequence of integers) and pass it to skiprows.
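
As a sketch of building such an ignore list, the example below concatenates two ranges so that two separate blocks of rows are dropped while row 0 remains the header. The data and the column names `name` and `n` are made up for the example:

```python
import io
import pandas as pd

# Hypothetical pipe-separated file: header on row 0, data rows 1..15.
csv_text = "\n".join(["name|n"] + [f"r{i}|{i}" for i in range(1, 16)])

# Ignore list built from two ranges: file rows 1-4 and 8-11.
ignore = list(range(1, 5)) + list(range(8, 12))

df = pd.read_csv(io.StringIO(csv_text), sep="|", skiprows=ignore)
print(df["n"].tolist())
```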

If you're iterating through a long csv file, you can use the chunksize argument. If for some reason you need to manually step through it, you can try the following, as long as you know how many iterations you need to go through:

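for i in range(num_iters):
    # Keep row 0 as the header, skip the i*10 data rows already read,
    # then read the next block of 10 rows.
    chunk = pd.read_csv('test.csv', sep='|', header=0,
                        skiprows=range(1, i*10 + 1), nrows=10)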
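
For comparison, the chunksize route mentioned above might look like the following sketch. It assumes an invented 25-row pipe-separated file, and every chunk carries the column names from the header row:

```python
import io
import pandas as pd

# Hypothetical stand-in for test.csv: a header line plus 25 data rows.
csv_text = "\n".join(["id|value"] + [f"{i}|{i * i}" for i in range(1, 26)])

chunks = []
# chunksize=10 makes read_csv return an iterator of DataFrames of up to 10 rows each.
for chunk in pd.read_csv(io.StringIO(csv_text), sep="|", chunksize=10):
    chunks.append(chunk)

print([len(c) for c in chunks])
print([list(c.columns) for c in chunks])
```

This reads the file once, whereas the manual loop reopens and rescans it on every iteration.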

If you need to skip/drop specific rows, say the first 3 rows (i.e. 0, 1, 2) and then 2 more rows (i.e. 4, 5), you can use the following. The first row that is not skipped (row 3 here) is retained as the header:

df = pd.read_csv(file_in, delimiter='\t', skiprows=[0,1,2,4,5], encoding='utf-16', usecols=cols)
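
A self-contained sketch of the same idea follows. Here `file_in` is replaced by an in-memory tab-separated buffer, `cols` by an explicit list, and the encoding argument is omitted because the text is already decoded. All file contents and column names are invented for illustration:

```python
import io
import pandas as pd

# Hypothetical tab-separated file: three junk lines, the header on row 3,
# two more unwanted lines on rows 4-5, then the data.
lines = [
    "junk", "junk", "junk",
    "name\tscore\tnote",
    "units\t-\t-", "junk\t-\t-",
    "ann\t1\tx", "bob\t2\ty",
]

df = pd.read_csv(io.StringIO("\n".join(lines)), delimiter="\t",
                 skiprows=[0, 1, 2, 4, 5], usecols=["name", "score"])

print(list(df.columns))
print(df["name"].tolist(), df["score"].tolist())
```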


 