Skip rows during csv import pandas

Question

I'm trying to import a .csv file using pandas.read_csv(). However, I don't want to import the 2nd row of the data file (the row with index = 1 for 0-indexing).

I can't see how not to import it because the arguments used with the command seem ambiguous:

From the pandas website:

skiprows : list-like or integer

Row numbers to skip (0-indexed) or number of rows to skip (int) at the start of the file.

If I put skiprows=1 in the arguments, how does it know whether to skip the first row or skip the row with index 1?

Answer 1

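You can try it yourself. A list is treated as 0-indexed line numbers to skip, while an integer is treated as the number of lines to skip from the start of the file: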

>>> import pandas as pd
>>> from io import StringIO  # on Python 2 this was: from StringIO import StringIO
>>> s = """1, 2
... 3, 4
... 5, 6"""
>>> pd.read_csv(StringIO(s), skiprows=[1], header=None)
   0  1
0  1  2
1  5  6
>>> pd.read_csv(StringIO(s), skiprows=1, header=None)
   0  1
0  3  4
1  5  6

Answer 2

I don't have the reputation to comment yet, but I want to add to alko's answer for further reference.

From the docs:

skiprows: A collection of numbers for rows in the file to skip. Can also be an integer to skip the first n rows.
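
For the case in the question (dropping only the second line of the file while keeping the header), a minimal sketch might look like the following. The sample data and variable names are made up for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: a header line followed by three data lines.
data = "a,b\n1,2\n3,4\n5,6\n"

# A list is read as 0-indexed line numbers: only line 1 ("1,2") is dropped,
# and line 0 is still used as the header.
by_list = pd.read_csv(StringIO(data), skiprows=[1])

# An integer is read as a count: the first line (the header) is dropped,
# so "1,2" ends up being used as the column names.
by_int = pd.read_csv(StringIO(data), skiprows=1)

print(by_list)
print(by_int)
```

So skiprows=[1] is the form that matches "skip the row with index 1", while skiprows=1 means "skip one line from the top".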

Answer 3

I ran into the same issue when using skiprows while reading a CSV file. I was passing skip_rows=1, which does not work; the parameter is spelled skiprows.

A simple example gives an idea of how to use skiprows while reading a CSV file.

import pandas as pd

# skiprows=1 skips the first line and starts reading from the second line
df = pd.read_csv('my_csv_file.csv', skiprows=1)

# display the data frame
df
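
As a hedged illustration of the misspelling mentioned above: read_csv rejects keyword arguments it does not know, so skip_rows fails with a TypeError rather than being silently ignored. The in-memory data below is a made-up stand-in for a real file.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
data = "a,b\n1,2\n3,4\n"

try:
    # Misspelled keyword: read_csv has no parameter called skip_rows.
    pd.read_csv(StringIO(data), skip_rows=1)
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)

# Correct spelling: the first line is skipped and "1,2" becomes the header.
df = pd.read_csv(StringIO(data), skiprows=1)
print(df)
```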

Answer 4

All of these answers miss one important point: the n'th line is the n'th line in the file, not the n'th row in the dataset. I have a situation where I download some antiquated stream gauge data from the USGS. The head of the dataset is commented with '#', the first line after that holds the labels, next comes a line that describes the data types, and last comes the data itself. I never know how many comment lines there are, but I know what the first couple of rows are. Example:

# ----------------------------- WARNING ----------------------------------
# Some of the data that you have obtained from this US Geological Survey database
# may not have received Director's approval. ...
agency_cd site_no datetime tz_cd 139719_00065 139719_00065_cd
5s 15s 20d 6s 14n 10s
USGS 08041780 2018-05-06 00:00 CDT 1.98 A

It would be nice if there was a way to automatically skip the n'th row as well as the n'th line.

As a note, I was able to fix my issue with:

import pandas as pd
ds = pd.read_csv(fname, comment='#', sep='\t', header=0, parse_dates=True)
ds.drop(0, inplace=True)
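
The same approach can be tried without a download. The sketch below uses a shortened, made-up stand-in for the USGS layout (the column names and values are trimmed for illustration) and assumes tab-separated fields, as in the snippet above. Because the data-type line is read as a data row, the columns come in as strings and are converted afterwards.

```python
from io import StringIO

import pandas as pd

# Hypothetical, shortened stand-in for the USGS file: comment lines,
# a label line, a data-type line, then the data (tab-separated).
raw = (
    "# ---- WARNING ----\n"
    "# provisional data\n"
    "agency_cd\tsite_no\tdatetime\tvalue\n"
    "5s\t15s\t20d\t14n\n"
    "USGS\t08041780\t2018-05-06 00:00\t1.98\n"
    "USGS\t08041780\t2018-05-06 00:15\t1.97\n"
)

# comment='#' drops the commented head however long it is; header=0 then
# refers to the first non-comment line. site_no is kept as a string so the
# leading zero survives.
df = pd.read_csv(StringIO(raw), comment="#", sep="\t", header=0,
                 dtype={"site_no": str})

# Row 0 is the data-type line ("5s", "15s", ...); drop it and renumber.
df = df.drop(index=0).reset_index(drop=True)

# The data-type line forced these columns to strings, so convert them now.
df["value"] = pd.to_numeric(df["value"])
df["datetime"] = pd.to_datetime(df["datetime"])

print(df)
```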

Answer 5

You have the following options to skip rows in Pandas:

import pandas as pd
from io import StringIO

csv = \
"""col1,col2
1,a
2,b
3,c
4,d
"""
pd.read_csv(StringIO(csv))

# Output:
   col1 col2  # index 0
0     1    a  # index 1
1     2    b  # index 2
2     3    c  # index 3
3     4    d  # index 4

Skip two lines at the start of the file (index 0 and 1). The original column names (index 0) are skipped as well, so the first remaining line is used for column names. To supply column names yourself, use the names=['col1', 'col2'] parameter:

pd.read_csv(StringIO(csv), skiprows=2)

# Output:
   2  b
0  3  c
1  4  d
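
A minimal sketch of combining skiprows with names, reusing the same sample data. When names is passed and header is left at its default, no line is consumed as a header, so the line "2,b" stays in the data instead of becoming the column labels.

```python
from io import StringIO

import pandas as pd

csv = "col1,col2\n1,a\n2,b\n3,c\n4,d\n"

# Skip the header line and the first data line, then supply the column
# names explicitly so that "2,b" is kept as a data row.
df = pd.read_csv(StringIO(csv), skiprows=2, names=["col1", "col2"])

print(df)
```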

Skip second and fourth lines (index 1 and 3):

pd.read_csv(StringIO(csv), skiprows=[1, 3])

# Output:
   col1 col2
0     2    b
1     4    d

Skip last two lines:

pd.read_csv(StringIO(csv), engine='python', skipfooter=2)

# Output:
   col1 col2
0     1    a
1     2    b

Use a lambda function to skip every second line (index 1 and 3):

pd.read_csv(StringIO(csv), skiprows=lambda x: (x % 2) != 0)

# Output:
   col1 col2
0     2    b
1     4    d

Answer 6

skiprows=[1] will skip the second line, not the first.

Answer 7

Also be sure that your file is actually a CSV file. For example, if you had an .xls file and simply changed the file extension to .csv, the file won't import and will give the error above. To check whether this is your problem, open the file in Excel, and it will likely say:

"The file format and extension of 'Filename.csv' don't match. The file could be corrupted or unsafe. Unless you trust its source, don't open it. Do you want to open it anyway?"

To fix the file: open the file in Excel, click "Save As", choose the file format to save as (use .csv), then replace the existing file.

This was my problem, and fixed the error for me.

solution1 170 ACCEPTED 2013-12-17 15:04:27
solution2 31 2014-05-19 13:35:52
solution3 21 2019-03-26 18:11:40
solution4 2 2020-05-12 23:09:06
solution5 0 2021-12-09 08:43:49
solution6 -1 2019-05-02 01:40:32
solution7 -7 2016-06-22 16:01:39
