
Pandas read_csv end reading at first linebreak

I am trying to read a CSV file that has some garbage at the top, but also garbage below the interesting data. I need to read multiple files, and the length of the interesting data varies. Is there a way to tell pd.read_csv that the dataframe ends at the first linebreak?

Example data (screenshot from Excel): [spreadsheet image]

I read the file with: dataframe = pd.read_csv(file, skiprows=45) This nicely gives me a dataframe with 10 columns, with the headers taken from line 46 (see image). However, it keeps reading past the #GARBAGE DATA row.

Important note: neither the data nor the footer has the same length across the different files I want to read.

There are two ways you could implement this:

1) Use the skipfooter parameter of read_csv; it tells the function the number of lines at the bottom of the file to skip. Note that skipfooter is only supported by the Python parser, so pass engine='python' as well:

pd.read_csv("in.csv", skiprows=45, skipfooter=2, engine="python")
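A minimal, self-contained sketch of the skipfooter approach, using a made-up in-memory file (the layout here, two garbage lines above the header and two below the data, is an assumption for illustration; the real files use skiprows=45):

```python
import io

import pandas as pd

# Hypothetical file: 2 garbage lines, then a header, data, and 2 garbage lines.
raw = """#GARBAGE LINE 1
#GARBAGE LINE 2
a,b,c
1,2,3
4,5,6
#GARBAGE DATA
#MORE GARBAGE
"""

# skipfooter is only supported by the pure-Python parser engine.
df = pd.read_csv(io.StringIO(raw), skiprows=2, skipfooter=2, engine="python")
print(df)
```

The drawback is that the number of footer lines must be known in advance, which is why the dropna approach below suits files whose footer length varies.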

2) Read the file as-is and then use the dropna function, which should drop the garbage rows:

df.dropna(inplace=True)

After using this command:

dataframe = pd.read_csv(file, skiprows=45)

You can use this command:

dataframe = dataframe.dropna(how='any')

This deletes a row if any value in it is empty. Since the garbage rows below the data do not fill all 10 columns, their remaining cells are read as NaN, so all of those trailing rows get dropped.
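The dropna approach above can be sketched end to end with a made-up in-memory file (the column names and garbage lines are assumptions for illustration):

```python
import io

import pandas as pd

# Hypothetical file: a clean header and data, followed by trailing garbage.
raw = """a,b,c
1,2,3
4,5,6
#GARBAGE DATA
#MORE GARBAGE
"""

df = pd.read_csv(io.StringIO(raw))
# The garbage lines only populate the first column, leaving NaN in the
# others, so dropping any row that contains a NaN removes them.
df = df.dropna(how="any")
print(df)
```

Because it keys off missing values rather than a fixed line count, this works even when the footer length differs from file to file, which matches the requirement in the question.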

