
Pandas read_csv end reading at first linebreak

I am trying to read a CSV file that has some garbage at the top, but also garbage below the interesting data. I need to read multiple files, and the length of the interesting data varies. Is there a way to tell pd.read_csv that the dataframe ends at the first linebreak?

Example data (screenshot from Excel): [spreadsheet image]

I read the file with: dataframe = pd.read_csv(file, skiprows=45) This nicely gives me a dataframe with 10 columns, with the headers taken from line 46 (see image). However, it keeps reading past the #GARBAGE DATA row.

Important note: neither the data nor the footer has the same length across the different files I want to read.

There are two ways you could implement this:

1) Use the skipfooter parameter of read_csv; it tells the function the number of lines at the bottom of the file to skip. Note that skipfooter is only supported by the Python parser, so pass engine='python' as well:

pd.read_csv("in.csv", skiprows=45, skipfooter=2, engine="python")
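A minimal, self-contained sketch of the skipfooter approach, using a made-up in-memory file (the layout here, two garbage lines above the header and two below the data, is an assumption for illustration; the real files use skiprows=45):

```python
import io

import pandas as pd

# Hypothetical file: 2 garbage lines, then a header, data, and 2 garbage lines.
raw = """#GARBAGE LINE 1
#GARBAGE LINE 2
a,b,c
1,2,3
4,5,6
#GARBAGE DATA
#MORE GARBAGE
"""

# skipfooter is only supported by the pure-Python parser engine.
df = pd.read_csv(io.StringIO(raw), skiprows=2, skipfooter=2, engine="python")
print(df)
```

The drawback is that the number of footer lines must be known in advance, which is why the dropna approach below suits files whose footer length varies.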

2) Read the file as-is and then use the dropna function, which should drop the garbage rows:

df.dropna(inplace=True)

After using this command:

dataframe = pd.read_csv(file, skiprows=45)

You can use this command:

dataframe = dataframe.dropna(how='any')

This deletes a row if any value in it is empty. Since the garbage rows below the data do not fill all 10 columns, their remaining cells are read as NaN, so all of those trailing rows get dropped.
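The dropna approach above can be sketched end to end with a made-up in-memory file (the column names and garbage lines are assumptions for illustration):

```python
import io

import pandas as pd

# Hypothetical file: a clean header and data, followed by trailing garbage.
raw = """a,b,c
1,2,3
4,5,6
#GARBAGE DATA
#MORE GARBAGE
"""

df = pd.read_csv(io.StringIO(raw))
# The garbage lines only populate the first column, leaving NaN in the
# others, so dropping any row that contains a NaN removes them.
df = df.dropna(how="any")
print(df)
```

Because it keys off missing values rather than a fixed line count, this works even when the footer length differs from file to file, which matches the requirement in the question.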

