I have a CSV file like the one below:
SUMMARY OF SURFACE ENERGY BALANCE
INCOMING NET SOLAR RADIATION BY MATERIAL NET LONG-WAVE RADIATION BY MATERIAL
SOLAR REFLECTED ------------------------------------------ INCOMING OUTGOING -----------------------------------------
DAY HR YR ON SLOPE SOLAR CANOPY SNOW RESIDUE SOIL TOTAL LONGWAVE LONGWAVE CANOPY SNOW RESIDUE SOIL TOTAL SENSIBLE LATENT SOIL
W/M2 W/M2 W/M2 W/M2 W/M2 W/M2 W/M2 W/M2 W/M2 W/M2 W/M2 W/M2 W/M2 W/M2 W/M2 W/M2 W/M2
338 24 86 30.8 5.6 19.4 0.0 5.4 0.5 25.3 290.6 317.5 -16.4 0.0 -6.3 -4.1 -26.9 -4.7 -0.8 -6.8
339 24 86 11.6 5.6 4.8 1.2 0.0 0.0 6.0 301.5 311.4 -5.2 -3.5 -0.4 -0.7 -9.9 1.3 -0.1 -7.1
...
The file's 1st, 3rd, 4th, 10th, 11th, and 12th lines are empty.
Line 7 is the header.
Everything after line 13 is data.
I want to read it into a DataFrame and do some analysis.
To achieve this I must:
With the following code I get the correct result:
import pandas as pd
df = pd.read_csv(path, header=3, skiprows=[7])
print(df.head())
This prints:
DAY HR YR ON SLOPE SOLAR CANOPY SNOW RESIDUE SOIL TOTAL LONGWAVE LONGWAVE CANOPY SNOW RESIDUE SOIL TOTAL SENSIBLE LATENT SOIL
0 338 24 86 30.8 5.6 19.4 0...
1 339 24 86 11.6 5.6 4.8 1...
2 340 24 86 22.2 18.5 0.0 3...
3 341 24 86 22.8 18.7 0.0 4...
4 342 24 86 48.4 37.0 4.4 7...
However, to get this result I have to call read_csv with header set to 3 but skiprows set to [7], even though the row I want to skip is simply the one right after the header row.
header ignores the empty lines before the header row, but skiprows does not ignore the empty lines that come before the row to be skipped.
Conclusion
So, can the skiprows parameter ignore empty lines?
If so, I would only need to know how many rows to skip after the header row, instead of counting from the top of the file.
I took a quick look at the documentation, and it seems not. The reason is that header ignores blank lines when the parameter skip_blank_lines is set to True (the default), but skiprows does not take that parameter into account: it counts physical lines from the top of the file, blank ones included.
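If you want to specify the skip relative to the header anyway, one workaround is to find the header's physical line number yourself and shift the skip by it. Below is a minimal sketch of that idea; the helper names header_line_index and read_csv_skip_after_header, and the small sample file, are hypothetical and only mimic the question's layout (blank lines, a title, the header, a units row, then data). It assumes a line counts as blank when it is empty or whitespace-only.

```python
import os
import tempfile

import pandas as pd


def header_line_index(path, header):
    """Return the 0-based physical line number of the `header`-th non-blank line.

    Assumption: a line counts as blank if it is empty or whitespace-only,
    which is how we expect skip_blank_lines=True to treat it.
    """
    seen = -1
    with open(path) as f:
        for i, line in enumerate(f):
            if line.strip():  # non-blank lines are the ones `header` counts
                seen += 1
                if seen == header:
                    return i
    raise ValueError(f"file has fewer than {header + 1} non-blank lines")


def read_csv_skip_after_header(path, header, n_after):
    """Read `path`, dropping the `n_after` physical lines right after the header."""
    h = header_line_index(path, header)
    # skiprows takes physical 0-based line numbers (blank lines included),
    # so shift the header-relative skip by the header's physical position.
    skip = list(range(h + 1, h + 1 + n_after))
    return pd.read_csv(path, header=header, skiprows=skip)


# Hypothetical sample shaped like the question's file: blank lines, a title,
# the header row, a units row, then data.
sample = "\nSUMMARY\n\n\nDAY,HR,SOLAR\n,,W/M2\n338,24,30.8\n339,24,11.6\n"

with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

try:
    h = header_line_index(sample_path, header=1)
    print(h)  # 4: the 2nd non-blank line is physical line 4
    df = read_csv_skip_after_header(sample_path, header=1, n_after=1)
    print(df)
finally:
    os.remove(sample_path)
```

For the question's file this would presumably be read_csv_skip_after_header(path, header=3, n_after=1), which should resolve to the same skiprows=[7] found by hand, but that depends on the real file's blank lines matching the description.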
You could, however, just read the file without the skiprows parameter and drop the rows with NaN values.
df = pd.read_csv(path, header=3, skip_blank_lines=True).dropna()
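But to be honest, this might not be a good idea: the extra row is parsed as data, so the affected columns get the object dtype (or float, where that row was empty), and dropna() does not convert them back to their proper numeric types.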
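If you do go the dropna() route, one way to recover numeric columns is to run pd.to_numeric over every column once the units row is gone. A minimal sketch, using a small made-up sample (the sample text and header=1 are assumptions standing in for the real file):

```python
import io

import pandas as pd

# Hypothetical stand-in for the question's file: blank lines, a title,
# the header row, a units row, then numeric data.
sample = "\nSUMMARY\n\n\nDAY,HR,SOLAR\n,,W/M2\n338,24,30.8\n339,24,11.6\n"

# The units row is read as data, then removed by dropna() because it has NaNs.
raw = pd.read_csv(io.StringIO(sample), header=1, skip_blank_lines=True).dropna()
print(raw.dtypes)  # SOLAR is still non-numeric because it once held "W/M2"

# Convert each column back to a numeric dtype now that the units row is gone.
df = raw.apply(pd.to_numeric)
print(df.dtypes)
```

Columns that were empty in the dropped row (DAY and HR here) stay float after this step; cast them with astype(int) if you need integers.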