Does the pd.read_csv skiprows parameter support skipping empty lines?

I have a CSV file like the one below:


                                                               SUMMARY OF SURFACE ENERGY BALANCE


              INCOMING                NET SOLAR RADIATION BY MATERIAL                               NET LONG-WAVE RADIATION BY MATERIAL
               SOLAR   REFLECTED ------------------------------------------  INCOMING OUTGOING   -----------------------------------------
 DAY HR  YR   ON SLOPE   SOLAR   CANOPY     SNOW   RESIDUE    SOIL    TOTAL  LONGWAVE LONGWAVE   CANOPY    SNOW   RESIDUE    SOIL    TOTAL  SENSIBLE  LATENT    SOIL
                 W/M2     W/M2     W/M2     W/M2     W/M2     W/M2     W/M2     W/M2     W/M2     W/M2     W/M2     W/M2     W/M2     W/M2     W/M2     W/M2     W/M2




 338 24   86     30.8      5.6     19.4      0.0      5.4      0.5     25.3    290.6    317.5    -16.4      0.0     -6.3     -4.1    -26.9     -4.7     -0.8     -6.8
 339 24   86     11.6      5.6      4.8      1.2      0.0      0.0      6.0    301.5    311.4     -5.2     -3.5     -0.4     -0.7     -9.9      1.3     -0.1     -7.1

...

The file's 1st, 3rd, 4th, 10th, 11th, and 12th lines are empty.

Line 7 is the header.

The lines after line 13 are data.

I want to read it into a dataframe and do some analysis.

To achieve this, I must:

  • set the 7th line as the header
  • skip the 8th line (which is not a data line)

With this code I get the correct result:

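import pandas as pd
# header=3 counts only non-blank lines; skiprows=[7] is a 0-based index into
# the raw file (blank lines included): the row right after the header
df = pd.read_csv(path, header=3, skiprows=[7])  # path: the CSV file above
print(df.head())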

This prints something like the following:

   DAY HR  YR   ON SLOPE   SOLAR   CANOPY     SNOW   RESIDUE    SOIL    TOTAL  LONGWAVE LONGWAVE   CANOPY    SNOW   RESIDUE    SOIL    TOTAL  SENSIBLE  LATENT    SOIL
0   338 24   86     30.8      5.6     19.4      0...                                                                                                                  
1   339 24   86     11.6      5.6      4.8      1...                                                                                                                  
2   340 24   86     22.2     18.5      0.0      3...                                                                                                                  
3   341 24   86     22.8     18.7      0.0      4...                                                                                                                  
4   342 24   86     48.4     37.0      4.4      7...   

However, to get that result I have to pass header=3 together with skiprows=[7], even though the row I want to skip is simply the first one after the header row.

header ignores the empty lines before the header row, but skiprows does not ignore the empty lines before the line it skips.
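A minimal reproduction of this behaviour, assuming a made-up miniature of the file (a title line, blank lines, a header, a units row, then data), puts the two counting schemes side by side: `header` counts only non-blank lines, while `skiprows=[...]` takes 0-based indices into the raw file, blank lines included.

```python
import io

import pandas as pd

# Hypothetical miniature of the file. Raw 0-based line indices:
# 0 blank, 1 title, 2-3 blank, 4 header, 5 units row, 6 blank, 7-8 data.
sample = "\nTITLE\n\n\nA,B\nunits,units\n\n1,2\n3,4\n"

# header=1 counts non-blank lines only: TITLE is 0, "A,B" is 1.
# skiprows=[5] is a raw line index, so it removes the units row.
good = pd.read_csv(io.StringIO(sample), header=1, skiprows=[5])

# Counting the units row the way header counts (non-blank index 2) misses it:
# raw line 2 is blank, so the units row is parsed as the first data row.
bad = pd.read_csv(io.StringIO(sample), header=1, skiprows=[2])

print(good.dtypes)  # both columns numeric
print(bad)          # first data row is "units", columns are non-numeric
```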

Conclusion

So I want to know: can the skiprows parameter ignore empty lines?

If so, I would only need to know how many rows to skip after the header row, instead of counting them from the top of the file.

I took a quick look at the documentation, and it seems not: header ignores blank lines when skip_blank_lines is True (the default), but skiprows does not take that parameter into account. Its line numbers are 0-indexed positions in the raw file, blank lines included.
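pandas does not appear to offer a header-relative skiprows, but the offset can be computed before calling read_csv. The sketch below uses a hypothetical helper, `read_csv_skip_after_header` (not a pandas API): it scans the raw text, finds the raw line index of the header the same way `header` counts (skipping blank lines; it assumes whitespace-only lines count as blank and does not handle the `comment` parameter), and turns header-relative offsets into the absolute indices skiprows expects. It takes the file contents as a string to keep it simple.

```python
import io

import pandas as pd


def raw_index_of_header(lines, header):
    """Map a read_csv-style header number (blank lines not counted) to the
    raw 0-based line index that skiprows works with."""
    seen = -1
    for i, line in enumerate(lines):
        # Assumption: whitespace-only lines are treated as blank.
        if line.strip():
            seen += 1
            if seen == header:
                return i
    raise ValueError(f"fewer than {header + 1} non-blank lines")


def read_csv_skip_after_header(text, header, skip_after, **kwargs):
    """Hypothetical helper (not part of pandas): skip_after lists offsets
    relative to the header line, so 1 means "the line right after it".
    Offsets count raw lines, blank ones included."""
    if any(k < 1 for k in skip_after):
        # Skipping the header or anything above it would shift the header count.
        raise ValueError("offsets must be >= 1 (lines after the header)")
    h = raw_index_of_header(text.splitlines(), header)
    skip = [h + k for k in skip_after]
    return pd.read_csv(io.StringIO(text), header=header, skiprows=skip, **kwargs)


# Hypothetical miniature file: header "A,B", units row right after it.
sample = "\nTITLE\n\n\nA,B\nunits,units\n\n1,2\n3,4\n"
df = read_csv_skip_after_header(sample, header=1, skip_after=[1])
print(df)
```

For the file in the question, `read_csv_skip_after_header(open(path).read(), header=3, skip_after=[1])` should be equivalent to `skiprows=[7]`, since the header is the 7th raw line (index 6).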

You could, however, just read the file without the skiprows parameter and then drop the rows with NA values:

df = pd.read_csv(path, header=3, skip_blank_lines=True).dropna()

But to be honest, this might not be a good idea: the non-data row is parsed along with the data, so the columns that held text in that row get object dtype, and dropna() does not convert them back.
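To illustrate the dtype problem, here is a sketch on another made-up miniature file whose units row has an empty second field (an assumption, so that dropna() has something to drop). Column A contained the text "units" when pandas inferred its type, so it stays non-numeric after the row is removed. One way to repair that, not the only one, is to re-parse each column with pd.to_numeric afterwards.

```python
import io

import pandas as pd

# Hypothetical miniature file: the units row has an empty field in column B.
sample = "\nTITLE\n\n\nA,B\nunits,\n\n1,2\n3,4\n"

# No skiprows: the units row is read as data, then dropped because B is NaN.
raw = pd.read_csv(io.StringIO(sample), header=1)
cleaned = raw.dropna().reset_index(drop=True)
print(cleaned.dtypes)  # A is still non-numeric: it was inferred with "units" in it

# Re-parse every column as numbers now that the non-data row is gone.
fixed = cleaned.apply(pd.to_numeric)
print(fixed.dtypes)
```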

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this post, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM