
Python pandas: read a text file and skip particular lines

I am trying to read a text file using pd.read_csv:

df = pd.read_csv('filename.txt', delimiter = "\t")

My text file (see below) has a few lines of text before the dataset I need to import begins. How do I skip the lines before the dataset headers? I don't want a solution that involves manually counting the lines to skip, because I have to do this for multiple (similar, but not identical) text files. Any help is appreciated!

Note: I cannot upload the text file as it is confidential

========================================= 
hello 123
========================================= 
Dir: /x/y/z/RTchoice/release001/data 
Date: 17-Mar-2020 10:0:08 
Output File: /a/b/c/filename.txt 
N: 2842
-----------------------------------------
Subject col1    col2    col3    
001 10.00000    1.00000 3.00000 
002 11.00000    2.00000 4.00000

Here is an attempt to 'craft magic'. The idea is to call read_csv with increasing skiprows values until parsing succeeds and yields more than one column:

import pandas as pd
from io import StringIO
data = StringIO(
'''
========================================= 
hello 123
========================================= 
Dir: /x/y/z/RTchoice/release001/data 
Date: 17-Mar-2020 10:0:08 
Output File: /a/b/c/filename.txt 
N: 2842
-----------------------------------------
Subject col1    col2    col3    
001 10.00000    1.00000 3.00000 
002 11.00000    2.00000 4.00000
''')

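for n in range(1000):
    try:
        data.seek(0)  # rewind the buffer before every attempt
        # raw string so that \s is a regex class, not an invalid escape sequence
        df = pd.read_csv(data, delimiter=r"\s+", skiprows=n)
    except Exception:  # parsing fails while preamble lines are still included
        print(f'skiprows = {n} failed (exception)')
    else:
        if len(df.columns) == 1:  # do not let it get away with a single-column df
            print(f'skiprows = {n} failed (single column)')
        else:
            break  # first skiprows value that gives a multi-column table
print('\n', df)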

Output:

skiprows = 0 failed (exception)
skiprows = 1 failed (exception)
skiprows = 2 failed (exception)
skiprows = 3 failed (exception)
skiprows = 4 failed (exception)
skiprows = 5 failed (exception)
skiprows = 6 failed (exception)
skiprows = 7 failed (exception)
skiprows = 8 failed (single column)

    Subject  col1  col2  col3
0        1  10.0   1.0   3.0
1        2  11.0   2.0   4.0
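
A possible alternative that avoids trial and error is to locate the header line programmatically and parse from there. The sketch below assumes every file's header row begins with the token Subject, which is only a guess based on the sample above; the function names find_header_row and read_table_after_preamble are hypothetical helpers, not part of pandas. It also reads the Subject column as strings, since the output above shows the leading zeros of 001 and 002 being dropped.

```python
import io

import pandas as pd


def find_header_row(lines, marker="Subject"):
    """Return the 0-based index of the first line whose first token is `marker`."""
    for i, line in enumerate(lines):
        tokens = line.split()
        if tokens and tokens[0] == marker:
            return i
    raise ValueError(f"no line starting with {marker!r} found")


def read_table_after_preamble(text, marker="Subject"):
    """Parse the whitespace-separated table that starts at the header line."""
    lines = [line.rstrip() for line in text.splitlines()]
    start = find_header_row(lines, marker)
    body = "\n".join(lines[start:])
    # dtype keeps IDs such as "001" as strings (assumption: they are labels)
    return pd.read_csv(io.StringIO(body), sep=r"\s+", dtype={marker: str})


# Sample layout copied from the question
sample = """
=========================================
hello 123
=========================================
Dir: /x/y/z/RTchoice/release001/data
Date: 17-Mar-2020 10:0:08
Output File: /a/b/c/filename.txt
N: 2842
-----------------------------------------
Subject col1    col2    col3
001 10.00000    1.00000 3.00000
002 11.00000    2.00000 4.00000
"""

df = read_table_after_preamble(sample)
print(df)
```

For a file on disk, the text can be obtained with open('filename.txt').read() and passed to the same function. If the header token differs between files, the marker argument would need to change accordingly.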

