
Reading text file with variable columns in pandas dataframe

I have a text file like this:

MAX_POWER   SPEED   ETDWPNO ETAWPNO OPTIMIZED   BUDGET
100 20.0    000 000 MaxSpeed    00000000.00
ETD_YEAR    ETD_MONTH   ETD_DAY ETD_HOUR    ETD_MINUTE  ETA_YEAR    ETA_MONTH   ETA_DAY         ETA_HOUR    ETA_MINUTE
2013    03  03  08  00  2013    03  03  08  00
NAME    LAT LON LEG_TYPE    TURN_RADIUS CHN_LIMIT   PLANNED_SPEED   SPEED_MIN           SPEED_MAX   COURSE  LENGTH      DO_PLAN HFO_PLAN    HFO_LEFT    DO_LEFT ETA_DAY ETA_TIME
BERTH   34 28.343 N 133 27.147 E    RHUMBLINE   00.8    00185   000.0   000.0   000.0   000.0   00000.00    00000.0 00000.0 00000   00000   0000.00.00  00:00
CHANNEL 34 28.005 N 133 26.887 E    RHUMBLINE   00.3    00110   006.0   000.0   012.5   212.5   00000.32    00000.0 00000.0 00000   00000   0000.00.00  00:00
FAIRWAY     34 22.671 N 133 26.773 E    RHUMBLINE   00.3    00100   008.0   000.0   012.5   181.0   00005.35    00000.0 00000.0 00000   00000   0000.00.00  00:00
HAKAMA S    34 21.016 N 133 27.444 E    RHUMBLINE   00.3    00231   011.3   000.0   012.5   161.4   00001.74    00000.0 00000.0 00000   00000   0000.00.00  00:00
MU SHIMA    34 17.485 N 133 30.836 E    RHUMBLINE   00.3    00231   011.3   000.0   012.5   141.4   00004.41    00000.0 00000.0 00000   00000   0000.00.00  00:00
BISAN SE    34 17.571 N 133 37.128 E    RHUMBLINE   00.3    00233   011.3   000.0   012.5   089.1   00005.34    00000.0 00000.0 00000   00000   0000.00.00  00:00
BISAN SE    34 17.557 N 133 40.198 E    RHUMBLINE   00.3    00231   011.3   000.0   012.5   090.3   00002.45    00000.0 00000.0 00000   00000   0000.00.00  00:00
BISAN SE    34 18.594 N 133 42.000 E    RHUMBLINE   00.3    00231   011.3   000.0   012.5   055.3   00001.89    00000.0 00000.0 00000   00000   0000.00.00  00:00
BISAN SE    34 20.873 N 133 47.007 E    RHUMBLINE   00.3    00231   011.3   000.0   012.5   061.2   00004.74    00000.0 00000.0 00000   00000   0000.00.00  00:00

While reading this file with:

data = read_csv("D:/waypoints/route/"+file[0],sep="\t", header=None, engine='python')

I got this error:

ParserError: Expected 12 fields in line 5, saw 20

I tried skipping the first 4 rows, and that worked, but I don't want to take that approach. I don't want to skip any rows.

Can all of this be used to create one dataframe, or multiple dataframes based on the number of columns?

Any help would be appreciated.

Here is the beginning of a solution:

df = pd.read_csv("file.csv", sep="\t", header=None, engine='python', names=['col' + str(x) for x in range(30) ])

You have to pass the names option with at least as many names as the widest row has fields, otherwise you get an error. I chose 30 columns, named col0 to col29, but to be safe you could choose 100 or more.

Columns that are entirely NaN can be deleted afterwards, or you can chain the call onto the end of the first command:

df = df.dropna(axis=1, how='all')

It's the only solution I see for reading a text file with a variable number of columns into a pandas dataframe.
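As a variation on picking an arbitrary number of names, the widest row can be measured first and used to size the names list. This is only a sketch: the helper name read_ragged and the small inline sample are made up for illustration, and it assumes the fields really are separated by single tabs. Passing dtype=str is an extra choice here so that values such as "000" or "03" keep their leading zeros.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file: tab-separated, ragged rows.
sample = "A\tB\n1\t2\nX\tY\tZ\tW\n3\t4\t5\t6\n"


def read_ragged(source, sep="\t"):
    """Read a delimited text source whose rows have different field counts."""
    text = source.read()
    # The widest non-empty line decides how many column names are needed.
    width = max(len(line.split(sep)) for line in text.splitlines() if line)
    names = [f"col{i}" for i in range(width)]
    # Shorter rows are padded with NaN; dtype=str preserves leading zeros.
    return pd.read_csv(io.StringIO(text), sep=sep, header=None,
                       names=names, dtype=str)


df = read_ragged(io.StringIO(sample))
print(df)
```

For a file on disk, the same helper could be called as read_ragged(open(path)), at the cost of reading the file into memory once.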

After that, you can work on the dataframe and look up the rows you want.

Result:

         col0       col1     col2      col3  ... col17 col18       col19  col20
0   MAX_POWER      SPEED  ETDWPNO   ETAWPNO  ...   NaN   NaN        None   None
1         100       20.0      000       000  ...   NaN   NaN        None   None
2    ETD_YEAR  ETD_MONTH  ETD_DAY  ETD_HOUR  ...   NaN   NaN        None   None
3        2013         03       03        08  ...   NaN   NaN        None   None
4        NAME        LAT      LON  LEG_TYPE  ...   NaN   NaN        None   None
5       BERTH         34   28.343         N  ...   0.0   0.0  0000.00.00  00:00
6     CHANNEL         34   28.005         N  ...   0.0   0.0  0000.00.00  00:00
7     FAIRWAY         34   22.671         N  ...   0.0   0.0  0000.00.00  00:00
8    HAKAMA S         34   21.016         N  ...   0.0   0.0  0000.00.00  00:00
9    MU SHIMA         34   17.485         N  ...   0.0   0.0  0000.00.00  00:00
10   BISAN SE         34   17.571         N  ...   0.0   0.0  0000.00.00  00:00
11   BISAN SE         34   17.557         N  ...   0.0   0.0  0000.00.00  00:00
12   BISAN SE         34   18.594         N  ...   0.0   0.0  0000.00.00  00:00
13   BISAN SE         34   20.873         N  ...   0.0   0.0  0000.00.00  00:00
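For the second part of the question (multiple dataframes based on the number of columns), one possible approach is to group consecutive rows that have the same field count and build one dataframe per group. The sketch below is not from the original answer: split_by_width and the miniature sample are hypothetical, and it again assumes tab-separated fields. Note that in the real file the NAME header line seems to have fewer fields than the waypoint rows beneath it, so with this approach that header would land in a frame of its own.

```python
import csv
import io
from itertools import groupby

import pandas as pd

# Hypothetical miniature of the file: three sections with different widths.
sample = (
    "MAX_POWER\tSPEED\n"
    "100\t20.0\n"
    "ETD_YEAR\tETD_MONTH\tETD_DAY\n"
    "2013\t03\t03\n"
    "BERTH\t34\t28.343\tN\n"
    "CHANNEL\t34\t28.005\tN\n"
)


def split_by_width(source, sep="\t"):
    """Return one DataFrame per run of consecutive rows with equal field count."""
    # csv.reader yields an empty list for blank lines; those are dropped.
    rows = [row for row in csv.reader(source, delimiter=sep) if row]
    frames = []
    # groupby only merges adjacent rows, so the file order is preserved.
    for _, run in groupby(rows, key=len):
        frames.append(pd.DataFrame(list(run)))
    return frames


frames = split_by_width(io.StringIO(sample))
for frame in frames:
    print(frame, end="\n\n")
```

Each frame keeps every row, including header-like ones, as plain string data with integer column labels. Promoting a row to column names is left out on purpose, since which rows are headers depends on the file.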
