
How to read a .log file in Python?

I have this .log file and don't know how to read it into a DataFrame:

 id  |        create_date         
-----+----------------------------
 318 | 2017-05-05 07:03:27.556697
 456 | 2017-07-03 01:50:07.966652
 249 | 2017-05-03 13:57:32.567373

import pandas as pd

pd.read_table("data.csv", sep="|", skiprows=[1], header=0, parse_dates=[1]).rename(columns=lambda x: x.strip())

    id                create_date
0  318 2017-05-05 07:03:27.556697
1  456 2017-07-03 01:50:07.966652
2  249 2017-05-03 13:57:32.567373

Parameters

  • sep="|"

    Use | as column separator

  • skiprows=[1]

    Skip the second line of the file (index 1), which is only a decorative separator and would otherwise get in the way of parsing

  • header=0

    Read column names from the first row

  • parse_dates=[1]

    Convert the create_date column to pandas datetime64 (optional)

  • rename(columns=lambda x: x.strip())

    Remove surrounding whitespace from the column names

Add index_col=0 if you want the id column to be the index instead of the default sequential one.
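
To make the call above easy to try, here is a minimal, self-contained sketch built on the sample rows from the question. io.StringIO stands in for the real file, the name RAW_LOG is only illustrative, and skipinitialspace=True is an addition that is not in the answer: it is assumed here to trim the padding after each "|" so the timestamps reach the date parser without a leading space.

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for the file path.
RAW_LOG = """\
 id  |        create_date
-----+----------------------------
 318 | 2017-05-05 07:03:27.556697
 456 | 2017-07-03 01:50:07.966652
 249 | 2017-05-03 13:57:32.567373
"""

df = pd.read_table(
    io.StringIO(RAW_LOG),
    sep="|",
    skiprows=[1],           # skip the -----+----- separator line
    header=0,               # column names come from the first line
    parse_dates=[1],        # parse the second column as datetimes
    skipinitialspace=True,  # assumption: added to trim padding after each "|"
).rename(columns=lambda x: x.strip())

# Equivalent in effect to index_col=0, applied after the names are cleaned up.
indexed = df.set_index("id")

print(df.dtypes)
print(indexed)
```

Setting the index after the read keeps the column name already stripped, so the index is called id rather than a padded variant of it.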

Try this:

df = pd.read_csv('file_.csv', sep='|')
df.columns = df.columns.str.strip()  # the header cells are padded with spaces

Then you can remove the -----+---------------------------- row in several ways:

  1. df[df['id'] != '-----+----------------------------']
  2. df[~df['id'].str.startswith('-')]
  3. df.drop(0)  # won't work if the file contains -----+---------------------------- anywhere else, for example in a footer
  4. df[df['create_date'].notnull()]  # won't work if the create_date column legitimately contains NaN

Output:

    id           create_date         
1   318    2017-05-05 07:03:27.556697
2   456    2017-07-03 01:50:07.966652
3   249    2017-05-03 13:57:32.567373
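
After the separator row is filtered out this way, the remaining values are still padded strings rather than numbers and timestamps. A possible follow-up, sketched under the assumption that the columns should end up as integers and datetimes, is shown below. io.StringIO stands in for the file, RAW_LOG is an illustrative name, and the filter is the str.startswith variant from the list.

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for the file path.
RAW_LOG = """\
 id  |        create_date
-----+----------------------------
 318 | 2017-05-05 07:03:27.556697
 456 | 2017-07-03 01:50:07.966652
 249 | 2017-05-03 13:57:32.567373
"""

df = pd.read_csv(io.StringIO(RAW_LOG), sep="|")
df.columns = df.columns.str.strip()  # " id  " -> "id"

# Drop every row whose first cell starts with "-" (the separator line).
df = df[~df["id"].str.startswith("-")].copy()

# The cells are still padded strings here, so strip and convert them.
df["id"] = df["id"].str.strip().astype(int)
df["create_date"] = pd.to_datetime(df["create_date"].str.strip())

# Optional: restart the index at 0 now that the first row is gone.
df = df.reset_index(drop=True)

print(df.dtypes)
print(df)
```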
