简体   繁体   中英

pandas dataframe index remove date from datetime

File dataexample_df.txt:

2020-12-04_163024 26.15 26.37 19.40 24.57
2020-12-04_163026 26.15 26.37 19.20 24.57
2020-12-04_163028 26.05 26.37 18.78 24.57

I want to read it in as pandas dataframe where the index column has only the time part in format '%H:%M:%S' , without the date.

import pandas as pd
df = pd.read_csv("dataexample_df.txt", sep=' ', header=None, index_col=0)
print(df)

Output:

                       1      2      3      4
0
2020-12-04_163024  26.15  26.37  19.40  24.57
2020-12-04_163026  26.15  26.37  19.20  24.57
2020-12-04_163028  26.05  26.37  18.78  24.57

However, wanted output:

              1      2      3      4
0
16:30:24  26.15  26.37  19.40  24.57
16:30:26  26.15  26.37  19.20  24.57
16:30:28  26.05  26.37  18.78  24.57

I have tried different date_parser= -functions (cf. Answers in Parse_dates in Pandas ) but get only error messages. Also, somewhat relevant is Python/Pandas convert string to time only but no luck, I'm stuck. I'm using Python 3.7.

Considering your df to be this:

In [121]: df
Out[121]: 
                       1      2      3      4
0                                            
2020-12-04_163024  26.15  26.37  19.40  24.57
2020-12-04_163026  26.15  26.37  19.20  24.57
2020-12-04_163028  26.05  26.37  18.78  24.57

You can use Series.replace with Series.dt.time :

In [122]: df.reset_index(inplace=True)
In [127]: df[0] = pd.to_datetime(df[0].str.replace('_', ' ')).dt.time

In [130]: df.set_index(0, inplace=True)

In [131]: df
Out[131]: 
              1      2      3      4
0                                   
16:30:24  26.15  26.37  19.40  24.57
16:30:26  26.15  26.37  19.20  24.57
16:30:28  26.05  26.37  18.78  24.57

Here, I created a simple function to formate your datetime column, Please try this.

import pandas as pd

df = pd.read_csv('data.txt', sep=" ", header=None)

def format_time(date_str):
    # split date and time
    time =  iter(date_str.split('_')[1])
    # retun the time value adding
    return ':'.join(a+b for a,b in zip(time, time))

df[0] = df[0].apply(format_time)

print(df)

You need to tell it what the format of your date is using the format argument (otherwise you'll get an error):

# gives an error:
pd.to_datetime('2020-12-04_163024')

# works:
pd.to_datetime('2020-12-04_163024', format=r'%Y-%m-%d_%H%M%S')

So you can apply this to your dataframe and then access the time by using dt.time :

df['time'] = pd.to_datetime(df.index, format=r'%Y-%m-%d_%H%M%S').dt.time

That will give you the time as an object, but if you want to format it, just use something like this:

df['time'] = df['time'].strftime('%H:%M:%S')

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM