
How to set a datetime-type index for a weekly column in a pandas DataFrame

I have data as given below:

date    product price   amount
201901  A   10  20
201902  A   10  20
201903  A   20  30
201904  C   40  50

This data is saved in a file named test.txt. The date column is a weekly key formed by concatenating the year and the week ID. I am trying to set the date column as the index with the following code:

import pandas as pd
import numpy as np
data=pd.read_csv("test.txt", sep="\t", parse_dates=['date'])

But it gives an error. How can I set the date column as an index with datetime type?

Use the index_col parameter to set the index:

data=pd.read_csv("test.txt", sep="\t", index_col=[0])

EDIT: Using column name as index:

data=pd.read_csv("test.txt", sep="\t", index_col=['date'])

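To convert the index from int to datetime, keep in mind that format='%Y%m' would read 201901 as January 2019. Because the values are a year followed by a week number, append a weekday digit (1 for Monday) and parse with '%Y%W%w':

data.index = pd.to_datetime(data.index.astype(str) + '1', format='%Y%W%w')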
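As a minimal sketch of how the format string changes the result, the snippet below reads the sample data from an in-memory string (an assumption standing in for test.txt), sets date as the index, and parses the same keys two ways. The names raw, keys, as_month and as_week are illustrative only.

```python
from io import StringIO

import pandas as pd

# In-memory stand-in for test.txt (assumption: tab-separated, as in the question).
raw = (
    "date\tproduct\tprice\tamount\n"
    "201901\tA\t10\t20\n"
    "201902\tA\t10\t20\n"
    "201903\tA\t20\t30\n"
    "201904\tC\t40\t50\n"
)

data = pd.read_csv(StringIO(raw), sep="\t", index_col="date")

# The index is read as integers; convert to strings before parsing.
keys = data.index.astype(str)

# '%Y%m' treats the last two digits as a month number.
as_month = pd.to_datetime(keys, format="%Y%m")

# Appending '1' (Monday) and using '%Y%W%w' treats them as a week number,
# where week 1 starts on the first Monday of the year.
as_week = pd.to_datetime(keys + "1", format="%Y%W%w")

data.index = as_week
print(as_month[0], as_week[0])
```

With '%W', days before the first Monday of the year belong to week 0, so week 1 of 2019 starts on 7 January. Whether that matches the numbering used in the source data is something to check against how the week IDs were produced.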

There may be simpler solutions than this. Using apply, I first converted your Year-WeekID values into Year-month-day dates and then used set_index to make date the index column.

import pandas as pd

data ={
    'date' : [201901,201902,201903,201904,201905],
    'product' : ['A','A','A','C','C'],
    'price' : [10,10,10,20,20],
    'amount' : [20,20,30,50,60]
}


df = pd.DataFrame(data)

# str(x)+'1' builds a Year-WeekID-Weekday string. With %w, 1 means Monday,
# so 2019021 means 2019, week 2, Monday.
# You can try other formats too if you want.

df['date'] = df['date'].apply(lambda x: pd.to_datetime(str(x)+'1',format='%Y%W%w'))
df.set_index(['date'],inplace=True)
df
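One of the "other formats" worth knowing is ISO 8601 week numbering, where week 1 is the week containing the first Thursday of the year. The sketch below uses Python's datetime.strptime with the '%G%V%u' directives; iso_week_start is a hypothetical helper name, and whether the week IDs in the question follow ISO numbering is an assumption that would need checking.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({"date": [201901, 201902, 201903], "price": [10, 10, 20]})


def iso_week_start(yearweek):
    """Return the Monday of the ISO week encoded as YYYYWW (hypothetical helper)."""
    # %G = ISO year, %V = ISO week number, %u = ISO weekday (1 is Monday).
    return datetime.strptime(f"{yearweek}1", "%G%V%u")


# Map each key to a datetime, then make sure the column has a datetime dtype.
df["date"] = pd.to_datetime(df["date"].map(iso_week_start))
df = df.set_index("date")
print(df)
```

Under ISO numbering, week 1 of 2019 starts on Monday 31 December 2018, one week earlier than the '%Y%W%w' result, so the two conventions should not be mixed.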

Edit:

To display the datetime in Year-WeekID format, you can style the DataFrame as follows. Note that this code will not work if date has been set as the index column. Also, it only applies styling for display purposes; internally the values remain datetime objects.

df['date'] = df['date'].apply(lambda x: pd.to_datetime(str(x)+'1',format='%Y%W%w'))
style_format = {'date':'{:%Y%W}'}
df.style.format(style_format)
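When date is already the index, one possible workaround is to build the Year-WeekID labels from the index itself with DatetimeIndex.strftime. This is a sketch on a small hand-made frame, not the original answer's code; week_labels and display_df are illustrative names.

```python
import pandas as pd

# Small frame whose index was parsed from Year-WeekID-Weekday strings.
df = pd.DataFrame(
    {"price": [10, 10, 20]},
    index=pd.to_datetime(["2019011", "2019021", "2019031"], format="%Y%W%w"),
)

# strftime returns an Index of strings; the original index is left untouched.
week_labels = df.index.strftime("%Y%W")

# A copy used only for display, labelled by Year-WeekID.
display_df = df.copy()
display_df.index = week_labels
print(display_df)
```

Keeping the datetime index on df and using the string labels only on a display copy preserves date-based operations on the original frame.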


You can also use the date_parser parameter (deprecated since pandas 2.0 in favor of date_format):

import pandas as pd
from io import StringIO
from datetime import datetime

# Append '1' (Monday) so the last two digits are parsed as a week number, not a month
dateparse = lambda x: datetime.strptime(x + '1', '%Y%W%w')

inputtxt = StringIO("""date    product price   amount
201901  A   10  20
201902  A   10  20
201903  A   20  30
201904  C   40  50""")

df = pd.read_csv(inputtxt, sep=r'\s+', parse_dates=['date'], date_parser=dateparse)
df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype         
---  ------   --------------  -----         
 0   date     4 non-null      datetime64[ns]
 1   product  4 non-null      object        
 2   price    4 non-null      int64         
 3   amount   4 non-null      int64         
dtypes: datetime64[ns](1), int64(2), object(1)
memory usage: 256.0+ bytes
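Because date_parser is deprecated in recent pandas releases, a version-independent alternative is to read the column as text and convert it after loading. The following is a minimal sketch under that assumption, reusing the same in-memory sample; it also sets the parsed column as the index, which is what the question asks for.

```python
from io import StringIO

import pandas as pd

inputtxt = StringIO("""date    product price   amount
201901  A   10  20
201902  A   10  20
201903  A   20  30
201904  C   40  50""")

# Read the date column as strings so no automatic date parsing is attempted.
df = pd.read_csv(inputtxt, sep=r"\s+", dtype={"date": str})

# Append '1' (Monday) and parse as year, week number, weekday.
df["date"] = pd.to_datetime(df["date"] + "1", format="%Y%W%w")
df = df.set_index("date")
df.info()
```

Converting after the read keeps the parsing logic in one explicit to_datetime call instead of a callback passed to read_csv.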
