I have the following data coming into a DataFrame via the read_excel method:
Time ... 2020-04-05 00:00:00
0 1900-01-01 00:00:00 ... 4
1 1900-01-01 00:05:00 ... 1
2 1900-01-01 00:10:00 ... 1
I would like to combine the column header dates and row times so it looks more like:
Time ... value
0 2020-04-05 00:00:00 ... 4
1 2020-04-05 00:05:00 ... 1
2 2020-04-05 00:10:00 ... 1
I have tried the answers to this question and this question, but they do the opposite of what I need (time columns with date rows), and I think I'm making a mistake somewhere when adapting the code to my problem. Based on the first question above, I tried the following, swapping the to_timedelta and to_datetime lines because my columns are dates and my rows are times:
data.Time = pd.to_timedelta(data.Time.astype(str) + ':00', unit='h')
data = data.set_index('Time')
data.columns = pd.to_datetime(data.Time)
data = data.stack()
data.index = data.index.get_level_values(0) + data.index.get_level_values(1)
data = data.reset_index()
data.columns = ['date', 'val']
The first line raises ValueError: unit must not be specified if the input contains a str
which has confused me as I do specify a unit type. I feel like this is the answer and I'm close, I'm just missing something and I can't figure it out - how can I combine my date columns with my time rows?
Data Types being used: Time = datetime64[ns], 2019-12-02 00:00:00 (etc.) = int64
EDIT: I misread the error and thought it said the unit was missing. I removed the unit, but then received a different error: ValueError: only leading negative signs are allowed
I think your solution is close. You only need to convert the column names to datetimes and assign them back, and remove unit='h' from to_timedelta, converting the datetimes in the Time column to HH:MM:SS strings first:
import numpy as np
import pandas as pd

np.random.seed(102)
c = ['Time', '2019-12-02 00:00:00', '2019-12-03 00:00:00',
'2019-12-04 00:00:00', '2019-12-05 00:00:00']
t = pd.to_datetime(['1900-01-01 00:00:00', '1900-01-01 00:05:00', '1900-01-01 00:10:00'])
data=pd.DataFrame(np.random.randint(10, size=(len(t), len(c))), columns=c)
data['Time'] = t
print (data)
Time 2019-12-02 00:00:00 2019-12-03 00:00:00 \
0 1900-01-01 00:00:00 3 2
1 1900-01-01 00:05:00 8 8
2 1900-01-01 00:10:00 7 0
2019-12-04 00:00:00 2019-12-05 00:00:00
0 2 2
1 9 7
2 6 2
print (data.columns)
Index(['Time', '2019-12-02 00:00:00', '2019-12-03 00:00:00',
'2019-12-04 00:00:00', '2019-12-05 00:00:00'],
dtype='object')
print (data['Time'])
0 1900-01-01 00:00:00
1 1900-01-01 00:05:00
2 1900-01-01 00:10:00
Name: Time, dtype: datetime64[ns]
data.Time = pd.to_timedelta(data.Time.dt.strftime('%H:%M:%S'))
data = data.set_index('Time')
#convert data.columns to datetimes and assign back
data.columns = pd.to_datetime(data.columns)
data = data.stack()
data.index = data.index.get_level_values(0) + data.index.get_level_values(1)
data = data.sort_index().reset_index()
data.columns = ['date', 'val']
print (data)
date val
0 2019-12-02 00:00:00 3
1 2019-12-02 00:05:00 8
2 2019-12-02 00:10:00 7
3 2019-12-03 00:00:00 2
4 2019-12-03 00:05:00 8
5 2019-12-03 00:10:00 0
6 2019-12-04 00:00:00 2
7 2019-12-04 00:05:00 9
8 2019-12-04 00:10:00 6
9 2019-12-05 00:00:00 2
10 2019-12-05 00:05:00 7
11 2019-12-05 00:10:00 2
Or, with melt instead of stack:
df = data.melt('Time', var_name='Date', value_name='val')
df['Date'] = (pd.to_datetime(df['Date']) +
pd.to_timedelta(df.pop('Time').dt.strftime('%H:%M:%S')))
df = df.sort_values('Date', ignore_index=True)
print (df)
Date val
0 2019-12-02 00:00:00 3
1 2019-12-02 00:05:00 8
2 2019-12-02 00:10:00 7
3 2019-12-03 00:00:00 2
4 2019-12-03 00:05:00 8
5 2019-12-03 00:10:00 0
6 2019-12-04 00:00:00 2
7 2019-12-04 00:05:00 9
8 2019-12-04 00:10:00 6
9 2019-12-05 00:00:00 2
10 2019-12-05 00:05:00 7
11 2019-12-05 00:10:00 2
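Both solutions above turn the Time column into HH:MM:SS strings before calling to_timedelta. A possible variant, sketched here as an assumption rather than as part of the original answer, skips the string round trip: since Time is already datetime64[ns], subtracting midnight of the same day (dt.normalize()) leaves just the time of day as a timedelta. The small frame below is hypothetical sample data shaped like the question's, and the name offset is introduced only for illustration.

```python
import pandas as pd

# Hypothetical frame shaped like the question's data: a datetime64 'Time'
# column carrying a dummy 1900-01-01 date, plus one column per real date.
data = pd.DataFrame({
    'Time': pd.to_datetime(['1900-01-01 00:00:00',
                            '1900-01-01 00:05:00',
                            '1900-01-01 00:10:00']),
    '2019-12-02 00:00:00': [3, 8, 7],
    '2019-12-03 00:00:00': [2, 8, 0],
})

# Time of day as a timedelta: subtract midnight of the same (dummy) day.
offset = data['Time'] - data['Time'].dt.normalize()

# Same melt-based reshaping as above, but with the timedelta column.
df = data.assign(Time=offset).melt('Time', var_name='Date', value_name='val')
df['Date'] = pd.to_datetime(df['Date']) + df.pop('Time')
df = df.sort_values('Date', ignore_index=True)
print(df)
```

This should give the same Date/val layout as the melt solution, and it also keeps any seconds or sub-second precision that a fixed '%H:%M:%S' format would truncate.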