
Pandas DataFrame - Combine Date column headers with Time rows

I have the following data coming into a dataframe via the read_excel method:

                 Time  ...  2020-04-05 00:00:00
0 1900-01-01 00:00:00  ...                    4
1 1900-01-01 00:05:00  ...                    1
2 1900-01-01 00:10:00  ...                    1

I would like to combine the column header dates and row times so it looks more like:

                 Time  ...   value
0 2020-04-05 00:00:00  ...       4
1 2020-04-05 00:05:00  ...       1
2 2020-04-05 00:10:00  ...       1

I have tried the answers to two related questions, but they solve the opposite case (time columns with date rows), and I think I'm going wrong somewhere when adapting the code to my problem. Based on the first of those, I tried the following, swapping the to_timedelta and to_datetime lines because my columns are dates and my rows are times:

data.Time = pd.to_timedelta(data.Time.astype(str) + ':00', unit='h')
data = data.set_index('Time')
data.columns = pd.to_datetime(data.Time)
data = data.stack()
data.index = data.index.get_level_values(0) + data.index.get_level_values(1)
data = data.reset_index()
data.columns = ['date', 'val']

The first line raises ValueError: unit must not be specified if the input contains a str, which confused me because I do specify a unit. I feel I'm close to the answer but missing something I can't figure out. How can I combine my date columns with my time rows?

Data Types being used: Time = datetime64[ns], 2019-12-02 00:00:00 (etc.) = int64

EDIT: I misread the error and thought it said the unit was missing. After removing the unit, I get a different error: ValueError: only leading negative signs are allowed

I think your solution is close. You only need to convert the column names to datetimes and assign them back, and remove unit='h' from to_timedelta, converting the datetimes to HH:MM:SS strings first:

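import numpy as np
import pandas as pd

# sample data: one 'Time' column plus one column per date
np.random.seed(102)
c = ['Time', '2019-12-02 00:00:00', '2019-12-03 00:00:00', 
             '2019-12-04 00:00:00', '2019-12-05 00:00:00']
t = pd.to_datetime(['1900-01-01 00:00:00', '1900-01-01 00:05:00', '1900-01-01 00:10:00'])

data=pd.DataFrame(np.random.randint(10, size=(len(t), len(c))), columns=c)
# overwrite the random 'Time' values with the dummy-date timestamps
data['Time'] = t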

print (data)
                 Time  2019-12-02 00:00:00  2019-12-03 00:00:00  \
0 1900-01-01 00:00:00                    3                    2   
1 1900-01-01 00:05:00                    8                    8   
2 1900-01-01 00:10:00                    7                    0   

   2019-12-04 00:00:00  2019-12-05 00:00:00  
0                    2                    2  
1                    9                    7  
2                    6                    2  

print (data.columns)
Index(['Time', '2019-12-02 00:00:00', '2019-12-03 00:00:00',
       '2019-12-04 00:00:00', '2019-12-05 00:00:00'],
      dtype='object')

print (data['Time'])
0   1900-01-01 00:00:00
1   1900-01-01 00:05:00
2   1900-01-01 00:10:00
Name: Time, dtype: datetime64[ns]
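
Because Time is datetime64[ns], converting it with astype(str) keeps the dummy 1900-01-01 date in every string, and to_timedelta cannot parse a string that contains a date. The question reports a ValueError for exactly this input. A small sketch of the difference, using a two-row series t that is made up for illustration (the names as_str, as_time, failed and deltas are likewise only illustrative):

```python
import pandas as pd

# two timestamps carrying the dummy 1900-01-01 date, as read_excel produced
t = pd.Series(pd.to_datetime(['1900-01-01 00:00:00', '1900-01-01 00:05:00']))

as_str = t.astype(str)                 # strings still start with the date part
as_time = t.dt.strftime('%H:%M:%S')    # time-of-day only, e.g. '00:05:00'

# the date-bearing strings are not valid timedelta input
try:
    pd.to_timedelta(as_str + ':00')
    failed = False
except ValueError:
    failed = True

# the HH:MM:SS strings parse cleanly
deltas = pd.to_timedelta(as_time)
print(failed)
print(deltas)
```

This is why the fix below formats the column with strftime before calling to_timedelta.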

data.Time = pd.to_timedelta(data.Time.dt.strftime('%H:%M:%S'))

data = data.set_index('Time')
#convert data.columns to datetimes and assign back
data.columns = pd.to_datetime(data.columns)
data = data.stack()
data.index = data.index.get_level_values(0) + data.index.get_level_values(1)
data = data.sort_index().reset_index()
data.columns = ['date', 'val']

print (data)
                  date  val
0  2019-12-02 00:00:00    3
1  2019-12-02 00:05:00    8
2  2019-12-02 00:10:00    7
3  2019-12-03 00:00:00    2
4  2019-12-03 00:05:00    8
5  2019-12-03 00:10:00    0
6  2019-12-04 00:00:00    2
7  2019-12-04 00:05:00    9
8  2019-12-04 00:10:00    6
9  2019-12-05 00:00:00    2
10 2019-12-05 00:05:00    7
11 2019-12-05 00:10:00    2

Or, with melt:

df = data.melt('Time', var_name='Date', value_name='val')
df['Date'] = (pd.to_datetime(df['Date']) +  
                  pd.to_timedelta(df.pop('Time').dt.strftime('%H:%M:%S')))
df = df.sort_values('Date', ignore_index=True)
print (df)
                  Date  val
0  2019-12-02 00:00:00    3
1  2019-12-02 00:05:00    8
2  2019-12-02 00:10:00    7
3  2019-12-03 00:00:00    2
4  2019-12-03 00:05:00    8
5  2019-12-03 00:10:00    0
6  2019-12-04 00:00:00    2
7  2019-12-04 00:05:00    9
8  2019-12-04 00:10:00    6
9  2019-12-05 00:00:00    2
10 2019-12-05 00:05:00    7
11 2019-12-05 00:10:00    2
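
As a possible variation (not part of the original answer), the time-of-day offset can also be obtained without a string round trip by subtracting each timestamp's midnight, which Series.dt.normalize() returns. A minimal sketch, assuming the column labels are parseable date strings as in the sample data; the small frame and the names offset and long_df are illustrative only:

```python
import pandas as pd

# small stand-in for the wide frame: dummy-date times as rows, dates as columns
data = pd.DataFrame({
    'Time': pd.to_datetime(['1900-01-01 00:00:00',
                            '1900-01-01 00:05:00',
                            '1900-01-01 00:10:00']),
    '2019-12-02 00:00:00': [3, 8, 7],
    '2019-12-03 00:00:00': [2, 8, 0],
})

# timestamp minus its own midnight leaves only the time of day as a timedelta
offset = data['Time'] - data['Time'].dt.normalize()

# reshape to long form, then add each date to its time-of-day offset
long_df = data.assign(Time=offset).melt('Time', var_name='Date', value_name='val')
long_df['Date'] = pd.to_datetime(long_df['Date']) + long_df.pop('Time')
long_df = long_df.sort_values('Date', ignore_index=True)
print(long_df)
```

The result has the same Date and val columns as the melt version above; which form reads better is a matter of taste.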
    
