Assign column to pandas df

I am trying to assign a column to an existing df. Specifically, certain timestamps get sorted, but the current output is a separate Series. I'd like to add this to the df.

import pandas as pd

d = ({           
    'time' : ['08:00:00 am','12:00:00 pm','16:00:00 pm','20:00:00 pm','2:00:00 am','13:00:00 pm','3:00:00 am'], 
    'code' : ['A','B','C','A','B','C','A'], 
    })

df = pd.DataFrame(data=d)

df['time'] = pd.to_timedelta(df['time'])

# Times at or before the 3.5 h cutoff belong to the next day, so add 24 h to them
cutoff, day = pd.to_timedelta(['3.5h', '24h'])
# Sort the shifted times; reset_index(drop=True) renumbers the result 0..6
x = df.time.apply(lambda x: x if x > cutoff else x + day).sort_values().reset_index(drop=True).dt.components
# Format as HH:MM:SS, folding whole days into the hour field
x = x.apply(lambda x: '{:02d}:{:02d}:{:02d}'.format(x.days*24+x.hours, x.minutes, x.seconds), axis=1)
print(x)

Output:

0    08:00:00
1    12:00:00
2    13:00:00
3    16:00:00
4    20:00:00
5    26:00:00
6    27:00:00
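The formatting line relies on `dt.components`, which splits each timedelta into separate `days`, `hours`, `minutes` and `seconds` fields, so a shifted value such as 26 h comes back as 1 day plus 2 hours; `days*24 + hours` folds the day back into the hour field. A small sketch to check this, using two example durations that are assumptions rather than values from the question:

```python
import pandas as pd

# Two example durations: one under a day, one pushed past midnight by the 24 h shift
s = pd.Series([pd.Timedelta(hours=8), pd.Timedelta(hours=26)])

comp = s.dt.components  # DataFrame with days, hours, minutes, seconds, ... columns
print(comp[["days", "hours", "minutes", "seconds"]])

# Fold whole days back into the hour field, as the question's format string does
labels = comp.apply(
    lambda r: "{:02d}:{:02d}:{:02d}".format(r.days * 24 + r.hours, r.minutes, r.seconds),
    axis=1,
)
print(list(labels))  # ['08:00:00', '26:00:00']
```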

I then changed the formatting line to assign the result back to the df:

df['time'] = x.apply(lambda x: '{:02d}:{:02d}:{:02d}'.format(x.days*24+x.hours, x.minutes, x.seconds), axis=1)

But this produces

       time code
0  08:00:00    A
1  12:00:00    B
2  13:00:00    C
3  16:00:00    A
4  20:00:00    B
5  26:00:00    C
6  27:00:00    A

As you can see, the timestamps are no longer aligned with their respective codes after sorting.

The intended output is:

       time code
0  08:00:00    A
1  12:00:00    B
2  13:00:00    C
3  16:00:00    C
4  20:00:00    A
5  26:00:00    B
6  27:00:00    A
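One way to get this result is to keep the shifted times inside the frame and sort the whole DataFrame by them, so each code travels with its time. This is a sketch under assumptions: it uses plain 24-hour strings (the am/pm suffixes are dropped), `Series.where` in place of the `apply` lambda, and `total_seconds()` for the HH:MM:SS formatting.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["08:00:00", "12:00:00", "16:00:00", "20:00:00", "02:00:00", "13:00:00", "03:00:00"],
    "code": ["A", "B", "C", "A", "B", "C", "A"],
})
cutoff, day = pd.Timedelta(hours=3.5), pd.Timedelta(days=1)

td = pd.to_timedelta(df["time"])
# Keep times after the cutoff; push the rest into the next day
shifted = td.where(td > cutoff, td + day)

# Sort rows (not just the column) by the shifted times, then renumber 0..n-1
out = df.assign(time=shifted).sort_values("time").reset_index(drop=True)

# Format as HH:MM:SS with hours allowed to exceed 24
secs = out["time"].dt.total_seconds().astype(int)
out["time"] = [f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}" for s in secs]
print(out)
```

Because the sort happens on the DataFrame rather than on a detached Series, no index bookkeeping is needed before resetting the index at the end.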

I hope this is what you want:

import pandas as pd

d = ({           
    'time' : ['08:00:00 am','12:00:00 pm','16:00:00 pm','20:00:00 pm','2:00:00 am','13:00:00 pm','3:00:00 am'], 
    'code' : ['A','B','C','A','B','C','A'], 
    })

df = pd.DataFrame(data=d)

df['time'] = pd.to_timedelta(df['time'])

cutoff, day = pd.to_timedelta(['3.5h', '24h'])
print(df)
# Shift times at or before the cutoff to the next day, sort them, and renumber the result 0..6
x = df.time.apply(lambda x: x if x > cutoff else x + day).sort_values().reset_index(drop=True).dt.components
# Format as HH:MM:SS, folding whole days into the hour field
df['time'] = x.apply(lambda x: '{:02d}:{:02d}:{:02d}'.format(x.days*24+x.hours, x.minutes, x.seconds), axis=1)

print(df)

Removing reset_index(drop=True) from your code and sorting the DataFrame afterwards should work for you.

import pandas as pd

d = ({           
    'time' : ['08:00:00 am','12:00:00 pm','16:00:00 pm','20:00:00 pm','2:00:00 am','13:00:00 pm','3:00:00 am'], 
    'code' : ['A','B','C','A','B','C','A'], 
    })

df = pd.DataFrame(data=d)

df['time'] = pd.to_timedelta(df['time'])

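cutoff, day = pd.to_timedelta(['3.5h', '24h'])

# Shift early times into the next day; the original index is kept (no reset_index)
comp = df.time.apply(lambda t: t if t > cutoff else t + day).dt.components
# Assignment aligns on the index, so each formatted time stays on its original row
df['time'] = comp.apply(lambda r: '{:02d}:{:02d}:{:02d}'.format(r.days*24 + r.hours, r.minutes, r.seconds), axis=1)
# Sort the whole frame; zero-padded HH:MM:SS strings sort chronologically here
df = df.sort_values('time')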

print(df)
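A variant of the same idea, assuming pandas 1.1 or later (where `DataFrame.sort_values` accepts a `key` callable): sort by the shifted duration directly instead of by the formatted string. The string sort above works because every label has a zero-padded two-digit hour; sorting on the timedelta does not depend on that. The helper name `shifted_key` is hypothetical, and the sample data uses plain 24-hour strings.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["08:00:00", "12:00:00", "16:00:00", "20:00:00", "02:00:00", "13:00:00", "03:00:00"],
    "code": ["A", "B", "C", "A", "B", "C", "A"],
})
cutoff, day = pd.Timedelta(hours=3.5), pd.Timedelta(days=1)

def shifted_key(col: pd.Series) -> pd.Series:
    # Hypothetical helper: parse the strings and move times at or before the cutoff to the next day
    td = pd.to_timedelta(col)
    return td.where(td > cutoff, td + day)

# key= only changes the sort order; the stored "time" strings are left as they were
out = df.sort_values("time", key=shifted_key).reset_index(drop=True)
print(out)
```

The `time` column still holds the original strings (02:00:00 rather than 26:00:00), so the HH:MM:SS formatting step would be applied after sorting if the shifted labels are wanted.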

Pandas aligns on the index when you assign a Series to a column. reset_index(drop=True) discards the original index, so the sorted time values are written back to rows 0..6 in order instead of to the rows they came from. That is why you didn't get what you wanted.
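The alignment rule can be seen on a toy frame (the values are made up for illustration): assigning a sorted Series back to its own column leaves every row unchanged, because pandas matches on index labels, whereas the same Series after `reset_index(drop=True)` is written back in position order.

```python
import pandas as pd

df = pd.DataFrame({"val": [30, 10, 20], "code": ["A", "B", "C"]})
sorted_val = df["val"].sort_values()  # values 10, 20, 30 with index labels 1, 2, 0

aligned = df.copy()
aligned["val"] = sorted_val  # matched by label: each value returns to its own row
print(aligned)

positional = df.copy()
positional["val"] = sorted_val.reset_index(drop=True)  # labels now 0, 1, 2
print(positional)  # 10 lands on "A", 20 on "B", 30 on "C"
```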

The original time column:

0   08:00:00
1   12:00:00
2   16:00:00
3   20:00:00
4   02:00:00
5   13:00:00
6   03:00:00

After sort_values():

4   02:00:00
6   03:00:00
0   08:00:00
1   12:00:00
5   13:00:00
2   16:00:00
3   20:00:00

After reset_index(drop=True):

0   02:00:00
1   03:00:00
2   08:00:00
3   12:00:00
4   13:00:00
5   16:00:00
6   20:00:00
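Conversely, the labels that `sort_values()` keeps (4, 6, 0, ...) are exactly what is needed to reorder the whole frame. A minimal sketch, using the unshifted times from the listings above as plain 24-hour strings (an assumption; the am/pm suffixes are dropped):

```python
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_timedelta(["08:00:00", "12:00:00", "16:00:00", "20:00:00",
                             "02:00:00", "13:00:00", "03:00:00"]),
    "code": ["A", "B", "C", "A", "B", "C", "A"],
})

order = df["time"].sort_values().index  # original row labels in sorted order
print(order.tolist())  # [4, 6, 0, 1, 5, 2, 3]

# Select rows by those labels so every time keeps its own code
reordered = df.loc[order]
print(reordered)
```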
