
Python groupby - Create a new column based on values in other columns

I have a very large dataframe.
I want to group by the column 'id' first,
then create a new column 'reply_time' based on the other existing columns.

import pandas as pd
import numpy as np

id = ['793601486525702000','793601486525702000','793601710614802000','793601355214561000','793601355214561000','793601355214561000','793601355214561000','788130215436230000','788130215436230000','788130215436230000','788130215436230000','788130215436230000']
time = ['11/1/2016 16:53','11/1/2016 16:53','11/1/2016 16:52','11/1/2016 16:55','11/1/2016 16:53','11/1/2016 16:53','11/1/2016 16:51','11/1/2016 3:09','11/1/2016 3:04','11/1/2016 2:36','11/1/2016 2:08','11/1/2016 0:28']
reply = ['3','3','0','3','3','2','1','3','2','3','3','1']

df = pd.DataFrame({"id": id, "time": time, "reply": reply})

        id                 time       reply 
793601486525702000  11/1/2016 16:53     3       
793601486525702000  11/1/2016 16:53     3       
793601710614802000  11/1/2016 16:52     0       
793601355214561000  11/1/2016 16:55     3       
793601355214561000  11/1/2016 16:53     3       
793601355214561000  11/1/2016 16:53     2       
793601355214561000  11/1/2016 16:51     1   
788130215436230000  11/1/2016 3:09      3       
788130215436230000  11/1/2016 3:04      2       
788130215436230000  11/1/2016 2:36      3       
788130215436230000  11/1/2016 2:08      3       
788130215436230000  11/1/2016 0:28      1   

The new column 'reply_time' takes one of two kinds of values.

  1. A time: within each 'id' group, the row with reply = '1' gets the 'time' value of the row with reply = '2' in the same group.
  2. 'na': every row that doesn't meet the condition above is assigned 'na'.

In this case, my output data frame will be:

        id                 time       reply   reply_time
793601486525702000  11/1/2016 16:53     3        na
793601486525702000  11/1/2016 16:53     3        na
793601710614802000  11/1/2016 16:52     0        na
793601355214561000  11/1/2016 16:55     3        na
793601355214561000  11/1/2016 16:53     3        na
793601355214561000  11/1/2016 16:53     2        na
793601355214561000  11/1/2016 16:51     1    11/1/2016 16:53
788130215436230000  11/1/2016 3:09      3        na
788130215436230000  11/1/2016 3:04      2        na
788130215436230000  11/1/2016 2:36      3        na
788130215436230000  11/1/2016 2:08      3        na
788130215436230000  11/1/2016 0:28      1    11/1/2016 3:04 

I have no idea what the best way to achieve this is. Can anyone help?

Thanks in advance!

Try slicing out the reply == '2' rows, replacing their reply value with '1', and merging the result back:

yourdf = df.merge(
    df.query("reply == '2'")
      .replace({'reply': {'2': '1'}})
      .rename(columns={'time': 'reply_time'}),
    how='left',
)
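
The merge above leaves NaN, not the string 'na', in rows without a match, and it would repeat a reply = '1' row if one id had several reply = '2' rows. As an alternative sketch (not the method from the answer above), the same rule can be written as a per-id lookup with `Series.map`. The sample below is a trimmed copy of the question's data, the name `time_of_2` is made up for illustration, and taking the first reply = '2' row per id is an assumption about what should happen when there is more than one.

```python
import pandas as pd

# Trimmed copy of the question's data (three of the four ids).
df = pd.DataFrame({
    "id": ["793601710614802000",
           "793601355214561000", "793601355214561000",
           "793601355214561000", "793601355214561000",
           "788130215436230000", "788130215436230000", "788130215436230000"],
    "time": ["11/1/2016 16:52",
             "11/1/2016 16:55", "11/1/2016 16:53",
             "11/1/2016 16:53", "11/1/2016 16:51",
             "11/1/2016 3:09", "11/1/2016 3:04", "11/1/2016 0:28"],
    "reply": ["0", "3", "3", "2", "1", "3", "2", "1"],
})

# One 'time' per id, taken from the reply == '2' rows.
# Assumption: if an id has several such rows, keep the first one.
time_of_2 = (
    df.loc[df["reply"] == "2"]
      .drop_duplicates("id")
      .set_index("id")["time"]
)

# Look up that time for every row, keep it only where reply == '1',
# and fill everything else with the string 'na'.
df["reply_time"] = (
    df["id"].map(time_of_2)
      .where(df["reply"] == "1")
      .fillna("na")
)

print(df)
```

Because this assigns a column rather than merging, the row count and row order of df never change.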
