
Python Pandas New Column based on values from other columns

I have a DataFrame that looks like this.

Each id has a number of rows. The logic is to look at each T1 row and check whether its program was already seen at T0 for the same id. Based on that, a new column is created. If the program was seen at T0, the new column keeps the same name. If it was not seen at T0, the name's suffix is incremented, so it ends in _2.

Python

import pandas as pd

data = '''id, name, Time, program
1, bb, T0, a1 
1, ch, T0, a1
1, cc, T0, b1
1, ch_1, T1, a1
1, ch_1, T1, a2
1, ch_1, T1, a3
1, ch_1, T1, a4
1, cc_1, T1, b1
1, cc_1, T1, b2
1, cc_1, T1, b3
2, dd, T0, c1
2, ch, T0, a1
2, cc, T0, b1
2, ch_1, T1, a1
2, ch_1, T1, a2
2, ch_1, T1, a3
2, cc_1, T1, b1
2, cc_1, T1, b2
2, cc_1, T1, b3'''
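# Split the text into rows and cells, stripping stray whitespace around each value
da = [[i.strip() for i in l.split(",")] for l in data.split("\n")]
# The first row holds the column names; every value (including id) stays a string
df = pd.DataFrame(da[1:], columns=da[0])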

Output

id      name        Time      program 
1        bb          T0        a1   
1        ch          T0        a1      
1        cc          T0        b1      
1        ch_1        T1        a1      
1        ch_1        T1        a2      
1        ch_1        T1        a3     
1        ch_1        T1        a4   
1        cc_1        T1        b1      
1        cc_1        T1        b2      
1        cc_1        T1        b3 
2        dd          T0        c1     
2        ch          T0        a1      
2        cc          T0        b1      
2        ch_1        T1        a1      
2        ch_1        T1        a2      
2        ch_1        T1        a3      
2        cc_1        T1        b1      
2        cc_1        T1        b2      
2        cc_1        T1        b3 

Here is the final expected output.

id      name         Time      program    new_name
1        bb           T0        a1          bb
1        ch           T0        a1          ch                      
1        cc           T0        b1          cc                      
1        ch_1         T1        a1          ch_1                  
1        ch_1         T1        a2          ch_2  <--- 
1        ch_1         T1        a3          ch_2  <---
1        ch_1         T1        a4          ch_2  <---  
1        cc_1         T1        b1          cc_1                 
1        cc_1         T1        b2          cc_2  <--- 
1        cc_1         T1        b3          cc_2  <---
2        dd           T0        c1          dd
2        ch           T0        a1          ch                      
2        cc           T0        b1          cc                      
2        ch_1         T1        a1          ch_1                  
2        ch_1         T1        a2          ch_2  <--- 
2        ch_1         T1        a3          ch_2  <--- 
2        cc_1         T1        b1          cc_1                 
2        cc_1         T1        b2          cc_2  <--- 
2        cc_1         T1        b3          cc_2  <---   

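One option is groupby().apply combined with np.where. It is not fast, but it works.

import numpy as np
# Per id, flag T1 rows whose program is not among that id's T0 programs; flagged names get the trailing "1" swapped for "2"
s = df.groupby('id').apply(lambda x: x['Time'].eq('T1') & ~x['program'].isin(x['program'][x['Time'].eq('T0')])).reset_index(level=0, drop=True)
df['New Name'] = np.where(s, df['name'].str[:-1] + '2', df['name'])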

df
    id  name Time program New Name
0    1    bb   T0      a1       bb
1    1    ch   T0      a1       ch
2    1    cc   T0      b1       cc
3    1  ch_1   T1      a1     ch_1
4    1  ch_1   T1      a2     ch_2
5    1  ch_1   T1      a3     ch_2
6    1  ch_1   T1      a4     ch_2
7    1  cc_1   T1      b1     cc_1
8    1  cc_1   T1      b2     cc_2
9    1  cc_1   T1      b3     cc_2
10   2    dd   T0      c1       dd
11   2    ch   T0      a1       ch
12   2    cc   T0      b1       cc
13   2  ch_1   T1      a1     ch_1
14   2  ch_1   T1      a2     ch_2
15   2  ch_1   T1      a3     ch_2
16   2  cc_1   T1      b1     cc_1
17   2  cc_1   T1      b2     cc_2
18   2  cc_1   T1      b3     cc_2
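
Since the apply-based version is described as not fast, here is a minimal sketch of a vectorized alternative that uses a left merge instead of a per-group lambda. It is not part of the original answer. The small df below is a stand-in for the question's data, and the names seen_t0, merged and is_new are made up for illustration. It also assumes that every T1 name ends in "_1".

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's data (all values are strings, as in the question)
df = pd.DataFrame({
    "id": ["1"] * 5 + ["2"] * 3,
    "name": ["ch", "cc", "ch_1", "ch_1", "cc_1", "ch", "ch_1", "ch_1"],
    "Time": ["T0", "T0", "T1", "T1", "T1", "T0", "T1", "T1"],
    "program": ["a1", "b1", "a1", "a2", "b2", "a1", "a1", "a3"],
})

# Unique (id, program) pairs that were seen at T0
seen_t0 = df.loc[df["Time"].eq("T0"), ["id", "program"]].drop_duplicates()

# A left merge keeps df's row order; the _merge column records whether
# the (id, program) pair of each row was found among the T0 pairs
merged = df.merge(seen_t0, on=["id", "program"], how="left", indicator=True)

# T1 rows whose program was never seen at T0 for the same id
is_new = (merged["Time"] == "T1") & (merged["_merge"] == "left_only")

# Assumption: T1 names end in "_1", so only that suffix is rewritten to "_2"
renamed = df["name"].str.replace(r"_1$", "_2", regex=True)
df["new_name"] = np.where(is_new.to_numpy(), renamed, df["name"])

print(df)
```

Because seen_t0 is de-duplicated before the merge, the merged frame has the same number of rows as df, so the boolean mask lines up with df positionally.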
