简体   繁体   中英

Create new columns based on multiple conditions in Python

I have the following dataframe:

  data = [
(27450, 27450, 29420,"10/10/2016"),
(29420 , 36142, 29420, "10/10/2016"),
(11 , 11, 27450, "10/10/2016")] 

#Create DataFrame base
df = pd.DataFrame(data, columns=("User_id","Actor1","Actor2", "Time"))

The first column contains the user_id, and each line represents one action that he makes. Each user_id shows up either in "Actor1" or "Actor2" column.

First, I would like to create a new column where it will assign the value 1 if the user_id is found in "Actor1" column and 0 otherwise.

Second, I would like to create a new column where for each user_id it will store the "Actor"_i value that he interacted with.

For the above example, the output will look like:

   Col1  Col2    
    1    29420    
    0    36142    
    1    27450    

What is the most efficient pythonic way to do this?

Thanks a lot in advance!

import numpy as np
import pandas as pd

data = [(27450, 27450, 29420,"10/10/2016"),
        (29420 , 36142, 29420, "10/10/2016"),
        (11 , 11, 27450, "10/10/2016")] 
df = pd.DataFrame(data, columns=("User_id","Actor1","Actor2", "Time"))
mask = (df['User_id'] == df['Actor1'])
df['first actor'] = mask.astype(int)
df['other actor'] = np.where(mask, df['Actor2'], df['Actor1'])
print(df)

yields

   User_id  Actor1  Actor2        Time  first actor  other actor
0    27450   27450   29420  10/10/2016            1        29420
1    29420   36142   29420  10/10/2016            0        36142
2       11      11   27450  10/10/2016            1        27450

First create a boolean mask which is True when User_id equals Actor1 :

In [51]: mask = (df['User_id'] == df['Actor1']); mask
Out[51]: 
0     True
1    False
2     True
dtype: bool

Converting mask to ints creates the first column:

In [52]: mask.astype(int)
Out[52]: 
0    1
1    0
2    1
dtype: int64

Then use np.where to select between two values. np.where(mask, A, B) returns an array whose ith value is A[i] if mask[i] is True, and B[i] otherwise. Thus, np.where(mask, df['Actor2'], df['Actor1']) takes the value from Actor2 where mask is True, and the value from Actor1 otherwise:

In [53]: np.where(mask, df['Actor2'], df['Actor1'])
Out[53]: array([29420, 36142, 27450])

Heres my solution - I have assumed that if userid appears in actor1 column its not necessary it'll be in the same row...

df["Col1"] = [1 if i in df["Actor1"].values else 0 for i in df["User_id"].values]
df["Col2"] = [df.iloc[i]["Actor2"] if j == 1 else df.iloc[i]["Actor1"] for i, j in enumerate(df["Col1"].values)]

Output -

User_id  Actor1  Actor2        Time  Col1   Col2
0    27450   27450   29420  10/10/2016     1  29420
1    29420   36142   29420  10/10/2016     0  36142
2       11      11   27450  10/10/2016     1  27450

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM