Pandas - Dataframe - Conditional add

Question

I want to add a new column to my data frame. I have a list of event columns, and if any of them is different from 0 in a row, the value of that row in the new column should be 1.

I think it should be very simple, but I am fairly new to Python.

The dataframe looks like this:

import pandas as pd

df = pd.DataFrame({"ID":[1,1,2,3],"Date":["01/01/2019","01/01/2019","02/01/2019","02/01/2019"],"Event_1":[1,0,0,0],"Event_2":[1,0,0,1],"Event_3":[0,1,0,1],"Other":[0,0,0,1]})

print(df)
ID    Date         Event_1 Event_2 Event_3 Other
1     01/01/2019   1       1       0       0
1     01/01/2019   0       0       1       0
2     02/01/2019   0       0       0       0
3     02/01/2019   0       1       1       1

And it should look like this:

ID    Date         Event_1 Event_2 Event_3 Other Conditional_row
1     01/01/2019   1       1       0       0     1
1     01/01/2019   0       0       1       0     1
2     02/01/2019   0       0       0       0     0
3     02/01/2019   0       1       1       1     1

What is the easiest way of doing it? What is the best?

Answer 1

Use filter + any

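Since all non-zero integers are truthy in Python, calling any directly on the filtered DataFrame produces the correct mask. Since you want an integer output, the boolean mask can be viewed as an integer type without copying it. Note that like="Event" leaves out the Other column, which does not change the result for this data. Recent pandas releases deprecate Series.view, so .astype('int8') may be needed there instead.

df.filter(like="Event").any(axis=1).view('i1')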

0    1
1    1
2    0
3    1
dtype: int8
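
As a hedged sketch of the same idea on current pandas versions: the snippet below spells out axis=1, uses astype instead of view, and shows how a regex could also select the Other column. Whether Other belongs in the check is an assumption about the question's intent, and the names events_only, flag_events and flag_all are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3],
    "Date": ["01/01/2019", "01/01/2019", "02/01/2019", "02/01/2019"],
    "Event_1": [1, 0, 0, 0],
    "Event_2": [1, 0, 0, 1],
    "Event_3": [0, 1, 0, 1],
    "Other": [0, 0, 0, 1],
})

# like="Event" keeps only columns whose name contains "Event".
events_only = df.filter(like="Event")

# any(axis=1) is True for a row when at least one value is non-zero.
flag_events = events_only.any(axis=1).astype("int8")

# Assumption: if "Other" should count too, a regex can select it as well.
flag_all = df.filter(regex=r"^(Event|Other)").any(axis=1).astype("int8")

print(flag_events.tolist())  # [1, 1, 0, 1]
print(flag_all.tolist())     # [1, 1, 0, 1]
```

For this data both variants agree, because the only row with a non-zero Other value also has non-zero events.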

Answer 2

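Using `DataFrame.filter`, `eq` and `any`

First we filter the columns whose names start with Event or Other. Then we check, row by row, whether any value is eq (equal) to 1. This matches the question's "different from 0" condition only as long as the columns hold nothing but 0 and 1: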

df['Conditional_row'] = df.filter(regex="^Event|^Other").eq(1).any(axis=1).astype(int)

   ID        Date  Event_1  Event_2  Event_3  Other  Conditional_row
0   1  01/01/2019        1        1        0      0                1
1   1  01/01/2019        0        0        1      0                1
2   2  02/01/2019        0        0        0      0                0
3   3  02/01/2019        0        1        1      1                1
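
To illustrate where eq(1) and "different from 0" part ways, here is a small sketch on made-up data in which one column holds a count instead of a 0/1 flag. The frame counts and the names flag_eq and flag_ne are hypothetical and not part of the question.

```python
import pandas as pd

# Hypothetical data: Event_1 holds a count (2) rather than a 0/1 flag.
counts = pd.DataFrame({"Event_1": [2, 0, 0], "Event_2": [0, 1, 0]})

# eq(1) only matches values that are exactly 1, so the first row is missed.
flag_eq = counts.eq(1).any(axis=1).astype(int)

# ne(0) matches any non-zero value, which is what the question asks for.
flag_ne = counts.ne(0).any(axis=1).astype(int)

print(flag_eq.tolist())  # [0, 1, 0]
print(flag_ne.tolist())  # [1, 1, 0]
```

With strictly 0/1 data, as in the question, the two give the same result.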

Answer 3

Or use:

df['Conditional_row'] = df[['Event_1', 'Event_2', 'Event_3', 'Other']].ne(0).any(axis=1).astype(int)

And now:

print(df)

Output:

   ID        Date  Event_1  Event_2  Event_3  Other  Conditional_row
0   1  01/01/2019        1        1        0      0                1
1   1  01/01/2019        0        0        1      0                1
2   2  02/01/2019        0        0        0      0                0
3   3  02/01/2019        0        1        1      1                1
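
One thing to watch with ne(0) is missing data: NaN compares as not equal to 0, so a row containing a missing value gets flagged. The sketch below uses a made-up frame to show this, and treating missing values as "no event" via fillna(0) is only an assumption about what the data means.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a missing value in the second row.
events = pd.DataFrame({"Event_1": [1, np.nan, 0], "Event_2": [0, 0, 0]})

# NaN != 0 evaluates to True, so the row with the missing value is flagged.
flag_raw = events.ne(0).any(axis=1).astype(int)

# Assumption: missing means "no event", so fill with 0 before comparing.
flag_filled = events.fillna(0).ne(0).any(axis=1).astype(int)

print(flag_raw.tolist())     # [1, 1, 0]
print(flag_filled.tolist())  # [1, 0, 0]
```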

Answer 4

Suppose your data frame is stored in an object called df. I believe this is an efficient way to do this:

df["Conditional_row"] = 0
df.loc[df[["Event_1","Event_2","Event_3","Other"]].sum(axis=1) > 0, "Conditional_row"] = 1

The output looks like this:

print(df)
   ID        Date  Event_1  Event_2  Event_3  Other  Conditional_row
0   1  01/01/2019        1        1        0      0                1
1   1  01/01/2019        0        0        1      0                1
2   2  02/01/2019        0        0        0      0                0
3   3  02/01/2019        0        1        1      1                1

What I did here was:

I created a new column filled with zeroes.
I selected all the rows where the row-wise sum of the columns in the list ["Event_1","Event_2","Event_3","Other"] is greater than 0.
The column "Conditional_row" is updated with the value 1 for the rows that meet that condition.

The expression df[["Event_1","Event_2","Event_3","Other"]].sum(axis=1) > 0 is called a mask. It evaluates to a boolean Series (a vector of True and False values), and df.loc selects all the rows where the value is True. Indexing with a boolean mask is typically an efficient way to manipulate data frames.
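
The sum-based mask relies on the values never being negative. As a rough sketch on made-up data (the columns by_sum and by_ne are hypothetical names), the snippet below compares it with a mask that tests each value against 0 and casts the boolean result directly:

```python
import pandas as pd

cols = ["Event_1", "Event_2"]
# Hypothetical data: the first row holds +1 and -1, which cancel in the sum.
df = pd.DataFrame({"Event_1": [1, 0, 0], "Event_2": [-1, 2, 0]})

# Two-step version from the answer: default 0, then set 1 where the mask holds.
df["by_sum"] = 0
df.loc[df[cols].sum(axis=1) > 0, "by_sum"] = 1

# Alternative: compare each value with 0, then cast the boolean mask.
df["by_ne"] = df[cols].ne(0).any(axis=1).astype(int)

print(df["by_sum"].tolist())  # [0, 1, 0]
print(df["by_ne"].tolist())   # [1, 1, 0]
```

For the 0/1 data in the question the two columns would be identical, so the choice only matters if negative values can occur.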

4 answers

solution1
2 2019-08-14 13:16:32

solution2
2 2019-08-14 13:16:35

solution3
1 2019-08-14 13:19:11

solution4
1 ACCEPTED 2019-08-15 15:17:37
