I want to add a new column to my data frame. I have a list of event columns, and if any of them is different from 0, the value of the row in the new column should be 1.
I think it should be very simple, but I am fairly new to Python.
The dataframe looks like this:
import pandas as pd
df = pd.DataFrame({"ID":[1,1,2,3],"Date":["01/01/2019","01/01/2019","02/01/2019","02/01/2019"],"Event_1":[1,0,0,0],"Event_2":[1,0,0,1],"Event_3":[0,1,0,1],"Other":[0,0,0,1]})
print(df)
ID Date Event_1 Event_2 Event_3 Other
1 01/01/2019 1 1 0 0
1 01/01/2019 0 0 1 0
2 02/01/2019 0 0 0 0
3 02/01/2019 0 1 1 1
And should look like this:
ID Date Event_1 Event_2 Event_3 Other Conditional_row
1 01/01/2019 1 1 0 0 1
1 01/01/2019 0 0 1 0 1
2 02/01/2019 0 0 0 0 0
3 02/01/2019 0 1 1 1 1
What is the easiest way of doing it? What is the best?
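Use filter + any
Since all non-zero integers are truthy in Python, calling any directly on the filtered DataFrame gives the correct mask. Since you want an integer output, we can use a memory-efficient view to reinterpret the boolean mask as an integer type. On recent pandas versions, where Series.view is deprecated, .astype('int8') gives the same result at the cost of a copy. Note that like="Event" does not pick up the Other column; for the sample data this makes no difference.
df.filter(like="Event").any(axis=1).view('i1')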
0 1
1 1
2 0
3 1
dtype: int8
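For reference, here is a self-contained sketch of the same filter + any idea, using astype instead of view so that it does not depend on Series.view being available. The names events, mask and flag are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3],
    "Date": ["01/01/2019", "01/01/2019", "02/01/2019", "02/01/2019"],
    "Event_1": [1, 0, 0, 0],
    "Event_2": [1, 0, 0, 1],
    "Event_3": [0, 1, 0, 1],
    "Other": [0, 0, 0, 1],
})

# Keep only the columns whose name contains "Event" (this leaves out "Other").
events = df.filter(like="Event")

# True for a row if at least one of its values is non-zero.
mask = events.any(axis=1)

# astype copies the data, unlike view, but gives the same 0/1 values.
flag = mask.astype("int8")
print(flag.tolist())  # [1, 1, 0, 1]
```

If the Other column should count as well, filtering with a regex such as "^Event|^Other" is one way to include it.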
DataFrame.filter, eq and any
First we filter the columns whose names start with Event or Other. Then we check whether any value in each row is eq (equal) to 1:
df['Conditional_row'] = df.filter(regex="^Event|^Other").eq(1).any(axis=1).astype(int)
ID Date Event_1 Event_2 Event_3 Other Conditional_row
0 1 01/01/2019 1 1 0 0 1
1 1 01/01/2019 0 0 1 0 1
2 2 02/01/2019 0 0 0 0 0
3 3 02/01/2019 0 1 1 1 1
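A small caveat, sketched under the assumption that the event columns might someday hold values other than 0 and 1 (they do not in the question's data): eq(1) only matches values that are exactly 1, while the question asks for "different from 0", which is what ne(0) tests. The frame demo below is made up for illustration.

```python
import pandas as pd

# Hypothetical data: the second row holds a 2, which is non-zero but not 1.
demo = pd.DataFrame({"Event_1": [0, 2, 1], "Other": [0, 0, 0]})

# Flags rows that contain a value exactly equal to 1.
equal_one = demo.eq(1).any(axis=1).astype(int)

# Flags rows that contain any non-zero value.
non_zero = demo.ne(0).any(axis=1).astype(int)

print(equal_one.tolist())  # [0, 0, 1]
print(non_zero.tolist())   # [0, 1, 1]
```

For strictly 0/1 columns the two give identical results.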
Or use:
df['Conditional_row'] = df[['Event_1', 'Event_2', 'Event_3', 'Other']].ne(0).any(axis=1).astype(int)
And now:
print(df)
Output:
ID Date Event_1 Event_2 Event_3 Other Conditional_row
0 1 01/01/2019 1 1 0 0 1
1 1 01/01/2019 0 0 1 0 1
2 2 02/01/2019 0 0 0 0 0
3 3 02/01/2019 0 1 1 1 1
Suppose your data frame is stored in an object called df. I believe this is the most efficient way to do this:
df["Conditional_row"] = 0
df.loc[df[["Event_1","Event_2","Event_3","Other"]].sum(axis=1) > 0, "Conditional_row"] = 1
The output looks like this:
print(df)
ID Date Event_1 Event_2 Event_3 Other Conditional_row
0 1 01/01/2019 1 1 0 0 1
1 1 01/01/2019 0 0 1 0 1
2 2 02/01/2019 0 0 0 0 0
3 3 02/01/2019 0 1 1 1 1
What I did here was:
First, the column "Conditional_row" is created with the value 0 in every row.
Then the code checks for which rows the sum of the columns ["Event_1","Event_2","Event_3","Other"] is greater than 0.
Finally, "Conditional_row" is updated with the value 1 in the rows that meet that condition.
The code df[["Event_1","Event_2","Event_3","Other"]].sum(axis=1) > 0 is called a mask: it returns a boolean array (a vector filled with True and False values), and df.loc selects all the rows where that value is True. Typically, indexing with boolean arrays is one of the most efficient ways to manipulate data frames.
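One thing to keep in mind with a sum-based mask, assuming the columns could ever contain negative values (the question's data does not): positive and negative entries can cancel out, so a row with non-zero values may still sum to 0. The sketch below compares the sum-based mask with a ne(0).any mask on a made-up frame called demo.

```python
import pandas as pd

# Hypothetical data: the first row has non-zero values that cancel out.
demo = pd.DataFrame({"Event_1": [1, 0, 2], "Event_2": [-1, 0, 0]})

# Sum-based mask: misses the first row because 1 + (-1) == 0.
by_sum = (demo.sum(axis=1) > 0).astype(int)

# Element-wise test: flags every row that has at least one non-zero value.
by_any = demo.ne(0).any(axis=1).astype(int)

print(by_sum.tolist())  # [0, 0, 1]
print(by_any.tolist())  # [1, 0, 1]
```

With non-negative data such as 0/1 indicators, both masks agree.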