Pandas - DataFrame - add a column conditionally

Question

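I want to add a new column to my DataFrame. I have a list of event columns; if any of them is different from 0, the row's value in the new column should be 1.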

I think it should be simple, but I'm fairly new to Python.

The DataFrame looks like this:

import pandas as pd

df = pd.DataFrame({"ID":[1,1,2,3],"Date":["01/01/2019","01/01/2019","02/01/2019","02/01/2019"],"Event_1":[1,0,0,0],"Event_2":[1,0,0,1],"Event_3":[0,1,0,1],"Other":[0,0,0,1]})

print(df)
ID    Date         Event_1 Event_2 Event_3 Other
1     01/01/2019   1       1       0       0
1     01/01/2019   0       0       1       0
2     02/01/2019   0       0       0       0
3     02/01/2019   0       1       1       1

It should look like this:

ID    Date         Event_1 Event_2 Event_3 Other Conditional_row
1     01/01/2019   1       1       0       0     1
1     01/01/2019   0       0       1       0     1
2     02/01/2019   0       0       0       0     0
3     02/01/2019   0       1       1       1     1

What is the simplest way to do this? Which one is best?

Answer 1

Use filter + any

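Since every non-zero integer is truthy in Python, calling any across each row of the filtered DataFrame gives the result directly. Because you want an integer result, cast the boolean mask to a compact integer type such as int8 with astype:

df.filter(like="Event").any(axis=1).astype("int8")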

0    1
1    1
2    0
3    1
dtype: int8
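A minimal sketch of what this answer relies on, rebuilding the question's df (the small frame `demo` is hypothetical data added for illustration). Note that `like="Event"` matches column labels containing the substring "Event", so the Other column is not checked; for the question's data this happens not to change the result.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 2, 3],
                   "Date": ["01/01/2019", "01/01/2019", "02/01/2019", "02/01/2019"],
                   "Event_1": [1, 0, 0, 0], "Event_2": [1, 0, 0, 1],
                   "Event_3": [0, 1, 0, 1], "Other": [0, 0, 0, 1]})

# filter(like=...) keeps the columns whose label contains the substring "Event"
events = df.filter(like="Event")
print(list(events.columns))  # ['Event_1', 'Event_2', 'Event_3']

# any(axis=1) is True for a row if at least one of its values is non-zero
flags = events.any(axis=1).astype("int8")
print(flags.tolist())  # [1, 1, 0, 1]

# Non-zero values other than 1 (including negatives) also count as True
demo = pd.DataFrame({"Event_a": [0, 2, -1], "Event_b": [0, 0, 0]})  # hypothetical data
print(demo.any(axis=1).tolist())  # [False, True, True]
```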

Answer 2

Use `DataFrame.filter`, `eq` and `any`

First, we filter the columns whose names start with Event or Other. Then we check, row by row, whether any of them is equal (eq) to 1:

df['Conditional_row'] = df.filter(regex="^Event|^Other").eq(1).any(axis=1).astype(int)

   ID        Date  Event_1  Event_2  Event_3  Other  Conditional_row
0   1  01/01/2019        1        1        0      0                1
1   1  01/01/2019        0        0        1      0                1
2   2  02/01/2019        0        0        0      0                0
3   3  02/01/2019        0        1        1      1                1
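A small sketch of the two pieces of this answer, on hypothetical data where one event value is 2 rather than 1. The regex keeps only columns whose labels start with Event or Other, and eq(1) only flags cells that are exactly 1, so it works for the question's 0/1 data but would miss other non-zero values; ne(0) matches the question's "different from 0" wording.

```python
import pandas as pd

# Hypothetical data: one event cell holds 2 instead of 1
df = pd.DataFrame({"ID": [1, 2],
                   "Event_1": [2, 0],
                   "Other": [0, 0]})

# "^Event|^Other" matches labels starting with "Event" or "Other"; "ID" is excluded
cols = df.filter(regex="^Event|^Other")
print(list(cols.columns))  # ['Event_1', 'Other']

# eq(1) only flags cells that are exactly 1, so the value 2 is missed
print(cols.eq(1).any(axis=1).astype(int).tolist())  # [0, 0]

# ne(0) flags any non-zero cell
print(cols.ne(0).any(axis=1).astype(int).tolist())  # [1, 0]
```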

Answer 3

Or use:

df['Conditional_row'] = df[['Event_1', 'Event_2', 'Event_3', 'Other']].ne(0).any(axis=1).astype(int)

Now:

print(df)

Output:

   ID        Date  Event_1  Event_2  Event_3  Other  Conditional_row
0   1  01/01/2019        1        1        0      0                1
1   1  01/01/2019        0        0        1      0                1
2   2  02/01/2019        0        0        0      0                0
3   3  02/01/2019        0        1        1      1                1
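One practical difference between listing the columns explicitly (as here) and selecting them by pattern with filter: a typo in an explicit list raises a KeyError, whereas a pattern that matches nothing silently returns an empty frame, so no row gets flagged. A minimal sketch, with a deliberately wrong column list and pattern (both hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"Event_1": [1, 0], "Event_2": [0, 0]})

# Hypothetical typo in the column list: "Event_3" does not exist in this frame
event_cols = ["Event_1", "Event_2", "Event_3"]

raised = False
try:
    df[event_cols]
except KeyError as err:
    # Selecting a list that contains a missing label raises instead of skipping it
    raised = True
    print("KeyError:", err)

# A pattern that matches nothing returns an empty frame (misspelled on purpose)
empty = df.filter(like="Evnt")
print(empty.shape)  # (2, 0)

# With zero columns selected, no row is flagged
print(empty.any(axis=1).any())  # False
```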

Answer 4

Assuming your DataFrame is stored in a variable named df, I believe this is the most efficient way:

df["Conditional_row"] = 0
df.loc[df[["Event_1","Event_2","Event_3","Other"]].sum(axis=1) > 0, "Conditional_row"] = 1

The output looks like this:

print(df)
   ID        Date  Event_1  Event_2  Event_3  Other  Conditional_row
0   1  01/01/2019        1        1        0      0                1
1   1  01/01/2019        0        0        1      0                1
2   2  02/01/2019        0        0        0      0                0
3   3  02/01/2019        0        1        1      1                1

What I did here:

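I created a new column filled with zeros.
I selected all rows where the row-wise sum of the columns ["Event_1","Event_2","Event_3","Other"] is greater than 0.
For the rows that meet this condition, the "Conditional_row" column is set to 1.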
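The three steps above can be traced with intermediate values. This sketch uses the question's data (the Date column is left out for brevity) and also shows why the threshold must be > 0: with > 1, a row containing a single event would be missed.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 2, 3],
                   "Event_1": [1, 0, 0, 0], "Event_2": [1, 0, 0, 1],
                   "Event_3": [0, 1, 0, 1], "Other": [0, 0, 0, 1]})
cols = ["Event_1", "Event_2", "Event_3", "Other"]

# Step 1: new column filled with zeros
df["Conditional_row"] = 0

# Step 2: row-wise sum over the chosen columns, then the boolean mask
row_sums = df[cols].sum(axis=1)
print(row_sums.tolist())  # [2, 1, 0, 3]
mask = row_sums > 0
print(mask.tolist())  # [True, True, False, True]

# Step 3: write 1 into the rows selected by the mask
df.loc[mask, "Conditional_row"] = 1
print(df["Conditional_row"].tolist())  # [1, 1, 0, 1]

# With "> 1" instead, row 1 (sum 1, a single event) would be missed
print((row_sums > 1).tolist())  # [True, False, False, True]
```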

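The expression df[["Event_1","Event_2","Event_3","Other"]].sum(axis=1) > 0 is called a mask: it returns a boolean Series (a vector of True and False values), and passing it to .loc selects every row where the value is True. Indexing with a boolean mask is vectorized, so it is usually much faster than looping over the rows.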
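The mask is an ordinary boolean Series, so it can also be cast to integers directly. One caveat, assuming the event columns could ever hold negative values (hypothetical data below): with a sum-based mask, opposite values can cancel out, while checking each cell with ne(0) cannot.

```python
import pandas as pd

# Hypothetical data: in row 0 the two events cancel out when summed
df = pd.DataFrame({"Event_1": [1, 0], "Event_2": [-1, 0]})
cols = ["Event_1", "Event_2"]

# Sum-based mask: 1 + (-1) == 0, so row 0 is not flagged
by_sum = (df[cols].sum(axis=1) > 0).astype(int)
print(by_sum.tolist())  # [0, 0]

# Checking each cell for "not equal to 0" flags row 0
by_any = df[cols].ne(0).any(axis=1).astype(int)
print(by_any.tolist())  # [1, 0]
```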

4 solutions

Solution 1: 2 votes, 2019-08-14 13:16:32
Solution 2: 2 votes, 2019-08-14 13:16:35
Solution 3: 1 vote, 2019-08-14 13:19:11
Solution 4: 1 vote, accepted, 2019-08-15 15:17:37
