Create a child pandas DataFrame from a parent DataFrame based on conditions

Question

I have the following DataFrame. It is actually the result of an outer join of two tables.

IndentID    IndentNo    role_name     role_id    user_id       ishod    isdesignatedhod   Flag
100         1234         xyz            3         17            1         nan           right_only
nan         nan          nan           -1         -1            None      None          right_only
nan         nan          nan            1         15            None      None          right_only
nan         nan          nan           100        100           None       1            right_only

Objective: I want a resulting DataFrame based on conditions on its columns. The conditions are given below:

If ishod == 1, the resulting df should be:
IndentID    IndentNo    role_name     role_id    user_id
100         1234         xyz            3         17

If ishod != 1 and isdesignatedhod == 1, the resulting df should be:
IndentID    IndentNo    role_name     role_id    user_id
100         1234         xyz            100         100

I am really clueless about how to proceed with this. Any clue will be appreciated!
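The two rules above can be written as boolean masks on the `ishod` and `isdesignatedhod` columns. Below is a minimal sketch, not taken from the thread: `pick_hod` and `COLS` are hypothetical names, and it assumes (reading the expected output) that in the second case `IndentID`, `IndentNo` and `role_name` come from the row that has an `IndentID`, while `role_id` and `user_id` come from the designated row. `pd.to_numeric(..., errors="coerce")` is used so the comparison works whether the flags are stored as numbers or as strings such as `"1"` / `"None"`.

```python
import pandas as pd

# Hypothetical: the five columns the question wants in the result
COLS = ["IndentID", "IndentNo", "role_name", "role_id", "user_id"]


def pick_hod(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the two rules from the question."""
    # Coerce the flag columns to numbers; None / "None" / "nan" become NaN
    ishod = pd.to_numeric(df["ishod"], errors="coerce")
    designated = pd.to_numeric(df["isdesignatedhod"], errors="coerce")

    # Rule 1: a row with ishod == 1 exists -> keep that row's five columns
    if (ishod == 1).any():
        return df.loc[ishod == 1, COLS].reset_index(drop=True)

    # Rule 2: no ishod == 1, but a row with isdesignatedhod == 1 exists.
    # Assumption: indent fields come from the first row with an IndentID,
    # role_id / user_id come from the first designated row.
    if (designated == 1).any() and df["IndentID"].notna().any():
        indent = df.loc[df["IndentID"].notna(), ["IndentID", "IndentNo", "role_name"]].iloc[[0]]
        hod = df.loc[designated == 1, ["role_id", "user_id"]].iloc[[0]]
        return pd.concat(
            [indent.reset_index(drop=True), hod.reset_index(drop=True)], axis=1
        )

    # Neither rule matched: empty frame with the expected columns
    return pd.DataFrame(columns=COLS)


# Sample data mirroring the question's frame (Flag column omitted)
df = pd.DataFrame({
    "IndentID": [100, None, None, None],
    "IndentNo": [1234, None, None, None],
    "role_name": ["xyz", None, None, None],
    "role_id": [3, -1, 1, 100],
    "user_id": [17, -1, 15, 100],
    "ishod": [1, None, None, None],
    "isdesignatedhod": [None, None, None, 1],
})

print(pick_hod(df))
```

On this sample the first rule applies, so the result is the single row with role_id 3 and user_id 17; setting the first row's ishod to 0 would make the second rule combine the indent fields with role_id 100 and user_id 100.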

Answer 1

This should do approximately what you want:

import io

import pandas as pd

nan = float("nan")


def func(row):
    # ishod / isdesignatedhod are read as strings (see dtype below), hence "1"
    if row["ishod"] == "1":
        return pd.Series([100, 1234, "xyz", 3, 17, nan, nan, nan], index=row.index)
    elif row["isdesignatedhod"] == "1":
        return pd.Series([100, 1234, "xyz", 100, 100, nan, nan, nan], index=row.index)
    else:
        return row


df = pd.read_csv(io.StringIO(
"""IndentID    IndentNo    role_name     role_id    user_id       ishod    isdesignatedhod   Flag
100         1234         xyz            3         17            1         nan           right_only
nan         nan          nan           -1         -1            None      None          right_only
nan         nan          nan            1         15            None      None          right_only
nan         nan          nan           100        100           None       1            right_only
"""), sep=" +", engine="python",
    # Keep the flag columns as strings so the == "1" checks in func match
    dtype={"ishod": str, "isdesignatedhod": str})

# Apply func to every row (axis=1)
print(df.apply(func, axis=1))

Output:

   IndentID  IndentNo role_name  role_id  user_id ishod isdesignatedhod        Flag
0     100.0    1234.0       xyz        3       17   NaN             NaN         NaN
1       NaN       NaN       NaN       -1       -1  None            None  right_only
2       NaN       NaN       NaN        1       15  None            None  right_only
3     100.0    1234.0       xyz      100      100   NaN             NaN         NaN
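The `if`/`elif` in `func` can also be evaluated for all rows at once instead of row by row. A sketch using `numpy.select`, which picks the first matching condition per row just like `if`/`elif`; the `hod_rule` column and its labels are hypothetical, and the flags are assumed to be strings as in the frame above:

```python
import numpy as np
import pandas as pd

# Flag columns stored as strings / None, as in the question's frame
df = pd.DataFrame({
    "role_id": [3, -1, 1, 100],
    "user_id": [17, -1, 15, 100],
    "ishod": ["1", None, None, None],
    "isdesignatedhod": [None, None, None, "1"],
})

# Conditions are checked in order; the first True one wins (like if/elif)
conditions = [
    df["ishod"] == "1",
    df["isdesignatedhod"] == "1",
]
labels = ["ishod", "isdesignatedhod"]

# Hypothetical column recording which rule matched each row
df["hod_rule"] = np.select(conditions, labels, default="none")
print(df)
```

Rows where no condition holds get the `default` value, which plays the role of the `else: return row` branch.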

Answer 2

To select rows based on a value in a certain column, you can use the following notation:

df[ df["column_name"] == value_to_keep ]
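The expression inside the brackets is a boolean Series (a mask) aligned with the DataFrame's index, and indexing with it keeps the rows where it is True. Combined with `.loc`, the same mask can also restrict the columns. A small sketch on a few of the question's columns (the name `mask` is just illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "role_id": [3, -1, 1, 100],
    "user_id": [17, -1, 15, 100],
    "ishod": [1, None, None, None],
})

# Boolean Series, one entry per row
mask = df["ishod"] == 1
print(mask.tolist())  # [True, False, False, False]

# Keep only the matching rows, all columns
subset = df[mask]

# Keep only the matching rows and only the listed columns
only_ids = df.loc[mask, ["role_id", "user_id"]]
print(only_ids)
```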

Here is an example of this in action:

import pandas as pd

d = {'col1': [1,2,1,2,1,2,1,2,1,2,1,2,1,2,1],
     'col2': [3,4,5,3,4,5,3,4,5,3,4,5,3,4,5],
     'col3': [6,7,8,9,6,7,8,9,6,7,8,9,6,7,8]}
# create a dataframe 
df = pd.DataFrame(d)

This is what df looks like:

In [17]: df
Out[17]: 
    col1  col2  col3
0      1     3     6
1      2     4     7
2      1     5     8
3      2     3     9
4      1     4     6
5      2     5     7
6      1     3     8
7      2     4     9
8      1     5     6
9      2     3     7
10     1     4     8
11     2     5     9
12     1     3     6
13     2     4     7
14     1     5     8

Now, to select all rows where the value in the first column (col1) is 2:

df_1 = df[df["col1"] == 2]

In [19]: df_1
Out[19]:
    col1  col2  col3
1      2     4     7
3      2     3     9
5      2     5     7
7      2     4     9
9      2     3     7
11     2     5     9
13     2     4     7

You can also combine multiple conditions this way:

df_2 = df[(df["col2"] >= 4) & (df["col3"] != 7)]


In [22]: df_2
Out[22]:
    col1  col2  col3
2      1     5     8
4      1     4     6
7      2     4     9
8      1     5     6
10     1     4     8
11     2     5     9
14     1     5     8
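The parentheses around each condition are required: in Python, `&` and `|` bind more tightly than comparison operators such as `>=` and `!=`. The same pattern works with `|` (OR) and `~` (NOT); a short sketch reusing the `d` dictionary from above:

```python
import pandas as pd

d = {'col1': [1,2,1,2,1,2,1,2,1,2,1,2,1,2,1],
     'col2': [3,4,5,3,4,5,3,4,5,3,4,5,3,4,5],
     'col3': [6,7,8,9,6,7,8,9,6,7,8,9,6,7,8]}
df = pd.DataFrame(d)

# OR: rows where col2 is 3 or col3 is 9
df_3 = df[(df["col2"] == 3) | (df["col3"] == 9)]
print(df_3.index.tolist())  # [0, 3, 6, 7, 9, 11, 12]

# NOT: rows where col1 is not 2 (same as df["col1"] != 2)
df_4 = df[~(df["col1"] == 2)]
print(df_4.index.tolist())  # [0, 2, 4, 6, 8, 10, 12, 14]
```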

Hope this example helps!

Answer 3

Andre gives the right answer. Also keep in mind the dtype of the columns ishod and isdesignatedhod: they are of "object" dtype, in this case holding strings. So you have to put the value in quotes when comparing these columns with a number:

df[df["ishod"] == "1"]
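To check this yourself, you can inspect the stored values, or convert the columns to numbers before comparing. A minimal sketch, assuming the flag values are the strings shown in the question (`pd.to_numeric` with `errors="coerce"` turns anything that is not a number, such as `"None"`, into NaN):

```python
import pandas as pd

# Flag columns holding strings, as in the question
df = pd.DataFrame({
    "ishod": ["1", "None", "None", "None"],
    "isdesignatedhod": ["nan", "None", "None", "1"],
})

print(df.dtypes)             # not a numeric dtype for either column
print(df["ishod"].tolist())  # ['1', 'None', 'None', 'None']

# Comparing with the string "1" works
print(df[df["ishod"] == "1"])

# Alternatively, convert to numbers first and compare with the integer 1
ishod_num = pd.to_numeric(df["ishod"], errors="coerce")
print(df[ishod_num == 1])
```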

3 answers

Answer 1
0 2020-10-12 14:19:13

Answer 2
0 2020-10-12 14:37:14

Answer 3
0 2020-10-12 15:28:38
