Creating child pandas dataframe from a mother dataframe based on condition

I have a dataframe as follows. It was actually produced by an outer join of two tables.

IndentID    IndentNo    role_name     role_id    user_id       ishod    isdesignatedhod   Flag
100         1234         xyz            3         17            1         nan           right_only
nan         nan          nan           -1         -1            None      None          right_only
nan         nan          nan            1         15            None      None          right_only
nan         nan          nan           100        100           None       1            right_only

Objective: I want a resultant dataframe based on column conditions. The conditions are given below:

if ishod == 1 the resultant df will be:
IndentID    IndentNo    role_name     role_id    user_id
100         1234         xyz            3         17

if ishod!=1 and isdesignatedhod==1 the resultant df will be:
IndentID    IndentNo    role_name     role_id    user_id
100         1234         xyz            100         100

I am really clueless about how to proceed with this. Any clue will be appreciated!

This should do approximately what you want:

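import io

import pandas as pd

nan = float("nan")


def func(row):
    # The flag columns hold strings, so compare against "1" rather than 1.
    if row["ishod"] == "1":
        return pd.Series([100, 1234, "xyz", 3, 17, nan, nan, nan], index=row.index)
    elif row["isdesignatedhod"] == "1":
        return pd.Series([100, 1234, "xyz", 100, 100, nan, nan, nan], index=row.index)
    else:
        # Leave every other row untouched.
        return row


# Rebuild the question's frame from its whitespace-separated text.
# The two flag columns are forced to strings so that the "1" comparisons
# in func do not depend on how pandas infers their type.
df = pd.read_csv(io.StringIO(
"""IndentID    IndentNo    role_name     role_id    user_id       ishod    isdesignatedhod   Flag
100         1234         xyz            3         17            1         nan           right_only
nan         nan          nan           -1         -1            None      None          right_only
nan         nan          nan            1         15            None      None          right_only
nan         nan          nan           100        100           None       1            right_only
"""), sep=" +", engine="python", dtype={"ishod": str, "isdesignatedhod": str})

df.apply(func, axis=1)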

Output:

   IndentID  IndentNo role_name  role_id  user_id ishod isdesignatedhod        Flag
0     100.0    1234.0       xyz        3       17   NaN             NaN         NaN
1       NaN       NaN       NaN       -1       -1  None            None  right_only
2       NaN       NaN       NaN        1       15  None            None  right_only
3     100.0    1234.0       xyz      100      100   NaN             NaN         NaN
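
The row-wise function above hard-codes the replacement values. As a rough alternative sketch, the two child dataframes from the question can also be built with boolean masks. This assumes that the frame describes a single indent, so the missing IndentID, IndentNo and role_name values can be forward-filled from the row that has them; that assumption, the hand-built frame, and the names keep, indent_cols, filled, hod and designated are illustrative and not part of the original answer.

```python
import pandas as pd

nan = float("nan")

# Hand-built copy of the question's frame (flag columns kept as strings).
df = pd.DataFrame({
    "IndentID": [100, nan, nan, nan],
    "IndentNo": [1234, nan, nan, nan],
    "role_name": ["xyz", None, None, None],
    "role_id": [3, -1, 1, 100],
    "user_id": [17, -1, 15, 100],
    "ishod": ["1", "None", "None", "None"],
    "isdesignatedhod": [None, "None", "None", "1"],
    "Flag": ["right_only"] * 4,
})

# Columns wanted in the child dataframes.
keep = ["IndentID", "IndentNo", "role_name", "role_id", "user_id"]

# Assumption: one indent per frame, so carry its details down to later rows.
indent_cols = ["IndentID", "IndentNo", "role_name"]
filled = df.copy()
filled[indent_cols] = filled[indent_cols].ffill()

# Condition 1: ishod == 1
hod = filled.loc[filled["ishod"] == "1", keep]

# Condition 2: ishod != 1 and isdesignatedhod == 1
designated = filled.loc[
    (filled["ishod"] != "1") & (filled["isdesignatedhod"] == "1"), keep
]

print(hod)
print(designated)
```

If the frame can contain several indents, the forward fill would need to be replaced by whatever rule links a designated HOD row to its indent, which the question does not specify.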

To select rows based on a value in a certain column, you can use the following notation:

df[ df["column_name"] == value_to_keep ]

Here is an example of this in action:

import pandas as pd

d = {'col1': [1,2,1,2,1,2,1,2,1,2,1,2,1,2,1],
     'col2': [3,4,5,3,4,5,3,4,5,3,4,5,3,4,5],
     'col3': [6,7,8,9,6,7,8,9,6,7,8,9,6,7,8]}
# create a dataframe 
df = pd.DataFrame(d)

This is what df looks like:

In [17]: df
Out[17]: 
    col1  col2  col3
0      1     3     6
1      2     4     7
2      1     5     8
3      2     3     9
4      1     4     6
5      2     5     7
6      1     3     8
7      2     4     9
8      1     5     6
9      2     3     7
10     1     4     8
11     2     5     9
12     1     3     6
13     2     4     7
14     1     5     8

Now, to select all rows for which the value in the first column is 2:

df_1 = df[df["col1"] == 2]

In  [19]: df_1
Out [19]: 
    col1  col2  col3
1      2     4     7
3      2     3     9
5      2     5     7
7      2     4     9
9      2     3     7
11     2     5     9
13     2     4     7

You can also combine multiple conditions this way (each condition needs its own parentheses):

df_2 = df[(df["col2"] >= 4) & (df["col3"] != 7)]


In  [22]: df_2
Out [22]: 
    col1  col2  col3
2      1     5     8
4      1     4     6
7      2     4     9
8      1     5     6
10     1     4     8
11     2     5     9
14     1     5     8
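
The same masks can be combined with other operators and paired with a column list, which is one way to get a child dataframe that keeps only some columns. The following is a small sketch on the same example data; the names mask, child, either and not_two are made up for illustration.

```python
import pandas as pd

d = {'col1': [1,2,1,2,1,2,1,2,1,2,1,2,1,2,1],
     'col2': [3,4,5,3,4,5,3,4,5,3,4,5,3,4,5],
     'col3': [6,7,8,9,6,7,8,9,6,7,8,9,6,7,8]}
df = pd.DataFrame(d)

# Same row condition as df_2, but keep only two of the columns.
mask = (df["col2"] >= 4) & (df["col3"] != 7)
child = df.loc[mask, ["col1", "col3"]]

# "|" means OR: rows where col1 is 2 or col3 is 6.
either = df[(df["col1"] == 2) | (df["col3"] == 6)]

# "~" negates a mask: rows where col1 is not 2.
not_two = df[~(df["col1"] == 2)]

print(child)
print(either)
print(not_two)
```

The parentheses matter because & and | bind more tightly than comparison operators such as == and >= in Python.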

Hope this example helps!

Andre gives the right answer. Also keep in mind the dtype of the columns ishod and isdesignatedhod. They are of "object" type, in this specific case strings. So you have to put the value in quotes when comparing these object columns with numbers.

df[df["ishod"] == "1"]
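
To see the effect of the dtype, here is a minimal sketch on a hypothetical stand-in for the question's frame (the data and the variable names are assumptions made for illustration). It also shows pd.to_numeric as one possible alternative to quoting the value.

```python
import pandas as pd

# Hypothetical stand-in for the question's frame; the flag columns are
# stored as Python strings with object dtype.
df = pd.DataFrame({
    "user_id": [17, -1, 15, 100],
    "ishod": pd.Series(["1", "None", "None", "None"], dtype=object),
    "isdesignatedhod": pd.Series(["nan", "None", "None", "1"], dtype=object),
})

# Inspect the column types before writing any comparison.
print(df.dtypes)

# The integer 1 never equals the string "1", so this selects nothing.
as_int = df[df["ishod"] == 1]

# Quoting the value matches the stored string.
as_str = df[df["ishod"] == "1"]

# Alternative: convert to numbers first; text that is not a number becomes NaN.
ishod_num = pd.to_numeric(df["ishod"], errors="coerce")
as_num = df[ishod_num == 1]

print(len(as_int), len(as_str), len(as_num))
```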
