Creating child pandas dataframe from a mother dataframe based on condition
I have a dataframe as follows. It was actually produced by an outer join of two tables.
IndentID IndentNo role_name role_id user_id ishod isdesignatedhod Flag
100 1234 xyz 3 17 1 nan right_only
nan nan nan -1 -1 None None right_only
nan nan nan 1 15 None None right_only
nan nan nan 100 100 None 1 right_only
Objective: I want a resultant dataframe based on column conditions. The conditions are given below.
if ishod == 1 the resultant df will be:
IndentID IndentNo role_name role_id user_id
100 1234 xyz 3 17
if ishod!=1 and isdesignatedhod==1 the resultant df will be:
IndentID IndentNo role_name role_id user_id
100 1234 xyz 100 100
I am really clueless about how to proceed with this. Any clue will be appreciated!
This should do approximately what you want:
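import io
import pandas as pd

nan = float("nan")

def func(row):
    # The flag columns hold strings, so compare against "1" rather than 1.
    if row["ishod"] == "1":
        return pd.Series([100, 1234, "xyz", 3, 17, nan, nan, nan], index=row.index)
    elif row["isdesignatedhod"] == "1":
        return pd.Series([100, 1234, "xyz", 100, 100, nan, nan, nan], index=row.index)
    else:
        return row

# dtype=str keeps the flag columns as strings regardless of pandas version;
# newer versions may then show NaN where the output below shows None.
pd.read_csv(io.StringIO(
"""IndentID IndentNo role_name role_id user_id ishod isdesignatedhod Flag
100 1234 xyz 3 17 1 nan right_only
nan nan nan -1 -1 None None right_only
nan nan nan 1 15 None None right_only
nan nan nan 100 100 None 1 right_only
"""), sep=" +", engine='python',
    dtype={"ishod": str, "isdesignatedhod": str})\
.apply(func, axis=1)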
Output:
IndentID IndentNo role_name role_id user_id ishod isdesignatedhod Flag
0 100.0 1234.0 xyz 3 17 NaN NaN NaN
1 NaN NaN NaN -1 -1 None None right_only
2 NaN NaN NaN 1 15 None None right_only
3 100.0 1234.0 xyz 100 100 NaN NaN NaN
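The function above hard-codes the replacement values. As a rough sketch of how the two child dataframes from the question could instead be derived from the data, the snippet below uses boolean masks. It assumes the flag columns hold the string "1", that exactly one row carries the IndentID/IndentNo/role_name values, and `child_frame` is a hypothetical helper name, not something from the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame by hand. Assumption: the flag columns are
# strings ("1") or missing, as the answers in this thread suggest.
df = pd.DataFrame({
    "IndentID": [100, np.nan, np.nan, np.nan],
    "IndentNo": [1234, np.nan, np.nan, np.nan],
    "role_name": ["xyz", np.nan, np.nan, np.nan],
    "role_id": [3, -1, 1, 100],
    "user_id": [17, -1, 15, 100],
    "ishod": ["1", None, None, None],
    "isdesignatedhod": [None, None, None, "1"],
})

indent_cols = ["IndentID", "IndentNo", "role_name"]
user_cols = ["role_id", "user_id"]

def child_frame(df, mask):
    """Hypothetical helper: take role_id/user_id from the rows selected by
    mask and prepend the indent values from the one populated indent row."""
    indent = df[indent_cols].dropna().iloc[0]
    out = df.loc[mask, user_cols].reset_index(drop=True)
    for pos, col in enumerate(indent_cols):
        out.insert(pos, col, indent[col])  # scalar is broadcast to every row
    return out

is_hod = df["ishod"] == "1"
is_designated = (df["isdesignatedhod"] == "1") & ~is_hod

hod_df = child_frame(df, is_hod)
designated_df = child_frame(df, is_designated)
print(hod_df)
print(designated_df)
```

Whether the indent values should really be copied from the single populated row is a guess based on the expected outputs in the question; adjust the helper if the real data has several indents.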
To select rows based on a value in a certain column, you can use the following notation:
df[ df["column_name"] == value_to_keep ]
Here is an example of this in action:
import pandas as pd
d = {'col1': [1,2,1,2,1,2,1,2,1,2,1,2,1,2,1],
'col2': [3,4,5,3,4,5,3,4,5,3,4,5,3,4,5],
'col3': [6,7,8,9,6,7,8,9,6,7,8,9,6,7,8]}
# create a dataframe
df = pd.DataFrame(d)
This is what df looks like:
In [17]: df
Out[17]:
col1 col2 col3
0 1 3 6
1 2 4 7
2 1 5 8
3 2 3 9
4 1 4 6
5 2 5 7
6 1 3 8
7 2 4 9
8 1 5 6
9 2 3 7
10 1 4 8
11 2 5 9
12 1 3 6
13 2 4 7
14 1 5 8
Now, to select all rows for which the value in the first column is 2:
df_1 = df[df["col1"] == 2]
In [19]: df_1
Out[19]:
col1 col2 col3
1 2 4 7
3 2 3 9
5 2 5 7
7 2 4 9
9 2 3 7
11 2 5 9
13 2 4 7
You can also combine multiple conditions this way:
df_2 = df[(df["col2"] >= 4) & (df["col3"] != 7)]
In [22]: df_2
Out[22]:
col1 col2 col3
2 1 5 8
4 1 4 6
7 2 4 9
8 1 5 6
10 1 4 8
11 2 5 9
14 1 5 8
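Since the question also asks for only a subset of the columns, here is a small sketch of combining a row mask with a column list through `.loc`, using the same example data. The names `mask` and `df_3` are just illustrative. Note that each condition needs its own parentheses, because `&` binds more tightly than the comparison operators.

```python
import pandas as pd

d = {'col1': [1,2,1,2,1,2,1,2,1,2,1,2,1,2,1],
     'col2': [3,4,5,3,4,5,3,4,5,3,4,5,3,4,5],
     'col3': [6,7,8,9,6,7,8,9,6,7,8,9,6,7,8]}
df = pd.DataFrame(d)

# Build the mask once so it can be reused or negated with ~mask
mask = (df["col2"] >= 4) & (df["col3"] != 7)

# Rows are chosen by the mask, columns by name
df_3 = df.loc[mask, ["col1", "col3"]]
print(df_3)
```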
Hope this example helps!
Andre gives the right answer. Also, you have to keep in mind the dtype of the columns ishod and isdesignatedhod. They are of "object" type, in this specific case strings. So you have to put the value in quotes when comparing these object columns with numbers.
df[df["ishod"] == "1"]
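To check this on your own data, a minimal sketch along these lines may help. It builds a small stand-in frame with object-typed flag columns (the values are assumed from the question's table), shows that comparing against the integer 1 matches nothing, and then converts the column with `pd.to_numeric` as one possible alternative to quoting.

```python
import pandas as pd

# Stand-in for the question's data; dtype=object mimics string flag columns
df = pd.DataFrame({
    "user_id": [17, -1, 15, 100],
    "ishod": pd.Series(["1", "None", "None", "None"], dtype=object),
    "isdesignatedhod": pd.Series(["nan", "None", "None", "1"], dtype=object),
})

print(df.dtypes)  # inspect the column types before writing any comparison

string_match = df[df["ishod"] == "1"]  # matches the first row
int_match = df[df["ishod"] == 1]       # empty: the string "1" is not the int 1

# Alternative: convert to numbers, turning unparseable values into NaN
ishod_num = pd.to_numeric(df["ishod"], errors="coerce")
numeric_match = df[ishod_num == 1]
print(numeric_match)
```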