How to create and fill a new column based on conditions in two other columns?
Input:
import pandas as pd
import numpy as np
list1 = ['no','no','yes','yes','no','no','no','yes','no','yes','yes','no','no','no']
list2 = ['no','no','no','no','no','yes','yes','no','no','no','no','no','yes','no']
df = pd.DataFrame({'A' : list1, 'B' : list2}, columns = ['A', 'B'])
df['C'] = np.where ((df['A'] == 'yes') & (df['A'].shift(1) == 'no'), 'X', np.nan)
df['D'] = 'nan','nan','X','X','X','X','nan','X','X','X','X','X','X','nan'
print (df)
Output:
A B C D
0 no no nan nan
1 no no nan nan
2 yes no X X
3 yes no nan X
4 no no nan X
5 no yes nan X
6 no yes nan nan
7 yes no X X
8 no no nan X
9 yes no X X
10 yes no nan X
11 no no nan X
12 no yes nan X
13 no no nan nan
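Columns A and B are given and contain only 'yes' or 'no' values. Only three pairs are possible ('no'-'no', 'yes'-'no', or 'no'-'yes'). There will never be a 'yes'-'yes' pair.
The goal is to put an 'X' in the new column when a 'yes'-'no' pair is encountered, and then keep filling in 'X' until a 'no'-'yes' pair appears. The row holding that 'no'-'yes' pair still gets an 'X', as column D shows. A run can span a few rows or several hundred.
Column D shows the desired output.
Column C is my current, failed attempt.
Can anyone help? Thanks in advance.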
Try this:
# Start from an object-dtype column so string labels can be assigned
# without an incompatible-dtype warning on newer pandas versions
df["E"] = pd.Series(np.nan, index=df.index, dtype=object)
# Use boolean indexing to mark the no-yes rows with a placeholder value
df.loc[(df["A"] == "no") & (df["B"] == "yes"), "E"] = "PL"
# Shift the placeholder down by one row, since your example
# keeps X on the no-yes "stopping" row itself
df["E"] = df["E"].shift(1)
# Then set the X value on the yes-no rows
df.loc[(df["A"] == "yes") & (df["B"] == "no"), "E"] = "X"
df["E"] = df["E"].ffill()  # Fill forward
# Replace the remaining placeholders with NaN
df.loc[df["E"] == "PL", "E"] = np.nan
Result:
A B C D E
0 no no nan nan NaN
1 no no nan nan NaN
2 yes no X X X
3 yes no nan X X
4 no no nan X X
5 no yes nan X X
6 no yes nan nan NaN
7 yes no X X X
8 no no nan X X
9 yes no X X X
10 yes no nan X X
11 no no nan X X
12 no yes nan X X
13 no no nan nan NaN
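The same shift-and-forward-fill idea can also be written with a numeric state column instead of a string placeholder. This is only a sketch under the question's assumptions (values are exactly 'yes'/'no', and the stopping row keeps its X); the names `start`, `stop`, `state` and `active` are illustrative and do not come from the answer above.

```python
import numpy as np
import pandas as pd

list1 = ['no','no','yes','yes','no','no','no','yes','no','yes','yes','no','no','no']
list2 = ['no','no','no','no','no','yes','yes','no','no','no','no','no','yes','no']
df = pd.DataFrame({'A': list1, 'B': list2})

# Rows that open a run (yes-no) and rows that close one (no-yes)
start = (df['A'] == 'yes') & (df['B'] == 'no')
stop = (df['A'] == 'no') & (df['B'] == 'yes')

# 1.0 where a run opens, 0.0 on the row AFTER a stop row, NaN elsewhere.
# Start rows are written last so they win if they directly follow a stop row.
state = pd.Series(np.nan, index=df.index, dtype='float64')
state.loc[stop.shift(1, fill_value=False)] = 0.0
state.loc[start] = 1.0

# Carry the last known state forward; rows before the first marker are inactive
active = state.ffill().fillna(0.0).astype(bool)

# Map the boolean mask to 'X' / NaN
df['E'] = active.map({True: 'X', False: np.nan})
print(df)
```

Keeping the state numeric means no placeholder has to be cleaned up afterwards, and the boolean mask `active` can be reused for other columns.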
You can do this with apply():
df['C'] = df[['A','B']].apply(yourfunction, axis=1)
where your function could be:
def yourfunction(cols):
    col_A = cols['A']
    col_B = cols['B']
    if YOURLOGIC:
        return 'X'
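Note that the YOURLOGIC placeholder cannot be decided from a single row here: whether a 'no'-'no' row gets an X depends on whether a run is currently open. One possible way to carry that state, shown as a sketch with a hypothetical helper named `fill_runs` (not part of the answer above), is to walk both columns in order and build the new column as a list:

```python
import numpy as np
import pandas as pd

def fill_runs(a_values, b_values, mark='X'):
    """Return `mark` from each yes-no row through the next no-yes row, NaN elsewhere."""
    out = []
    active = False  # True while a run opened by a yes-no row is still open
    for a, b in zip(a_values, b_values):
        if a == 'yes' and b == 'no':
            active = True
        out.append(mark if active else np.nan)
        # Close the run only after the stopping row itself has been marked
        if a == 'no' and b == 'yes':
            active = False
    return out

list1 = ['no','no','yes','yes','no','no','no','yes','no','yes','yes','no','no','no']
list2 = ['no','no','no','no','no','yes','yes','no','no','no','no','no','yes','no']
df = pd.DataFrame({'A': list1, 'B': list2})
df['C'] = fill_runs(df['A'], df['B'])
print(df)
```

A plain loop over `zip` is used instead of `apply(..., axis=1)` so that the order of evaluation and the running state are explicit.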
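You can try this approach. Here I use iterrows to loop over the rows:
import pandas as pd
import numpy as np
list1 = ['no','no','yes','yes','no','no','no','yes','no','yes','yes','no','no','no']
list2 = ['no','no','no','no','no','yes','yes','no','no','no','no','no','yes','no']
df = pd.DataFrame({'A' : list1, 'B' : list2}, columns = ['A', 'B'])
# Object dtype so the string 'X' can be written into the column
df['C'] = pd.Series(np.nan, index=df.index, dtype=object)
to_check = 0  # 1 while inside a run opened by a yes-no row
for ind, row in df.iterrows():
    # A yes-no row opens (or continues) a run
    if (row['A'] == 'yes') and (row['B'] == 'no'):
        to_check = 1
        df.loc[ind, 'C'] = 'X'
        continue
    # A no-yes row closes the run, but is still marked if a run was open
    if (row['A'] == 'no') and (row['B'] == 'yes'):
        if to_check == 1:
            df.loc[ind, 'C'] = 'X'
            to_check = 0
        continue
    # A no-no row is marked only while a run is open
    if to_check == 1:
        df.loc[ind, 'C'] = 'X'
df['D'] = 'nan','nan','X','X','X','X','nan','X','X','X','X','X','X','nan'
print (df)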
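This will do the job:
def needed_in():
    count = False  # True while inside a run opened by a yes-no row
    for index in df.index:
        if df.loc[index, ["A", "B"]].tolist() == ["yes", "no"]:
            count = True
        if count:
            yield index
        # Close the run after the no-yes row itself has been yielded
        if df.loc[index, ["A", "B"]].tolist() == ["no", "yes"]:
            count = False
# Object dtype so the string "X" can be written into the column
df["C"] = pd.Series(np.nan, index=df.index, dtype=object)
df.loc[list(needed_in()), "C"] = "X"
Output: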
A B C
0 no no NaN
1 no no NaN
2 yes no X
3 yes no X
4 no no X
5 no yes X
6 no yes NaN
7 yes no X
8 no no X
9 yes no X
10 yes no X
11 no no X
12 no yes X
13 no no NaN