
Assigning values to a pandas column with conditions depending on other columns

I have a dataframe:

import pandas as pd

df_test = pd.DataFrame({'col': ['paris', 'paris', 'nantes', 'berlin', 'berlin', 'berlin', 'tokyo'],
                        'id_res': [12, 12, 14, 28, 8, 4, 89]})


     col  id_res
0   paris      12
1   paris      12
2  nantes      14
3  berlin      28
4  berlin       8
5  berlin       4
6   tokyo      89

I want to create a "check" column whose values are as follows:

  • If a value in "col" is duplicated and all the duplicates share the same "id_res", "check" is False for every duplicate.
  • If a value in "col" is duplicated and the duplicates have different "id_res" values, "check" is True for the row with the largest "id_res" and False for the others.
  • If a value in "col" has no duplicates, "check" is False.

The output I want is therefore:

    col  id_res  check
0   paris      12  False
1   paris      12  False
2  nantes      14  False
3  berlin      28   True
4  berlin       8  False
5  berlin       4  False
6   tokyo      89  False

I tried groupby but did not get a satisfactory result. Can anyone help me, please?

Create 2 boolean masks, combine them, and find the index of the highest id_res value per col:

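df = df_test.copy()  # assumes df_test from the question
# rows whose col value appears more than once
m1 = df['col'].duplicated(keep=False)
# rows whose id_res value appears only once in the whole column
m2 = ~df['id_res'].duplicated(keep=False)
# flag the row holding the largest id_res in each remaining group
df['check'] = df.index.isin(df[m1 & m2].groupby('col')['id_res'].idxmax())
print(df)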

# Output
      col  id_res  check
0   paris      12  False
1   paris      12  False
2  nantes      14  False
3  berlin      28   True
4  berlin       8  False
5  berlin       4  False
6   tokyo      89  False

Details:

>>> pd.concat([df, m1.rename('m1'), m2.rename('m2')], axis=1)
      col  id_res  check     m1     m2
0   paris      12  False   True  False
1   paris      12  False   True  False
2  nantes      14  False  False   True
3  berlin      28   True   True   True  # <-  group to check
4  berlin       8  False   True   True  # <-     because 
5  berlin       4  False   True   True  # <- m1 and m2 are True
6   tokyo      89  False  False   True
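Note that m2 above tests whether an id_res value is duplicated anywhere in the column, not within its own col group. That is enough for the sample data, but it could misfire if two different cities happened to share an id_res. Below is a minimal sketch of a per-group variant. The extra 'rome' row and the names df_demo, diverse and top are assumptions added for illustration and are not part of the question:

```python
import pandas as pd

# The question's data plus one hypothetical row ('rome', 28) that shares
# an id_res with a berlin row.
df_demo = pd.DataFrame({
    'col': ['paris', 'paris', 'nantes', 'berlin', 'berlin', 'berlin', 'tokyo', 'rome'],
    'id_res': [12, 12, 14, 28, 8, 4, 89, 28],
})

g = df_demo.groupby('col')['id_res']

# Groups holding more than one distinct id_res (implies at least two rows).
diverse = g.transform('nunique') > 1

# Index label of the first row carrying the largest id_res in each group.
top = g.idxmax()

df_demo['check'] = diverse & df_demo.index.isin(top)
print(df_demo)
```

With this variant only the berlin row with id_res 28 is flagged. If the maximum is tied inside a group, idxmax returns the first occurrence, so only one of the tied rows would be marked.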

You basically have 3 conditions, so build one mask per condition and take their logical intersection (AND, &):

g = df_test.groupby('col')['id_res']

# is col duplicated?
m1 = df_test['col'].duplicated(keep=False)
# [ True  True False  True  True  True False]

# is id_res max of its group?
m2 = df_test['id_res'].eq(g.transform('max'))
# [ True  True  True  True False False  True]

# is group diverse? (more than 1 id_res)
m3 = g.transform('nunique').gt(1)
# [False False False  True  True  True False]

# check if all conditions True
df_test['check'] = m1 & m2 & m3

Output:

      col  id_res  check
0   paris      12  False
1   paris      12  False
2  nantes      14  False
3  berlin      28   True
4  berlin       8  False
5  berlin       4  False
6   tokyo      89  False
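The three masks can also be wrapped in a small helper and checked against the expected column from the question. This is only a sketch: the function name add_check and its key/value parameters are hypothetical, not something the answer defines.

```python
import pandas as pd


def add_check(df, key='col', value='id_res'):
    """Return a copy of df with a boolean 'check' column (hypothetical helper)."""
    g = df.groupby(key)[value]
    duplicated = df[key].duplicated(keep=False)   # key appears more than once
    is_max = df[value].eq(g.transform('max'))     # row holds its group's maximum
    diverse = g.transform('nunique').gt(1)        # group has several distinct values
    return df.assign(check=duplicated & is_max & diverse)


df_test = pd.DataFrame({'col': ['paris', 'paris', 'nantes', 'berlin', 'berlin', 'berlin', 'tokyo'],
                        'id_res': [12, 12, 14, 28, 8, 4, 89]})

result = add_check(df_test)
expected = [False, False, False, True, False, False, False]
assert result['check'].tolist() == expected
print(result)
```

One design point to be aware of: because is_max compares every row with its group maximum, a maximum that is tied inside a diverse group would mark all tied rows True. The question does not say which behaviour is wanted in that case.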
