How to check whether each value of one column maps to exactly one value in another column?

I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({'A':list('bbcddee'), 'B': list('klmnnoi')})

   A  B
0  b  k
1  b  l
2  c  m
3  d  n
4  d  n
5  e  o
6  e  i

and I would like to create a dictionary from columns A and B using, e.g.,

dict(zip(df.A, df.B))
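For context, dict(zip(...)) itself never complains about a repeated key: each later pair silently overwrites the earlier one. With the frame above, the pairs b -> k and e -> o would simply be lost, which is why a check beforehand is useful:

```python
import pandas as pd

df = pd.DataFrame({'A': list('bbcddee'), 'B': list('klmnnoi')})

# dict() keeps the last value seen for each repeated key, so the
# conflicting earlier pairs (b -> k, e -> o) are dropped without any error.
mapping = dict(zip(df.A, df.B))
print(mapping)  # {'b': 'l', 'c': 'm', 'd': 'n', 'e': 'i'}
```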

Before doing this, I would like to check whether each value in A maps to only one value in B; if not, an error should be thrown. In the example above that is not the case, since b maps to both k and l, and e maps to both o and i.

One way of approaching it would be:

df[df.groupby('A', sort=False)['B'].transform(lambda x: len(set(x))) > 1]

which returns

   A  B
0  b  k
1  b  l
5  e  o
6  e  i

However, that requires a lambda, which might make it slow. Does anyone see an option to speed it up?

You can use groupby with nunique to count how many unique values in 'B' belong to each unique value in 'A'.

df.groupby('A').B.nunique()
#A
#b    2
#c    1
#d    1
#e    2
#Name: B, dtype: int64

So you can check whether any of them has more than one mapping:

df.groupby('A').B.nunique().gt(1).any()
#True

The above is conceptually no different from what you proposed. However, there is often a major performance gain if you can use a built-in groupby operation, which is vectorized, instead of a lambda that pandas has to call in a Python-level loop, once per group. In the benchmark below, as the DataFrame gets large the lambda becomes nearly 100x slower, which is a big deal once computations start taking seconds.

# perfplot is a third-party benchmarking package (pip install perfplot)
import perfplot
import pandas as pd
import numpy as np

def gb_lambda(df):
    # Python lambda called once per group
    return df.groupby('A')['B'].apply(lambda x: len(set(x))).gt(1)

def gb_nunique(df):
    # built-in groupby reduction, no per-group Python call
    return df.groupby('A').B.nunique().gt(1)

perfplot.show(
    # random integer columns with values drawn from range(n // 2)
    setup=lambda n: pd.DataFrame({'A': np.random.randint(0, n//2, n),
                                  'B': np.random.randint(0, n//2, n)}),
    kernels=[gb_lambda, gb_nunique],
    labels=['groupby with lambda', 'Groupby.nunique'],
    n_range=[2 ** k for k in range(2, 18)],
    equality_check=np.allclose,  # both kernels return a boolean Series indexed by A
    xlabel='~len(df)'
)
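The question also asks for an error to be thrown. A minimal sketch that wraps the nunique check around dict(zip(...)); checked_mapping and its key/value parameters are hypothetical names for illustration, not pandas API. It passes dropna=False to SeriesGroupBy.nunique, assuming that a key seen with both a value and NaN should also count as ambiguous:

```python
import pandas as pd

def checked_mapping(df, key='A', value='B'):
    # Hypothetical helper: build {key: value} only if every key maps to
    # exactly one distinct value; otherwise raise ValueError.
    # dropna=False makes NaN count as a distinct value (an assumption).
    counts = df.groupby(key)[value].nunique(dropna=False)
    ambiguous = counts[counts > 1]
    if not ambiguous.empty:
        raise ValueError(f"{key} values mapped to more than one {value}: "
                         f"{list(ambiguous.index)}")
    return dict(zip(df[key], df[value]))

df = pd.DataFrame({'A': list('bbcddee'), 'B': list('klmnnoi')})
try:
    checked_mapping(df)
except ValueError as err:
    print(err)  # A values mapped to more than one B: ['b', 'e']

# Only c and d are unambiguous, so this subset converts cleanly.
print(checked_mapping(df[df.A.isin(['c', 'd'])]))  # {'c': 'm', 'd': 'n'}
```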


You can use pd.Series.duplicated and DataFrame.duplicated with the keep parameter set to False:

df[df.A.duplicated(keep=False) & (~df.duplicated(keep=False))]

   A  B
0  b  k
1  b  l
5  e  o
6  e  i

Details

df.A.duplicated(keep=False) # To eliminate `A` values that occur only once

0     True
1     True
2    False # ----> `c` which has no duplicates 
3     True
4     True
5     True
6     True
Name: A, dtype: bool

~df.duplicated(keep=False) # To keep rows whose (A, B) pair is not repeated

0     True
1     True
2     True
3    False # ----> d n
4    False # ----> d n
5     True
6     True
dtype: bool
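One caveat worth checking: this mask misses a key whose distinct (A, B) pairs each appear more than once, because ~df.duplicated(keep=False) is then False on every row of that key. The counterexample below shows it, followed by a sketch of a variant (ambiguous_rows is a hypothetical name) that deduplicates the pairs first so that any A value still repeated must map to more than one B:

```python
import pandas as pd

# 'd' maps to both 'n' and 'o', but every (A, B) row is itself repeated,
# so the duplicated-based mask above selects nothing.
df2 = pd.DataFrame({'A': list('dddd'), 'B': list('nnoo')})
print(df2[df2.A.duplicated(keep=False) & ~df2.duplicated(keep=False)])  # empty

def ambiguous_rows(df):
    # Keep one row per distinct (A, B) pair; an A value that still
    # repeats after that has more than one distinct B.
    pairs = df.drop_duplicates(['A', 'B'])
    bad = pairs.loc[pairs['A'].duplicated(keep=False), 'A']
    return df[df['A'].isin(bad)]

print(ambiguous_rows(df2))  # all four rows
```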

Let us try filter:

df.groupby('A').filter(lambda x : x['B'].nunique()>1)
   A  B
0  b  k
1  b  l
5  e  o
6  e  i
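filter still calls a Python lambda once per group. If the goal is the same rows as the question's transform-based filter, passing the built-in name 'nunique' to transform is a sketch of an alternative that avoids the per-group lambda; it is not benchmarked here, so treat any speed-up as an expectation rather than a measured result:

```python
import pandas as pd

df = pd.DataFrame({'A': list('bbcddee'), 'B': list('klmnnoi')})

# transform('nunique') broadcasts each group's count of distinct B values
# back onto that group's rows, so the mask aligns with df's index.
mask = df.groupby('A', sort=False)['B'].transform('nunique') > 1
print(df[mask])  # rows 0, 1, 5, 6 (the b and e groups)
```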
