
How to check if a value in one column can map to more than one value in another column

I have the following dataframe:

import pandas as pd

df = pd.DataFrame()
df['id'] = [1, 1, 2, 2]
df['col1'] = [10, 10, 20, 20]
df['col2'] = [100, 200, 50, 50]
df['col3'] = [1, 2, 3, 4]

The goal

From this dataframe, I want to return the rows where, for a particular id, a single value in col1 is paired with more than one value in col2. For example, id 1 has 10 in col1 and 100 in col2 in its first row. Its second row also has 10 in col1, so col2 should also be 100 there, but it is 200, so id 1 is inconsistent. For id 2 the values are consistent. The check should work both ways: within an id, each col1 value should map to exactly one col2 value, and each col2 value to exactly one col1 value. Column col3 holds other values that play no part in the matching but should be kept in the returned dataframe.
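One way to cover both directions is to count distinct partner values per (id, col1) group and per (id, col2) group, then keep rows flagged by either check. This is a minimal sketch, not the accepted answer's method; the two extra rows with id 3 are hypothetical data added only to show a col2 → col1 conflict.

```python
import pandas as pd

df = pd.DataFrame({
    'id':   [1, 1, 2, 2, 3, 3],        # id 3 rows are hypothetical extras
    'col1': [10, 10, 20, 20, 30, 40],
    'col2': [100, 200, 50, 50, 70, 70],
    'col3': [1, 2, 3, 4, 5, 6],
})

# col1 -> col2: within an id, one col1 value is paired with several col2 values
col1_conflict = df.groupby(['id', 'col1'])['col2'].transform('nunique') > 1
# col2 -> col1: within an id, one col2 value is paired with several col1 values
col2_conflict = df.groupby(['id', 'col2'])['col1'].transform('nunique') > 1

# Keep a row if it breaks the mapping in either direction; col3 is carried along
result = df[col1_conflict | col2_conflict]
print(result)
```

Here id 1 is caught by the first check and the hypothetical id 3 by the second, while id 2 passes both.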

Desired output

The part of the dataframe where the values of col1 and col2 do not match:

df = pd.DataFrame()
df['id'] = [1, 1]
df['col1'] = [10, 10]
df['col2'] = [100, 200]
df['col3'] = [1, 2]
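To see why only id 1 belongs in that output, it can help to count the distinct col2 values for each (id, col1) pair. A short sketch using the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'id':   [1, 1, 2, 2],
    'col1': [10, 10, 20, 20],
    'col2': [100, 200, 50, 50],
    'col3': [1, 2, 3, 4],
})

# Number of distinct col2 values for each (id, col1) pair
counts = df.groupby(['id', 'col1'])['col2'].nunique()
print(counts)
```

The pair (1, 10) has 2 distinct col2 values and (2, 20) has 1, so only the id 1 rows are inconsistent.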

Group by id and col1, count the unique col2 values in each group, and keep the rows whose group has more than one (that is, the count is not 1). This handles the col1 → col2 direction:

# transform('nunique') returns one count per row, aligned with df's index
df = df[df.groupby(['id', 'col1'])['col2'].transform('nunique') != 1]
print(df)

   id  col1  col2  col3
0   1    10   100     1
1   1    10   200     2
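An equivalent formulation uses `DataFrameGroupBy.filter`, which keeps every row of a group for which the function returns True. This is an alternative sketch, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    'id':   [1, 1, 2, 2],
    'col1': [10, 10, 20, 20],
    'col2': [100, 200, 50, 50],
    'col3': [1, 2, 3, 4],
})

# Keep whole (id, col1) groups whose col2 column has more than one distinct value
result = df.groupby(['id', 'col1']).filter(lambda g: g['col2'].nunique() > 1)
print(result)
```

`filter` reads naturally but calls the Python lambda once per group, so the `transform('nunique')` version tends to scale better when there are many groups.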
