
Identify Relationship between two columns using pandas

I have two columns in a dataframe, namely Letter and Number, as follows:

[Image: dataframe]

I want to do the following:

  1. In the above table, letter A is repeated two times in the "Letter" column, which I want to classify as "One to Many" in a new column.
  2. 15 is repeated two times in the "Number" column, which I want to classify as "Many to One".
  3. Letters B and C and numbers 5 and 6 occur only once in each column, so they should be classified as "One to One".
  4. Everything else should be classified as "Many to Many".
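The four rules above can be read as a lookup on two counts: how often a row's letter occurs and how often its number occurs. A minimal sketch, assuming "everything else" in rule 4 means both values are repeated; the function name `classify` is made up for illustration:

```python
def classify(letter_count: int, number_count: int) -> str:
    """Label one row from how often its letter and its number occur."""
    if letter_count == 1 and number_count == 1:
        return "One to One"
    if letter_count > 1 and number_count == 1:
        return "One to Many"   # repeated letter, unique number (rule 1)
    if letter_count == 1 and number_count > 1:
        return "Many to One"   # unique letter, repeated number (rule 2)
    return "Many to Many"      # both repeated (assumed meaning of rule 4)
```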

Expected output is shown below.

[Image: output]

I tried using the groupby function by shifting the column name; it helped to identify item 1 and item 2 separately.

I want to do it in a single function. Please help.

You could write a function like this:

import pandas as pd

letter = ['A', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'F', 'G']
number = [10,11,5,6,15,15,20,20,25,28]
data = {'letter': letter, 'number': number}    
df = pd.DataFrame(data)

def relationship(letter, number):
    number_of_letters = {}  # letter -> number of occurrences
    number_of_numbers = {}  # number -> number of occurrences
    labels = []             # one label per row, in row order

    # Count how often each letter and each number occurs.
    for i in letter:
        if i in number_of_letters:
            number_of_letters[i] += 1
        else:
            number_of_letters[i] = 1
    for i in number:
        if i in number_of_numbers:
            number_of_numbers[i] += 1
        else:
            number_of_numbers[i] = 1

    # Label each row from the counts of its letter and its number.
    for i in range(len(letter)):
        if number_of_letters[letter[i]] == 1 and number_of_numbers[number[i]] == 1:
            labels.append('One to One')
        elif number_of_letters[letter[i]] > 1 and number_of_numbers[number[i]] == 1:
            labels.append('One to Many')
        elif number_of_letters[letter[i]] == 1 and number_of_numbers[number[i]] > 1:
            labels.append('Many to One')
        elif number_of_letters[letter[i]] > 1 and number_of_numbers[number[i]] > 1:
            labels.append('Many to Many')

    return labels

df['relationship'] = relationship(letter, number)
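The same counting idea can probably be written without explicit loops. This is only a sketch, not the answer's own method: it maps each value to its occurrence count with `value_counts` and picks the label with `np.select`. The helper names `letter_count` and `number_count` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'letter': ['A', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'F', 'G'],
    'number': [10, 11, 5, 6, 15, 15, 20, 20, 25, 28],
})

# For every row, how many rows share its letter and how many share its number.
letter_count = df['letter'].map(df['letter'].value_counts())
number_count = df['number'].map(df['number'].value_counts())

conditions = [
    (letter_count == 1) & (number_count == 1),
    (letter_count > 1) & (number_count == 1),
    (letter_count == 1) & (number_count > 1),
]
choices = ['One to One', 'One to Many', 'Many to One']

# Rows matching none of the conditions have both values repeated.
df['relationship'] = np.select(conditions, choices, default='Many to Many')
print(df)
```

On this sample data the labels should match what the loop-based function returns.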

This could be your solution:


import pandas as pd

d1 = ['A','A','B','C','D','E','F','G','F','G']
d2 = [10,11,5,6,15,15,20,20,25,28]

df = pd.DataFrame(list(zip(d1,d2)), columns = ['col1', 'col2'])


df['one to one'] = (df.groupby('col2')['col1'].transform(lambda x:x.nunique()==1) & df.groupby('col1')['col2'].transform(lambda x:x.nunique()==1))


df['many to one'] = (df.groupby('col2')['col1'].transform(lambda x:x.nunique()>1) & df.groupby('col1')['col2'].transform(lambda x:x.nunique()==1))


df['one to many'] = (df.groupby('col1')['col2'].transform(lambda x:x.nunique()>1) & df.groupby('col2')['col1'].transform(lambda x:x.nunique()==1))



df['many to many'] = (df.groupby('col1')['col2'].transform(lambda x:x.nunique()>1) & df.groupby('col2')['col1'].transform(lambda x:x.nunique()>1))


import numpy as np

conditions = [
    (df['one to one'] == True), (df['one to many'] == True),(df['many to one'] == True),(df['many to many'] == True)]
choices = ['one to one', 'one to many', 'many to one','many to many']
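# Pass a string default: np.select otherwise falls back to 0, which may not combine with string choices on newer NumPy versions.
df['relation'] = np.select(conditions, choices, default='')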


# drop() returns a new DataFrame, so assign the result to actually remove the helper columns.
df = df.drop(['one to one', 'one to many', 'many to one', 'many to many'], axis=1)


[Image: output]
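The four helper columns repeat the same two groupby computations. As a hedged variation on this answer (not the author's code), each distinct count could be computed once with `transform('nunique')` and the label chosen directly; `n_col2` and `n_col1` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'F', 'G'],
    'col2': [10, 11, 5, 6, 15, 15, 20, 20, 25, 28],
})

# Distinct col2 values seen with each row's col1 value, and vice versa.
n_col2 = df.groupby('col1')['col2'].transform('nunique')
n_col1 = df.groupby('col2')['col1'].transform('nunique')

conditions = [
    (n_col2 == 1) & (n_col1 == 1),
    (n_col2 > 1) & (n_col1 == 1),
    (n_col2 == 1) & (n_col1 > 1),
]
choices = ['one to one', 'one to many', 'many to one']

# Rows matching none of the conditions have more than one partner on both sides.
df['relation'] = np.select(conditions, choices, default='many to many')
print(df)
```

Because `nunique` counts distinct partner values rather than row occurrences, an exact duplicate row (for example `('A', 10)` appearing twice, with no other A or 10) would still come out as one to one here, whereas the counting function in the first answer would label it many to many.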
