Identify Relationship between two columns using pandas
I have two columns in a dataframe, namely Letter and Number, as follows.
I want to do the following:
The expected output is shown below.
I tried using the groupby function by shifting the column name; it helped to identify item 1 and item 2 separately. I want to do it in a single function. Please help.
You could write a function like this:
import pandas as pd

letter = ['A', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'F', 'G']
number = [10, 11, 5, 6, 15, 15, 20, 20, 25, 28]
data = {'letter': letter, 'number': number}
df = pd.DataFrame(data)


def relationship(letter, number):
    # Count how often each letter and each number occurs.
    number_of_letters = {}
    number_of_numbers = {}
    labels = []  # renamed from `relationship` so it does not shadow the function
    for i in letter:
        if i in number_of_letters:
            number_of_letters[i] += 1
        else:
            number_of_letters[i] = 1
    for i in number:
        if i in number_of_numbers:
            number_of_numbers[i] += 1
        else:
            number_of_numbers[i] = 1
    # Label each row from the two counts.
    for i in range(len(letter)):
        if number_of_letters[letter[i]] == 1 and number_of_numbers[number[i]] == 1:
            labels.append('One to One')
        elif number_of_letters[letter[i]] > 1 and number_of_numbers[number[i]] == 1:
            labels.append('One to Many')
        elif number_of_letters[letter[i]] == 1 and number_of_numbers[number[i]] > 1:
            labels.append('Many to One')
        elif number_of_letters[letter[i]] > 1 and number_of_numbers[number[i]] > 1:
            labels.append('Many to Many')
    return labels


df['relationship'] = relationship(letter, number)
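One caveat with the function above: it counts occurrences, so a pair that appears twice (say two rows of A, 10) would be labelled as "many". A minimal sketch that counts distinct partners instead, assuming repeated identical pairs should not change the label. The helper name `relationship_unique` is hypothetical and not part of the original answer:

```python
from collections import defaultdict


def relationship_unique(letters, numbers):
    """Label each (letter, number) pair by its number of distinct partners."""
    numbers_per_letter = defaultdict(set)
    letters_per_number = defaultdict(set)
    for l, n in zip(letters, numbers):
        numbers_per_letter[l].add(n)
        letters_per_number[n].add(l)

    labels = []
    for l, n in zip(letters, numbers):
        # "Many" on a side means more than one distinct partner on that side.
        letter_has_many = len(numbers_per_letter[l]) > 1
        number_has_many = len(letters_per_number[n]) > 1
        if letter_has_many and number_has_many:
            labels.append('Many to Many')
        elif letter_has_many:
            labels.append('One to Many')
        elif number_has_many:
            labels.append('Many to One')
        else:
            labels.append('One to One')
    return labels
```

On the sample data it gives the same labels as the function above; the two only differ when the same pair is repeated.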
This could be your solution:
import numpy as np
import pandas as pd

d1 = ['A', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'F', 'G']
d2 = [10, 11, 5, 6, 15, 15, 20, 20, 25, 28]
df = pd.DataFrame(list(zip(d1, d2)), columns=['col1', 'col2'])

# One boolean helper column per relationship type.
df['one to one'] = (df.groupby('col2')['col1'].transform(lambda x: x.nunique() == 1) & df.groupby('col1')['col2'].transform(lambda x: x.nunique() == 1))
df['many to one'] = (df.groupby('col2')['col1'].transform(lambda x: x.nunique() > 1) & df.groupby('col1')['col2'].transform(lambda x: x.nunique() == 1))
df['one to many'] = (df.groupby('col1')['col2'].transform(lambda x: x.nunique() > 1) & df.groupby('col2')['col1'].transform(lambda x: x.nunique() == 1))
df['many to many'] = (df.groupby('col1')['col2'].transform(lambda x: x.nunique() > 1) & df.groupby('col2')['col1'].transform(lambda x: x.nunique() > 1))

# Collapse the four helper columns into a single label column.
conditions = [df['one to one'], df['one to many'], df['many to one'], df['many to many']]
choices = ['one to one', 'one to many', 'many to one', 'many to many']
# A string default keeps the result dtype consistent with the string choices;
# some NumPy versions reject mixing them with the integer default of 0.
df['relation'] = np.select(conditions, choices, default='')

# drop() returns a new frame, so assign the result back.
df = df.drop(['one to one', 'one to many', 'many to one', 'many to many'], axis=1)
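Since the question asks for a single function, the four helper columns can probably be folded into one. This is a rough sketch, not the answerer's code: the function name `label_relationship` and its parameters are made up for illustration, and it assumes neither column contains missing values.

```python
import pandas as pd


def label_relationship(df, left, right):
    """Return a Series labelling each row, e.g. 'one to many'."""
    # Distinct `right` values seen for this row's `left` value, and vice versa.
    right_per_left = df.groupby(left)[right].transform('nunique')
    left_per_right = df.groupby(right)[left].transform('nunique')

    names = {True: 'many', False: 'one'}
    left_side = left_per_right.gt(1).map(names)
    right_side = right_per_left.gt(1).map(names)
    return left_side + ' to ' + right_side


d1 = ['A', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'F', 'G']
d2 = [10, 11, 5, 6, 15, 15, 20, 20, 25, 28]
df = pd.DataFrame({'col1': d1, 'col2': d2})
df['relation'] = label_relationship(df, 'col1', 'col2')
```

Each `groupby` runs only once per direction, and no temporary columns need to be dropped afterwards.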