Is there any way in a Python dataframe to see if two columns are the same but with renamed values?
For example, suppose I have a large dataframe of all the individuals in a zoo, and two of the columns are Animal_Common_Name and Animal_Scientific_Name. I suspect one of them is redundant, since each characteristic is completely determined by the other and vice versa. They are basically the same characteristic, just renamed.
Is there any function that, given two different columns, tells you whether this is the case?
You can use the pandas.Series.equals() method.
For example:
import pandas as pd
data = {
'Column1': [1, 2, 3, 4],
'Column2': [1, 2, 3, 4],
'Column3': [5, 6, 7, 8]
}
df = pd.DataFrame(data)
# True
print(df['Column1'].equals(df['Column2']))
# False
print(df['Column1'].equals(df['Column3']))
Found via GeeksForGeeks
df['Animal_Common_Name'].equals(df['Animal_Scientific_Name'])
This should return True if the two columns contain exactly the same values, and False if not.
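One caveat worth checking against the question: equals() compares the values themselves, so it does not detect a column whose values were consistently renamed. A minimal sketch, assuming a small made-up zoo dataframe (the data is illustrative, not from the original post):

```python
import pandas as pd

# Illustrative data: every common name maps to exactly one scientific name.
df = pd.DataFrame({
    "Animal_Common_Name": ["Lion", "Giraffe", "Lion"],
    "Animal_Scientific_Name": ["Panthera leo", "Giraffa camelopardalis", "Panthera leo"],
})

# equals() compares element by element, so differently named values never match.
same_values = df["Animal_Common_Name"].equals(df["Animal_Scientific_Name"])
print(same_values)  # False
```

So equals() is the right tool for finding exact duplicate columns, but a renamed column needs a check on the mapping between the values instead.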
You can use the vectorized operations of pandas to quickly determine your redundancies. Here's an example:
import pandas as pd
# create a sample dataframe from some data
d = {'name1': ['Zebra', 'Lion', 'Seagull', 'Spider'],
'name2': ['Zebra', 'Lion', 'Bird', 'Insect']}
df = pd.DataFrame(data=d)
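# create a new, empty column for your test (object dtype so it can hold both '' and 1):
df['is_redundant'] = pd.Series('', index=df.index, dtype=object)
# flag the rows where the two columns match; .loc avoids chained assignment:
df.loc[df['name1'] == df['name2'], 'is_redundant'] = 1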
print(df)
     name1   name2  is_redundant
0    Zebra   Zebra             1
1     Lion    Lion             1
2  Seagull    Bird
3   Spider  Insect
You can then replace the empty values with 0 or leave them as is, depending on your application.
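If a 1/0 flag is what you want in the end, one possible shortcut is to cast the boolean comparison directly, which skips the empty placeholder column. This is a sketch of an alternative, not the answer's original approach:

```python
import pandas as pd

df = pd.DataFrame({
    "name1": ["Zebra", "Lion", "Seagull", "Spider"],
    "name2": ["Zebra", "Lion", "Bird", "Insect"],
})

# The comparison yields a boolean Series; astype(int) turns True/False into 1/0.
df["is_redundant"] = (df["name1"] == df["name2"]).astype(int)

flags = df["is_redundant"].tolist()
print(flags)  # [1, 1, 0, 0]
```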
Assuming this example:
   Animal_Common_Name  Animal_Scientific_Name
0                Lion            Panthera leo
1             Giraffe  Giraffa camelopardalis
2                Lion            Panthera leo
Use factorize to convert each column to integer category codes, then check whether all the codes are equal:
(pd.factorize(df['Animal_Common_Name'])[0] == pd.factorize(df['Animal_Scientific_Name'])[0]).all()
Output: True
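To see why this works, here is a self-contained sketch of the factorize comparison. pd.factorize numbers the values in order of first appearance, so two columns that are consistent renamings of each other produce identical code arrays. The second column, Other_Name, is a hypothetical counterexample added for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Animal_Common_Name": ["Lion", "Giraffe", "Lion"],
    "Animal_Scientific_Name": ["Panthera leo", "Giraffa camelopardalis", "Panthera leo"],
    # Hypothetical column where "Lion" maps to two different names.
    "Other_Name": ["Panthera leo", "Giraffa camelopardalis", "Panthera pardus"],
})

# factorize returns (codes, uniques); codes follow order of first appearance.
common_codes, _ = pd.factorize(df["Animal_Common_Name"])          # [0, 1, 0]
scientific_codes, _ = pd.factorize(df["Animal_Scientific_Name"])  # [0, 1, 0]
other_codes, _ = pd.factorize(df["Other_Name"])                   # [0, 1, 2]

is_renamed = bool((common_codes == scientific_codes).all())
is_renamed_other = bool((common_codes == other_codes).all())
print(is_renamed, is_renamed_other)  # True False
```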
If you want to identify one-to-many relationships (rows whose scientific name maps to more than one common name):
df[df.groupby('Animal_Scientific_Name')['Animal_Common_Name'].transform('nunique').ne(1)]
Then do the same with the column names swapped to check the other direction.
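Both directions can be wrapped in a small helper. The function name is_one_to_one is hypothetical (it is not a pandas function), and the sketch assumes neither column contains missing values, since groupby drops NaN keys by default:

```python
import pandas as pd

def is_one_to_one(df, col_a, col_b):
    """Return True if each value in col_a pairs with exactly one value in col_b and vice versa."""
    # Number of distinct col_a values seen for each col_b value, and the reverse.
    a_per_b = df.groupby(col_b)[col_a].nunique()
    b_per_a = df.groupby(col_a)[col_b].nunique()
    return bool(a_per_b.eq(1).all() and b_per_a.eq(1).all())

zoo = pd.DataFrame({
    "Animal_Common_Name": ["Lion", "Giraffe", "Lion"],
    "Animal_Scientific_Name": ["Panthera leo", "Giraffa camelopardalis", "Panthera leo"],
})
# Illustrative counterexample: "Lion" appears with two scientific names.
broken = zoo.assign(
    Animal_Scientific_Name=["Panthera leo", "Giraffa camelopardalis", "Panthera pardus"]
)

zoo_ok = is_one_to_one(zoo, "Animal_Common_Name", "Animal_Scientific_Name")
broken_ok = is_one_to_one(broken, "Animal_Common_Name", "Animal_Scientific_Name")
print(zoo_ok, broken_ok)  # True False
```

Unlike the factorize comparison, this check does not depend on row order within the frame, at the cost of two groupby passes.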