
Is there any way in a Python dataframe to see if two columns are the same but with renamed values?

For example, say I have a large dataframe of all the individuals in a zoo, and two of the columns are Animal_Common_Name and Animal_Scientific_Name. I suspect one of them is redundant, since each characteristic is completely determined by the other and vice versa. Basically they are the same characteristic, just renamed.

Is there any function that, given two columns, tells you whether this is the case?

You can use the pandas.Series.equals() method.

For example:

import pandas as pd

data = {
    'Column1': [1, 2, 3, 4],
    'Column2': [1, 2, 3, 4],
    'Column3': [5, 6, 7, 8]
}

df = pd.DataFrame(data)

# True
print(df['Column1'].equals(df['Column2']))

# False
print(df['Column1'].equals(df['Column3']))

Found via GeeksForGeeks:

df['Animal_Common_Name'].equals(df['Animal_Scientific_Name'])

This returns True if the two columns hold identical values in the same order, and False otherwise.
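Keep in mind that equals() compares the values themselves, so on the zoo columns it returns False even when one column is a consistent renaming of the other. A minimal sketch with a few illustrative rows (the data is assumed, not taken from the question):

```python
import pandas as pd

# Illustrative rows (assumed data): each common name pairs with exactly one scientific name
df = pd.DataFrame({
    'Animal_Common_Name': ['Lion', 'Giraffe', 'Lion'],
    'Animal_Scientific_Name': ['Panthera leo', 'Giraffa camelopardalis', 'Panthera leo'],
})

# equals() compares values element by element, so renamed labels never match
same_values = df['Animal_Common_Name'].equals(df['Animal_Scientific_Name'])
print(same_values)  # False
```

So equals() answers "are these columns literally identical?", not "is one a relabeling of the other?".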

You can use pandas' vectorized operations to quickly flag the rows where the two columns agree. Here's an example:

import pandas as pd

# create a sample dataframe from some data
d = {'name1': ['Zebra', 'Lion', 'Seagull', 'Spider'],
     'name2': ['Zebra', 'Lion', 'Bird', 'Insect']}
df = pd.DataFrame(data=d)

# create a new column for your test:
df['is_redundant'] = ''

# use .loc to set the flag on rows where the two names match
# (chained indexing such as df['col'][mask] = 1 may not write back to df)
df.loc[df['name1'] == df['name2'], 'is_redundant'] = 1

print(df)


    name1   name2   is_redundant
0   Zebra   Zebra   1
1   Lion    Lion    1
2   Seagull Bird    
3   Spider  Insect  

You can then replace the empty strings with 0, or leave them as they are, depending on your application.
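If you want 0/1 flags from the start instead of empty strings, one option is to cast the boolean comparison to integers. A sketch using the same sample data; like the example above, it only flags rows where the two names are literally identical:

```python
import pandas as pd

# same sample data as above
d = {'name1': ['Zebra', 'Lion', 'Seagull', 'Spider'],
     'name2': ['Zebra', 'Lion', 'Bird', 'Insect']}
df = pd.DataFrame(data=d)

# the row-wise comparison gives a boolean Series; astype(int) maps True/False to 1/0
df['is_redundant'] = (df['name1'] == df['name2']).astype(int)
print(df)
```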

Assuming this example:

  Animal_Common_Name  Animal_Scientific_Name
0               Lion            Panthera leo
1            Giraffe  Giraffa camelopardalis
2               Lion            Panthera leo

Use factorize to convert each column to integer codes, then check whether all the codes are equal:

(pd.factorize(df['Animal_Common_Name'])[0] == pd.factorize(df['Animal_Scientific_Name'])[0]).all()

Output: True
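This works because pd.factorize numbers the distinct values 0, 1, 2, ... in order of first appearance, so two columns that are one-to-one relabelings of each other produce identical code arrays. Below is a sketch wrapping the one-liner above in a helper (`same_partition` is a hypothetical name, and the counterexample rows are assumed data):

```python
import pandas as pd

def same_partition(a: pd.Series, b: pd.Series) -> bool:
    """Hypothetical helper: True if a and b are one-to-one relabelings of each other."""
    # factorize numbers distinct values in order of first appearance
    codes_a, _ = pd.factorize(a)
    codes_b, _ = pd.factorize(b)
    # a consistent renaming produces exactly the same code sequence
    return bool((codes_a == codes_b).all())

zoo = pd.DataFrame({
    'Animal_Common_Name': ['Lion', 'Giraffe', 'Lion'],
    'Animal_Scientific_Name': ['Panthera leo', 'Giraffa camelopardalis', 'Panthera leo'],
})
# counterexample (assumed data): 'Puma concolor' appears with two different common names
mixed = pd.DataFrame({
    'Animal_Common_Name': ['Lion', 'Cougar', 'Puma'],
    'Animal_Scientific_Name': ['Panthera leo', 'Puma concolor', 'Puma concolor'],
})

ok = same_partition(zoo['Animal_Common_Name'], zoo['Animal_Scientific_Name'])
broken = same_partition(mixed['Animal_Common_Name'], mixed['Animal_Scientific_Name'])
print(ok, broken)  # True False
```

In the counterexample the common-name codes are [0, 1, 2] while the scientific-name codes are [0, 1, 1], so the check fails.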

If you want to identify the rows that break the one-to-one relationship (a scientific name associated with more than one common name):

df[df.groupby('Animal_Scientific_Name')['Animal_Common_Name'].transform('nunique').ne(1)]

Then do the same with the column names swapped to catch common names associated with more than one scientific name.
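The two directions can be combined into one check. A sketch, assuming the columns have no missing values (`find_violations` is a hypothetical helper name and the rows are illustrative):

```python
import pandas as pd

def find_violations(df: pd.DataFrame, col_a: str, col_b: str) -> pd.DataFrame:
    """Hypothetical helper: rows where col_a and col_b are not one-to-one."""
    # col_a values associated with more than one col_b value
    a_to_many = df.groupby(col_a)[col_b].transform('nunique').ne(1)
    # col_b values associated with more than one col_a value (the swapped direction)
    b_to_many = df.groupby(col_b)[col_a].transform('nunique').ne(1)
    return df[a_to_many | b_to_many]

# illustrative data: 'Puma concolor' has two common names
df = pd.DataFrame({
    'Animal_Common_Name': ['Lion', 'Giraffe', 'Lion', 'Cougar', 'Puma'],
    'Animal_Scientific_Name': ['Panthera leo', 'Giraffa camelopardalis', 'Panthera leo',
                               'Puma concolor', 'Puma concolor'],
})

violations = find_violations(df, 'Animal_Common_Name', 'Animal_Scientific_Name')
print(violations)
```

An empty result (`violations.empty` is True) means each column fully determines the other, so one of them is redundant.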
