Compare two DataFrame columns for similarities and differences

Question

I have a small dataframe, like this.

import pandas as pd
import numpy as np
 
# data's stored in dictionary
details = {
    'address_id': [1, 1, 1, 2, 2],
    'business': ['verizon', 'verizon', 'comcast', 'sprint', 'att']
}
 
df = pd.DataFrame(details)
 

print(df)

I am trying to find out if, and when, a person switched to a different cell phone service.

I tried this logic, but it didn't work.

df['new'] = df.Column1.isin(df.Column1) & df[~df.Column2.isin(df.Column2)]

Basically, index rows 0 and 1 have the same address_id and the same business, but in index row 2 the business changed from verizon to comcast for that address_id. Likewise, index rows 3 and 4 share an address_id, but the business changed from sprint to att in index row 4. I'd like to add a new column to the dataframe to flag these changes. How can I do that?

Answer 1

UPDATE: Here is an even simpler way than my original answer using join() (see below) to do what your question asks:

df['new'] = df.address_id.map(df.groupby('address_id').first().business) != df.business

Explanation:

Use groupby() and first() to create a dataframe whose business column contains the first business encountered for each address_id.
Use Series.map() to transform the original dataframe's address_id column into this first business value.
Add a column new that is True only if this first business differs from the value in the original business column.
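
To make the three steps above easier to follow, here is a minimal sketch that spells out each intermediate result. The variable names first_business and first_per_row are only illustrative and do not appear in the answer; the data is the dataframe from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'address_id': [1, 1, 1, 2, 2],
    'business': ['verizon', 'verizon', 'comcast', 'sprint', 'att'],
})

# Step 1: first business seen for each address_id
# (a Series indexed by address_id).
first_business = df.groupby('address_id')['business'].first()

# Step 2: look up that first business for every row of the original frame.
first_per_row = df['address_id'].map(first_business)

# Step 3: flag rows whose business differs from the first one at that address.
df['new'] = first_per_row != df['business']

print(first_business)
print(df)
```

Note that this compares each row against the first business at its address, not against the row immediately before it, so a later switch back to the first business would not be flagged.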

Here is a simple way to do what you've asked using groupby() and join():

df = df.join(df.groupby('address_id').first(), on='address_id', rsuffix='_first')
df = df.assign(new=df.business != df.business_first).drop(columns='business_first')

Output:

   address_id business    new
0           1  verizon  False
1           1  verizon  False
2           1  comcast   True
3           2   sprint  False
4           2      att   True

Explanation:

Use groupby() and first() to create a dataframe whose business column contains the first business encountered for each address_id.
Use join() to add a column business_first to df containing the corresponding first business for each address_id.
Use assign() to add a column new containing a boolean indicating whether the row contains a new business with an existing address_id.
Use drop() to eliminate the business_first column.
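
The same steps can be run one at a time to inspect the intermediate business_first column before it is dropped. This is only a sketch of the approach described above; the names firsts, joined and result are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'address_id': [1, 1, 1, 2, 2],
    'business': ['verizon', 'verizon', 'comcast', 'sprint', 'att'],
})

# First business per address_id, as a one-column dataframe
# indexed by address_id.
firsts = df.groupby('address_id').first()

# join() matches df['address_id'] against the index of `firsts`.
# Both frames have a 'business' column, so the right-hand one
# receives the suffix and becomes 'business_first'.
joined = df.join(firsts, on='address_id', rsuffix='_first')
print(joined)

# Compare the two columns, then drop the helper column.
result = (
    joined
    .assign(new=joined['business'] != joined['business_first'])
    .drop(columns='business_first')
)
print(result)
```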

Answer 2

First, group by address_id.

groups = df.groupby("address_id")

Then, iterate over the groups, and find where the value of business changes:

for address_id, grp_data in groups:
    changed = grp_data['business'].ne(grp_data['business'].shift().bfill())
   
    df.loc[grp_data.index, "changed"] = changed

.shift().bfill() shifts all data down by one index (0 -> 1, 1 -> 2, and so on), and then backfills the first value, which shift() leaves as NaN. For example (shown here on the whole column rather than on a single group):

>>> df["business"]
0    verizon
1    verizon
2    comcast
3     sprint
4        att
Name: business, dtype: object

>>> df["business"].shift()
0        NaN
1    verizon
2    verizon
3    comcast
4     sprint
Name: business, dtype: object

>>> df["business"].shift().bfill()
0    verizon
1    verizon
2    verizon
3    comcast
4     sprint
Name: business, dtype: object

Running the loop produces the following dataframe:

   address_id business changed
0           1  verizon   False
1           1  verizon   False
2           1  comcast    True
3           2   sprint   False
4           2      att    True
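
As a possible alternative to the explicit loop (not part of the original answer), the shift can be applied per group in one step with groupby().shift(). The sketch below assumes business has no missing values; the names prev, demo, vs_previous and vs_first are illustrative. The second half uses a made-up three-row frame to show how comparing against the previous row differs from comparing against the first row of the group.

```python
import pandas as pd

df = pd.DataFrame({
    'address_id': [1, 1, 1, 2, 2],
    'business': ['verizon', 'verizon', 'comcast', 'sprint', 'att'],
})

# Previous business within the same address_id;
# NaN on the first row of each group.
prev = df.groupby('address_id')['business'].shift()

# A row counts as changed only if it has a predecessor and differs from it.
df['changed'] = prev.notna() & df['business'].ne(prev)
print(df)

# Hypothetical data: the customer switches away and then back again.
demo = pd.DataFrame({
    'address_id': [1, 1, 1],
    'business': ['verizon', 'comcast', 'verizon'],
})
prev_demo = demo.groupby('address_id')['business'].shift()

# Compared with the previous row, the switch back is flagged as a change.
vs_previous = prev_demo.notna() & demo['business'].ne(prev_demo)

# Compared with the group's first business, the switch back is not flagged.
vs_first = demo['business'].ne(
    demo.groupby('address_id')['business'].transform('first')
)
print(vs_previous.tolist(), vs_first.tolist())
```

Which comparison is appropriate depends on whether "changed" should mean "different from the previous provider" or "different from the original provider".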
