Select rows that have the same value in one column and different values in another column (Pandas, Python)

Question

I have a dataframe with about 400,000 rows, and I want to correct some things in it. I want to select all the rows that have the same value as the previous row in one column and a different value from the previous row in another column. In other words, x[i] == x[i+1] AND x[j] != x[j+1]. My idea was to sort the values by i and j, then shift the dataframe and group by. However, I am having problems getting the resulting dataset out.

Example dataset:

 i         j           year       
"foo"     "jar"          5
"foo"     "jam"          5
"hi"      "hell"         6
"hi"      "hello"        6
"good"    "happy"        8
"bad"     "happy"        8
"happy"   "good"         8

Desired output:

 i         j           year       
"foo"     "jar"          5
"foo"     "jam"          5
"hi"      "hell"         6
"hi"      "hello"        6

Current code:

shifteddf = df.shift()

df1 = df[df["i"]==shifteddf["i"]]

df2 = df[df["j"]!=shifteddf["j"]]

pd.merge(df1, df2, left_index=True, right_index=True)

Current output:

 i         j           year       
"foo"     "jam"          5
"hi"      "hello"        6

So I am missing some rows, and I think I am making a fundamental mistake somewhere.

Answer 1

This can be done either by comparing the strings directly or by comparing the lengths of the strings.

Method 1

Based on strings: you need eq with shift:

match = df.i.eq(df.i.shift())
print(df[match | match.shift(-1)])

Output:

     i      j           year         
0  foo    jar            5    
1  foo    jam            5    
2   hi   hell            6    
3   hi  hello            6
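
Note that the snippet above compares only column i. Comparing against shift() flags just the second row of each pair, which is why the code in the question lost the first rows; OR-ing with match.shift(-1) brings them back. Below is a minimal sketch, assuming you also want to enforce the question's second condition (j differs from the previous row). The frame is rebuilt from the example data so the block runs on its own, and the names same_i, diff_j and result are illustrative only:

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "i": ["foo", "foo", "hi", "hi", "good", "bad", "happy"],
    "j": ["jar", "jam", "hell", "hello", "happy", "happy", "good"],
    "year": [5, 5, 6, 6, 8, 8, 8],
})

# True on a row whose i equals the previous row's i ...
same_i = df["i"].eq(df["i"].shift())
# ... and whose j differs from the previous row's j.
diff_j = df["j"].ne(df["j"].shift())
match = same_i & diff_j

# Also keep the row just before each match, so both rows of a pair are returned.
# fill_value=False keeps the shifted mask boolean (no NaN in the last position).
result = df[match | match.shift(-1, fill_value=False)]
print(result)
```

This relies on the related rows being adjacent, so on real data you would probably sort by i first, as the question suggests.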

Method 2

If you want to match by length, calculate the length first and then group accordingly:

df['len'] = df['i'].astype(str).map(len)
match = (df.len.groupby(df.i).diff().eq(0))

print(df[match | match.shift(-1)])

which also outputs:

     i      j          year  len
0  foo    jar            5    3
1  foo    jam            5    3
2   hi   hell            6    2
3   hi  hello            6    2
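
If row order should not matter at all, a different technique (not part of the answer above) is to count the distinct j values per i group and keep the groups that have more than one. This is only a sketch under that assumption; distinct_j and result are illustrative names:

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "i": ["foo", "foo", "hi", "hi", "good", "bad", "happy"],
    "j": ["jar", "jam", "hell", "hello", "happy", "happy", "good"],
    "year": [5, 5, 6, 6, 8, 8, 8],
})

# For every row, the number of distinct j values within that row's i group.
distinct_j = df.groupby("i")["j"].transform("nunique")

# Keep rows whose i value is paired with more than one distinct j.
result = df[distinct_j > 1]
print(result)
```

Unlike the shift-based approach, this does not need the frame to be sorted, but it selects every row of a qualifying group rather than only neighbouring pairs.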
