Select rows that have the same value in one column and a different value in another column (Pandas, Python)

Question

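I have a DataFrame with about 400,000 rows, and I want to correct some of its contents. To do that, I want to select all rows that have the same value as the previous row in one column and a different value from the previous row in another column. In other words, x[i] == x[i+1] AND x[j] != x[j+1]. My plan is to sort the values by i and j, then shift the DataFrame and group. But I am having trouble getting the resulting dataset.

Sample dataset: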

 i         j           year       
"foo"     "jar"          5
"foo"     "jam"          5
"hi"      "hell"         6
"hi"      "hello"        6
"good"    "happy"        8
"bad"     "happy"        8
"happy"   "good"         8

Expected output:

 i         j           year       
"foo"     "jar"          5
"foo"     "jam"          5
"hi"      "hell"         6
"hi"      "hello"        6

Current code:

shifteddf = df.shift()

df1 = df[df["i"]==shifteddf["i"]]

df2 = df[df["j"]!=shifteddf["j"]]

pd.merge(df1, df2, left_index=True, right_index=True)

Current output:

 i         j           year       
"foo"     "jam"          5
"hi"      "hello"        6

So I am missing some rows, and I suspect I am making a fundamental mistake somewhere.

Answer 1

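This can be done either by comparing the strings themselves or by comparing their lengths.

Method 1

To match on the string itself, you need eq and shift: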

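# True where i equals the previous row's i
match = df.i.eq(df.i.shift())
# keep each matching row together with the row before it
print(df[match | match.shift(-1, fill_value=False)])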

Output:

     i      j           year         
0  foo    jar            5    
1  foo    jam            5    
2   hi   hell            6    
3   hi  hello            6
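
For anyone who wants to run Method 1 end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample data in the question; the variable names `same_as_prev`, `same_as_next` and `result` are only illustrative. Passing `fill_value=False` to `shift` keeps the shifted mask boolean instead of introducing a NaN in the last row.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "i": ["foo", "foo", "hi", "hi", "good", "bad", "happy"],
    "j": ["jar", "jam", "hell", "hello", "happy", "happy", "good"],
    "year": [5, 5, 6, 6, 8, 8, 8],
})

# True where i equals the i of the previous row
same_as_prev = df["i"].eq(df["i"].shift())
# True where i equals the i of the next row
same_as_next = same_as_prev.shift(-1, fill_value=False)

# Keep both rows of every adjacent pair that shares the same i
result = df[same_as_prev | same_as_next]
print(result)
```

Note that this only checks column i; it does not test whether j differs.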

Method 2

If you want to match on length instead, compute the length first and then group accordingly:

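# length of each string in column i
df['len'] = df['i'].astype(str).map(len)
# within each group of i, a zero length difference marks a repeat of the previous row in that group
match = df.len.groupby(df.i).diff().eq(0)

print(df[match | match.shift(-1, fill_value=False)])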

This also outputs:

     i      j          year  len
0  foo    jar            5    3
1  foo    jam            5    3
2   hi   hell            6    2
3   hi  hello            6    2
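
Both methods above only look at column i. The question also asks for j to differ from the previous row, so a sketch that applies both conditions could look like the following. This is one possible approach, not the answer's own code. The last two rows ("dup", "same", 9) are hypothetical and not part of the question's data; they are added only to show a pair that shares i but does not differ in j.

```python
import pandas as pd

df = pd.DataFrame({
    # First seven rows come from the question; the last two are hypothetical
    "i": ["foo", "foo", "hi", "hi", "good", "bad", "happy", "dup", "dup"],
    "j": ["jar", "jam", "hell", "hello", "happy", "happy", "good", "same", "same"],
    "year": [5, 5, 6, 6, 8, 8, 8, 9, 9],
})

prev = df.shift()  # each row now holds the values of the row above it

# Same i as the previous row AND different j from the previous row
changed = df["i"].eq(prev["i"]) & df["j"].ne(prev["j"])

# Also keep the row just before each match, so both rows of a pair are selected
mask = changed | changed.shift(-1, fill_value=False)
selected = df[mask]
print(selected)
```

The question's attempt returned only the second row of each pair because both of its filters test the current row against the previous one; the `changed.shift(-1, ...)` term is what brings the first row back. If matching rows are not already adjacent, sorting first with `df.sort_values(["i", "j"])`, as the question suggests, would be needed before shifting.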
