Select rows with equal values in one column and different values in another column (pandas, Python)
I have a dataframe with about 400,000 rows, and I want to correct some things in it. To do that, I want to select all the rows that have the same value as the previous row in one column and a different value from the previous row in another column.
In other words, x[i] == x[i+1] AND x[j] != x[j+1].
My idea was to sort the values by i and j, then shift the dataframe and group by. However, I am having trouble getting the resulting dataset out.
Example dataset:
i j year
"foo" "jar" 5
"foo" "jam" 5
"hi" "hell" 6
"hi" "hello" 6
"good" "happy" 8
"bad" "happy" 8
"happy" "good" 8
Desired output:
i j year
"foo" "jar" 5
"foo" "jam" 5
"hi" "hell" 6
"hi" "hello" 6
Current code:
shifteddf = df.shift()
df1 = df[df["i"]==shifteddf["i"]]
df2 = df[df["j"]!=shifteddf["j"]]
pd.merge(df1, df2, left_index=True, right_index=True)
Current output:
i j year
"foo" "jam" 5
"hi" "hello" 6
So I am missing some rows, and I think I am making a fundamental mistake somewhere.
This can be done based on identical strings or on the length of the strings.
Based on strings, you need eq with shift:
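# True where i equals the value in the previous row
match = df.i.eq(df.i.shift())
# Keep each matching row plus the row just before it
print(df[match | match.shift(-1, fill_value=False)])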
Output:
i j year
0 foo jar 5
1 foo jam 5
2 hi hell 6
3 hi hello 6
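Note that the mask above only compares column i. If the "different value in j" condition from the question should be enforced as well, the two comparisons can be combined. The following is a minimal sketch, assuming the example data from the question and assuming that only adjacent rows need to be compared; the names same_i, diff_j, second_of_pair and first_of_pair are just illustrative:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "i": ["foo", "foo", "hi", "hi", "good", "bad", "happy"],
    "j": ["jar", "jam", "hell", "hello", "happy", "happy", "good"],
    "year": [5, 5, 6, 6, 8, 8, 8],
})

# True where a row has the same i as the previous row but a different j
same_i = df["i"].eq(df["i"].shift())
diff_j = df["j"].ne(df["j"].shift())
second_of_pair = same_i & diff_j

# Also flag the row just before each hit, so both rows of a pair are kept
first_of_pair = second_of_pair.shift(-1, fill_value=False)

result = df[second_of_pair | first_of_pair]
print(result)
```

The original attempt in the question only kept the rows that matched their predecessor, which is why the first row of each pair was missing; shifting the mask back by one row brings those rows in.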
If you want to match by length instead, calculate the length first and then group accordingly:
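# Length of each string in i
df['len'] = df['i'].astype(str).map(len)
# True where the length is unchanged from the previous row of the same i group
match = df.len.groupby(df.i).diff().eq(0)
print(df[match | match.shift(-1, fill_value=False)])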
which also outputs:
i j year len
0 foo jar 5 3
1 foo jam 5 3
2 hi hell 6 2
3 hi hello 6 2
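Since the question mentions sorting and grouping, a grouped variant may also be worth considering. This is a sketch of a slightly different criterion, not the adjacent-row comparison used above: it keeps every row whose i value occurs with more than one distinct j anywhere in the frame, so it does not depend on row order. Whether that is the wanted behaviour for the real 400,000-row data is an assumption; distinct_j and grouped_result are illustrative names.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "i": ["foo", "foo", "hi", "hi", "good", "bad", "happy"],
    "j": ["jar", "jam", "hell", "hello", "happy", "happy", "good"],
    "year": [5, 5, 6, 6, 8, 8, 8],
})

# Number of distinct j values per i, broadcast back onto every row
distinct_j = df.groupby("i")["j"].transform("nunique")

# Keep rows whose i is shared by at least two different j values
grouped_result = df[distinct_j > 1]
print(grouped_result)
```

On the example data this selects the same four rows as the shift-based mask, but the two approaches will differ when equal i values are not next to each other.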