
Select rows with equal values in one column and different values in another column (Pandas, Python)

I have a dataframe with about 400,000 rows, and I want to correct some things in it. I want to select all the rows that have the same value as the previous row in one column and a different value from the previous row in another column. In other words, for consecutive rows k and k+1: i[k] == i[k+1] AND j[k] != j[k+1]. What I thought about doing is sorting the values by i and j, then shifting the dataframe and grouping. However, I am having problems getting the right rows out of the dataset.
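A minimal sketch of the sort-then-shift idea, assuming the column names i, j and year from the example below. Sorting makes rows with the same i adjacent, comparing against the shifted frame flags the second row of each pair, and shifting that flag up one row catches the first row as well. The names `second_of_pair` and `first_of_pair` are illustrative only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "i": ["foo", "foo", "hi", "hi", "good", "bad", "happy"],
    "j": ["jar", "jam", "hell", "hello", "happy", "happy", "good"],
    "year": [5, 5, 6, 6, 8, 8, 8],
})

# Sort so that rows sharing the same "i" value sit next to each other
s = df.sort_values(["i", "j"])
prev = s.shift()  # the row directly above, in sorted order

# Second row of a pair: same "i" as the row above, different "j"
second_of_pair = s["i"].eq(prev["i"]) & s["j"].ne(prev["j"])

# Move each flag up one row to also catch the first row of the pair;
# fill_value=False keeps the mask boolean at the last position
first_of_pair = second_of_pair.shift(-1, fill_value=False)

# Select both rows of every pair, then restore the original row order
result = s[second_of_pair | first_of_pair].sort_index()
print(result)
```

Sorting by j as well reorders the rows inside each pair, which is why `sort_index()` is applied at the end.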

Example dataset:

 i         j           year       
"foo"     "jar"          5
"foo"     "jam"          5
"hi"      "hell"         6
"hi"      "hello"        6
"good"    "happy"        8
"bad"     "happy"        8
"happy"   "good"         8

Desired output:

 i         j           year       
"foo"     "jar"          5
"foo"     "jam"          5
"hi"      "hell"         6
"hi"      "hello"        6

Current code:

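import pandas as pd

# Compare every row with the row directly above it
shifteddf = df.shift()

# Rows whose "i" equals the previous row's "i"
df1 = df[df["i"]==shifteddf["i"]]

# Rows whose "j" differs from the previous row's "j"
df2 = df[df["j"]!=shifteddf["j"]]

# Keep only the index labels present in both df1 and df2
pd.merge(df1, df2, left_index=True, right_index=True)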

Current output:

 i         j           year       
"foo"     "jam"          5
"hi"      "hello"        6

So I am missing some rows, and I think I am making a fundamental mistake.
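The following sketch, using the sample data above, shows why only the second row of each pair comes back. Each row is compared only with the row above it, so the first row of a pair never satisfies the i condition, and merging on the index (an inner join) behaves like combining the two masks with `&`, so it cannot recover those rows.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "i": ["foo", "foo", "hi", "hi", "good", "bad", "happy"],
    "j": ["jar", "jam", "hell", "hello", "happy", "happy", "good"],
    "year": [5, 5, 6, 6, 8, 8, 8],
})

shifteddf = df.shift()

# Only the second row of each pair has an equal "i" directly above it;
# rows 0 and 2 are compared with NaN and "foo", so they never match
df1 = df[df["i"] == shifteddf["i"]]
print(df1.index.tolist())  # [1, 3]

# Joining df1 and df2 on the index keeps the labels found in both,
# which is the same as combining the two boolean masks with &
mask = (df["i"] == shifteddf["i"]) & (df["j"] != shifteddf["j"])
print(df[mask].index.tolist())  # [1, 3]
```

The first row of each pair has to be added separately, for example by also selecting the row just above every flagged row.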

This can be done either by comparing the strings themselves or by comparing their lengths.

Method 1

Comparing the strings directly, you need eq combined with shift:

import pandas as pd

match = df.i.eq(df.i.shift())
print(df[match | match.shift(-1, fill_value=False)])

Output:

     i      j  year
0  foo    jar     5
1  foo    jam     5
2   hi   hell     6
3   hi  hello     6
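Method 1 as written compares only column i; it returns the expected rows here because j happens to differ within every pair of the sample. A minimal sketch that also enforces the j condition from the question, assuming rows with the same i are already adjacent (sort first otherwise):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "i": ["foo", "foo", "hi", "hi", "good", "bad", "happy"],
    "j": ["jar", "jam", "hell", "hello", "happy", "happy", "good"],
    "year": [5, 5, 6, 6, 8, 8, 8],
})

# Second row of a pair: same "i" as the previous row AND a different "j"
match = df["i"].eq(df["i"].shift()) & df["j"].ne(df["j"].shift())

# match.shift(-1) flags the first row of each pair; fill_value=False
# avoids a missing value in the last position
result = df[match | match.shift(-1, fill_value=False)]
print(result)
```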

Method 2

If you want to do it by length, first calculate the length of each string and then group accordingly:

df['len'] = df['i'].astype(str).map(len)
match = df.len.groupby(df.i).diff().eq(0)

print(df[match | match.shift(-1, fill_value=False)])

which also outputs:

     i      j  year  len
0  foo    jar     5    3
1  foo    jam     5    3
2   hi   hell     6    2
3   hi  hello     6    2
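It may help to see what the Method 2 mask actually computes. Rows with the same i always have the same string length, so within each group `diff()` is NaN for the first row and 0 for every later row; the mask therefore flags repeated i values, exactly like `duplicated()`. A small check on the sample data:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "i": ["foo", "foo", "hi", "hi", "good", "bad", "happy"],
    "j": ["jar", "jam", "hell", "hello", "happy", "happy", "good"],
    "year": [5, 5, 6, 6, 8, 8, 8],
})

df["len"] = df["i"].astype(str).map(len)
match = df["len"].groupby(df["i"]).diff().eq(0)

# NaN (first row of each group) compares as False, 0 compares as True,
# so the mask matches "this i value already appeared earlier"
print(match.tolist() == df["i"].duplicated().tolist())  # True
```

Because `match.shift(-1)` then selects the row directly above each repeated value, this only picks the correct partner when equal i values are adjacent, and, like Method 1, it does not check column j.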
