Python Pandas: How to exclude rows that have any value repeated more than n times

I'm pretty green when it comes to Python, so my apologies if this is an obvious question.

I have a dataframe that has 8 columns. For each row, the first four columns are single names. The following four columns are the locations that each of those names is associated with. Here's an example.

name1  name2  name3  name4  loc1  loc2  loc3  loc4
Joe    Dave   Aaron  Alex   NYC   CHI   ANN   FAL
Erica  Alana  Steve  Blake  JAX   MIA   JAX   JAX
Stacy  Tom    Nancy  Steph  SAC   SFR   DAL   DAL

All I want to do is take that dataframe and create a new one that shows all the same information but excludes any rows that have more than two of the same location in the last 4 columns. The result then needs to have its index reset. So the result of the example above would be:

name1  name2  name3  name4  loc1  loc2  loc3  loc4
Joe    Dave   Aaron  Alex   NYC   CHI   ANN   FAL
Stacy  Tom    Nancy  Steph  SAC   SFR   DAL   DAL

I was trying to make it work with a combination of apply, groupby and count, but could not get it to work right. I feel like there's a simple solution.

Many thanks!

You can use nunique on each row:

df[df[['loc1', 'loc2', 'loc3', 'loc4']].apply(lambda x: len(x) - x.nunique() < 2, axis=1)].copy().reset_index(drop=True)
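
For reference, here is a self-contained sketch of this approach run on the sample data from the question. The dataframe construction and the names `loc_cols`, `duplicates` and `result` are illustrative and not part of the original answer. It passes `drop=True` to `reset_index` so that the old index is discarded instead of being kept as an extra column.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "name1": ["Joe", "Erica", "Stacy"],
    "name2": ["Dave", "Alana", "Tom"],
    "name3": ["Aaron", "Steve", "Nancy"],
    "name4": ["Alex", "Blake", "Steph"],
    "loc1": ["NYC", "JAX", "SAC"],
    "loc2": ["CHI", "MIA", "SFR"],
    "loc3": ["ANN", "JAX", "DAL"],
    "loc4": ["FAL", "JAX", "DAL"],
})

loc_cols = ["loc1", "loc2", "loc3", "loc4"]

# Per row: number of entries minus number of distinct entries,
# i.e. how many entries duplicate an earlier value in the same row.
duplicates = df[loc_cols].apply(lambda row: len(row) - row.nunique(), axis=1)

# Keep rows with fewer than two duplicate entries and renumber from 0.
result = df[duplicates < 2].reset_index(drop=True)
print(result)
```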

Try this:

df[df.filter(like='loc').nunique(axis=1) > 2]

Output:

   name1 name2  name3  name4 loc1 loc2 loc3 loc4
0    Joe  Dave  Aaron   Alex  NYC  CHI  ANN  FAL
2  Stacy   Tom  Nancy  Steph  SAC  SFR  DAL  DAL

filter with like='loc' limits the dataframe to just the loc columns (the last four). nunique with axis=1 then counts the unique values in each row. Comparing that count with 2 creates a boolean Series, and boolean indexing with that Series selects the correct rows.
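
Counting unique values works for the example, but it is an indirect test. A row such as A, A, B, B has only two unique values even though no location appears more than twice, so a nunique-based filter would drop it. If the goal is literally "no value repeated more than n times", one option is to look at the count of the most frequent value in each row. The following is a minimal sketch under that reading: the helper name `drop_rows_with_repeats`, its parameters, and the extra "Pat" row are hypothetical and added only for illustration.

```python
import pandas as pd


def drop_rows_with_repeats(df, cols, n=2):
    """Keep rows in which no value in `cols` occurs more than `n` times."""
    # value_counts() tallies each distinct value in the row;
    # max() is the count of the most frequent one.
    max_count = df[cols].apply(lambda row: row.value_counts().max(), axis=1)
    return df[max_count <= n].reset_index(drop=True)


# Location data from the question plus one made-up row ("Pat")
# where two different locations each appear twice.
df = pd.DataFrame({
    "name1": ["Joe", "Erica", "Stacy", "Pat"],
    "loc1": ["NYC", "JAX", "SAC", "BOS"],
    "loc2": ["CHI", "MIA", "SFR", "BOS"],
    "loc3": ["ANN", "JAX", "DAL", "LAX"],
    "loc4": ["FAL", "JAX", "DAL", "LAX"],
})

# Assumes every location column has "loc" in its name.
loc_cols = [c for c in df.columns if "loc" in c]

result = drop_rows_with_repeats(df, loc_cols, n=2)
print(result["name1"].tolist())
```

Only the Erica row is removed here, because JAX appears three times. The Pat row is kept, since neither of its locations appears more than twice.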

