Python Pandas: How to exclude rows in which any value repeats more than n times

Question

I'm pretty green when it comes to Python, so my apologies if this is an obvious question.

I have a dataframe that has 8 columns. For each row, the first four columns are single names. The following four columns are the locations that each of those names is associated with. Here's an example.

name1  name2  name3  name4  loc1  loc2  loc3  loc4
Joe    Dave   Aaron  Alex   NYC   CHI   ANN   FAL
Erica  Alana  Steve  Blake  JAX   MIA   JAX   JAX
Stacy  Tom    Nancy  Steph  SAC   SFR   DAL   DAL

All I want to do is take that dataframe and create a new one that shows all the same information but excludes any rows that have more than two of the same location in the last 4 columns. The result then needs to have its index reset. So the result of the example above would be:

name1  name2  name3  name4  loc1  loc2  loc3  loc4
Joe    Dave   Aaron  Alex   NYC   CHI   ANN   FAL
Stacy  Tom    Nancy  Steph  SAC   SFR   DAL   DAL

I was trying to make it work with a combination of apply, groupby and count but could not get it to work right. I feel like there's a simple solution.

Many thanks!

Answer 1

You can use nunique on each row: subtract the number of unique locations from the row length and keep the rows where the difference is less than 2:

df[df[['loc1', 'loc2', 'loc3', 'loc4']].apply(lambda x: len(x) - x.nunique() < 2, axis=1)].reset_index(drop=True)
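Counting unique values is only a proxy for "no location appears more than twice": a row such as A, A, B, B has no value repeated more than twice, yet it has just two unique values and would be dropped. The following is a minimal sketch of an alternative that counts occurrences directly and takes the threshold n from the title as a parameter. The helper name drop_rows_with_repeats is hypothetical, and the frame is rebuilt from the example in the question.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "name1": ["Joe", "Erica", "Stacy"],
    "name2": ["Dave", "Alana", "Tom"],
    "name3": ["Aaron", "Steve", "Nancy"],
    "name4": ["Alex", "Blake", "Steph"],
    "loc1": ["NYC", "JAX", "SAC"],
    "loc2": ["CHI", "MIA", "SFR"],
    "loc3": ["ANN", "JAX", "DAL"],
    "loc4": ["FAL", "JAX", "DAL"],
})


def drop_rows_with_repeats(frame, columns, n=2):
    """Keep rows where no value in `columns` occurs more than `n` times.

    Hypothetical helper for illustration; not part of pandas.
    """
    # value_counts() tallies each distinct value in a row, and max() gives
    # the count of the most frequent one.
    max_repeats = frame[columns].apply(
        lambda row: row.value_counts().max(), axis=1
    )
    # drop=True discards the old index instead of adding it as a column.
    return frame[max_repeats <= n].reset_index(drop=True)


loc_cols = ["loc1", "loc2", "loc3", "loc4"]
result = drop_rows_with_repeats(df, loc_cols, n=2)
print(result)
```

With the example data this keeps the Joe and Stacy rows and renumbers them 0 and 1. It assumes the location columns contain no missing values, since value_counts skips NaN by default.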

Answer 2

Try this:

df[df.filter(like='loc').nunique(axis=1) > 2].reset_index(drop=True)

Output:

   name1 name2  name3  name4 loc1 loc2 loc3 loc4
0    Joe  Dave  Aaron   Alex  NYC  CHI  ANN  FAL
1  Stacy   Tom  Nancy  Steph  SAC  SFR  DAL  DAL

filter with like='loc' limits the dataframe to the columns whose names contain 'loc', which here are the last four. nunique with axis=1 then counts the unique values in each row, the comparison with 2 creates a boolean Series, and boolean indexing with that Series selects the correct rows.
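To make each step of that explanation visible, here is a small sketch that splits the one-liner into its intermediate results. The variable names are illustrative, and the frame is rebuilt from the example in the question.

```python
import pandas as pd

columns = ["name1", "name2", "name3", "name4", "loc1", "loc2", "loc3", "loc4"]
rows = [
    ["Joe", "Dave", "Aaron", "Alex", "NYC", "CHI", "ANN", "FAL"],
    ["Erica", "Alana", "Steve", "Blake", "JAX", "MIA", "JAX", "JAX"],
    ["Stacy", "Tom", "Nancy", "Steph", "SAC", "SFR", "DAL", "DAL"],
]
df = pd.DataFrame(rows, columns=columns)

# Step 1: keep only the columns whose label contains "loc".
loc_only = df.filter(like="loc")

# Step 2: count the distinct values in each row.
unique_per_row = loc_only.nunique(axis=1)
print(unique_per_row.tolist())  # [4, 2, 3]

# Step 3: build a boolean Series aligned with df's index.
mask = unique_per_row > 2
print(mask.tolist())  # [True, False, True]

# Step 4: select the rows where the mask is True and renumber them.
filtered = df[mask].reset_index(drop=True)
print(filtered)
```

Note that like='loc' matches any column whose name contains that substring, so a frame with an unrelated column such as 'allocation' would need an explicit column list instead.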

Answer 1
0 2021-01-31 22:15:05

Answer 2
0 Accepted 2021-01-31 22:31:47
