Python Pandas: How to exclude rows that have any value repeated more than n times
I'm pretty green when it comes to Python, so my apologies if this is an obvious question.
I have a dataframe that has 8 columns. For each row, the first four columns are single names. The following four columns are the locations that each of those names is associated with. Here's an example.
| name1 | name2 | name3 | name4 | loc1 | loc2 | loc3 | loc4 |
|---|---|---|---|---|---|---|---|
| Joe | Dave | Aaron | Alex | NYC | CHI | ANN | FAL |
| Erica | Alana | Steve | Blake | JAX | MIA | JAX | JAX |
| Stacy | Tom | Nancy | Steph | SAC | SFR | DAL | DAL |
All I want to do is take that dataframe and create a new one that shows all the same information but excludes any rows that have more than two of the same location in the last four columns. The result then needs to have its index reset. So the result of the example above would be:
| name1 | name2 | name3 | name4 | loc1 | loc2 | loc3 | loc4 |
|---|---|---|---|---|---|---|---|
| Joe | Dave | Aaron | Alex | NYC | CHI | ANN | FAL |
| Stacy | Tom | Nancy | Steph | SAC | SFR | DAL | DAL |
I was trying to make it work with a combination of apply, groupby and count, but could not get it to work right. I feel like there's a simple solution.
Many thanks!
You can use `nunique` on each row:
df[df[['loc1', 'loc2', 'loc3', 'loc4']].apply(lambda x: len(x) - x.nunique() < 2, axis=1)].copy().reset_index(drop=True)
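Note that counting distinct values is only a proxy for "no location appears more than twice": a row such as A, A, B, B has just two distinct values, yet no value is repeated more than twice. Below is a minimal sketch that counts the repeats directly. It assumes the example frame from the question (rebuilt by hand here), and the names `loc_cols`, `n` and `max_repeats` are introduced purely for illustration.

```python
import pandas as pd

# Example data from the question, rebuilt by hand.
df = pd.DataFrame({
    "name1": ["Joe", "Erica", "Stacy"],
    "name2": ["Dave", "Alana", "Tom"],
    "name3": ["Aaron", "Steve", "Nancy"],
    "name4": ["Alex", "Blake", "Steph"],
    "loc1": ["NYC", "JAX", "SAC"],
    "loc2": ["CHI", "MIA", "SFR"],
    "loc3": ["ANN", "JAX", "DAL"],
    "loc4": ["FAL", "JAX", "DAL"],
})

loc_cols = ["loc1", "loc2", "loc3", "loc4"]
n = 2  # illustrative threshold: drop rows where any location appears more than n times

# For each row, find how many times its most frequent location occurs.
max_repeats = df[loc_cols].apply(lambda row: row.value_counts().max(), axis=1)

# Keep rows whose most frequent location occurs at most n times,
# then renumber the index from 0 without keeping the old one.
result = df[max_repeats <= n].reset_index(drop=True)
print(result)
```

With the three example rows this keeps Joe's and Stacy's rows, the same as the `nunique` version. The two approaches only diverge on rows with two different pairs of locations.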
Try this:
df[df.filter(like = 'loc').nunique(axis = 1) > 2]
Output:
name1 name2 name3 name4 loc1 loc2 loc3 loc4
0 Joe Dave Aaron Alex NYC CHI ANN FAL
2 Stacy Tom Nancy Steph SAC SFR DAL DAL
Use `filter` with `like='loc'` to limit the dataframe to the loc columns (the last four), then use `nunique` with `axis=1` to get the number of unique values in each row. Comparing that count with 2 creates a boolean Series, and boolean indexing with it selects the correct rows.
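The steps described above can be split into intermediate variables to make each one visible. This is a sketch on the question's example data (rebuilt by hand here). The variable names are illustrative, and the final `reset_index(drop=True)` is an addition to cover the index reset the question asks for.

```python
import pandas as pd

# Example data from the question, rebuilt by hand.
df = pd.DataFrame({
    "name1": ["Joe", "Erica", "Stacy"],
    "name2": ["Dave", "Alana", "Tom"],
    "name3": ["Aaron", "Steve", "Nancy"],
    "name4": ["Alex", "Blake", "Steph"],
    "loc1": ["NYC", "JAX", "SAC"],
    "loc2": ["CHI", "MIA", "SFR"],
    "loc3": ["ANN", "JAX", "DAL"],
    "loc4": ["FAL", "JAX", "DAL"],
})

# Step 1: keep only the columns whose label contains "loc".
locs = df.filter(like="loc")

# Step 2: count the distinct values in each row.
unique_counts = locs.nunique(axis=1)

# Step 3: build a boolean Series that is True for rows to keep.
mask = unique_counts > 2

# Step 4: select the rows and renumber the index from 0.
result = df[mask].reset_index(drop=True)
print(result)
```

Passing `drop=True` discards the old index instead of adding it back as a new `index` column.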