

Select rows from dataframe with unique combination of values from multiple columns

I have a data.frame in R that is a catalog of baseball game results for every team over a number of seasons. Some of the columns are team, opponent_team, date, result, team_runs, opponent_runs, etc. My problem is that, because the data.frame combines the game logs of every team, each row essentially has another row somewhere else in the data.frame that is its mirror image.

For example,

team  opponent_team  date           result team_runs opponent_runs
BAL   BOS            2010-04-05      W      5         4

has another row somewhere else that is:

team  opponent_team  date           result team_runs opponent_runs
BOS   BAL            2010-04-05      L      4         5

I would like to write some code in dplyr or something similar that selects rows that have a unique combination of the team, opponent_team and date columns. I stress the word combination here because order doesn't matter; I am just trying to get rid of the rows that are mirror images.

Thanks

Have you tried the distinct function from dplyr? For your case, it could be something like:

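library(dplyr)
# .keep_all = TRUE keeps the other columns (result, team_runs, ...) instead of only the three keys
df %>% distinct(team, opponent_team, date, .keep_all = TRUE)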

Another alternative is to use the duplicated function from base R inside dplyr's filter function, like below:

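# duplicated() checks a single object, so bundle the three key columns into one data frame first
df %>% filter(!duplicated(data.frame(team, opponent_team, date)))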
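Note that both snippets above key on the raw (team, opponent_team, date) values, so a row and its mirror image (BAL vs BOS and BOS vs BAL on the same date) count as different combinations and both are kept. A minimal sketch of an order-independent version follows. It assumes team and opponent_team are character columns (pmin()/pmax() do not work on factors) and that two teams meet at most once per date, since a doubleheader would collapse into a single row. The example data and the helper columns team_a and team_b are hypothetical.

```r
library(dplyr)

# Hypothetical example data: two mirror-image rows of one game plus an unrelated game
df <- data.frame(
  team          = c("BAL", "BOS", "NYY"),
  opponent_team = c("BOS", "BAL", "TOR"),
  date          = as.Date(c("2010-04-05", "2010-04-05", "2010-04-05")),
  result        = c("W", "L", "W"),
  team_runs     = c(5, 4, 3),
  opponent_runs = c(4, 5, 2),
  stringsAsFactors = FALSE
)

# pmin()/pmax() put the alphabetically smaller team name in team_a and the
# larger one in team_b, so BAL-BOS and BOS-BAL yield the same (team_a, team_b, date).
games <- df %>%
  mutate(team_a = pmin(team, opponent_team),
         team_b = pmax(team, opponent_team)) %>%
  distinct(team_a, team_b, date, .keep_all = TRUE) %>%  # keep first row per game
  select(-team_a, -team_b)                              # drop the hypothetical helpers

print(games)
```

distinct() keeps the first occurrence of each key, so which team's perspective survives for a given game depends on the row order of df.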
