Select rows from dataframe with unique combination of values from multiple columns
I have a data.frame in R that is a catalog of results from baseball games for every team for a number of seasons. Some of the columns are team, opponent_team, date, result, team_runs, opponent_runs, etc. My problem is that because the data.frame is a combination of logs for every team, each row essentially has another row somewhere else in the data.frame that is a mirror image of that row.
For example:
team opponent_team date result team_runs opponent_runs
BAL BOS 2010-04-05 W 5 4
has another row somewhere else that is
team opponent_team date result team_runs opponent_runs
BOS BAL 2010-04-05 L 4 5
I would like to write some code in dplyr or something similar that selects rows that have a unique combination of the team, opponent_team and date columns. I stress the word combination here because order doesn't matter; I am just trying to get rid of the rows that are mirror images.
Thanks
Have you tried the distinct function from dplyr? For your case, it can be something like
library(dplyr)
df %>% distinct(team, opponent_team, date)
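Note that distinct on the raw columns treats BAL/BOS and BOS/BAL as different combinations, and without .keep_all = TRUE it returns only the three named columns. Since the question stresses that order should not matter, one possible sketch is to build an order-independent pair first with pmin and pmax. The toy data and the helper column names team_a and team_b below are made up for illustration, and the team columns are assumed to be character vectors, not factors.

```r
library(dplyr)

# Toy data modeled on the example rows in the question (values are illustrative)
df <- data.frame(
  team          = c("BAL", "BOS", "NYY"),
  opponent_team = c("BOS", "BAL", "TBR"),
  date          = as.Date(c("2010-04-05", "2010-04-05", "2010-04-05")),
  result        = c("W", "L", "W"),
  team_runs     = c(5, 4, 7),
  opponent_runs = c(4, 5, 2),
  stringsAsFactors = FALSE
)

games <- df %>%
  # team_a / team_b are hypothetical helper columns: the alphabetically
  # smaller and larger team code, so BAL/BOS and BOS/BAL get the same pair
  mutate(
    team_a = pmin(team, opponent_team),
    team_b = pmax(team, opponent_team)
  ) %>%
  # keep the first row for each (pair, date) and retain all other columns
  distinct(team_a, team_b, date, .keep_all = TRUE) %>%
  select(-team_a, -team_b)
```

One caveat with this sketch: if two teams play each other twice on the same date, the second game would also be dropped, so an extra column identifying the game would be needed in the key.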
Another alternative is to use the duplicated function from base R inside the filter function of dplyr, like below.
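# duplicated() takes a single object, so the key columns are wrapped in a data.frame
df %>% filter(!duplicated(data.frame(team, opponent_team, date)))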
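The duplicated approach can also be made order-independent, so that a row and its mirror image count as the same game. This is only a sketch under assumptions: the sample data is invented, the key column names a and b are hypothetical, and the team columns are assumed to be character vectors.

```r
library(dplyr)

# Invented sample data: the first two rows are mirror images of each other
df <- data.frame(
  team          = c("BAL", "BOS", "NYY"),
  opponent_team = c("BOS", "BAL", "TBR"),
  date          = as.Date(c("2010-04-05", "2010-04-05", "2010-04-05")),
  result        = c("W", "L", "W"),
  stringsAsFactors = FALSE
)

games <- df %>%
  filter(!duplicated(data.frame(
    a = pmin(team, opponent_team),  # alphabetically first team of the pair
    b = pmax(team, opponent_team),  # alphabetically second team of the pair
    date
  )))
```

Because duplicated flags only the second and later occurrences, the first row of each mirrored pair is the one that is kept.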