
How do I extract rows from multiple (3) data frames and output them in a fourth if they have a matching numeric in one of the columns?

I'm trying to extract rows from three data frames, batters_16, batters_17, and batters_18, which look like this:

 player_id       player_name launch_speed launch_angle
1    443558       Nelson Cruz         94.4         11.1
2    519317 Giancarlo Stanton         93.8         14.0
3    408234    Miguel Cabrera         93.6         12.3
4    452095     Tyler Flowers         93.2         12.9
5    407812     Matt Holliday         93.0          8.3
6    120074       David Ortiz         92.8         16.6

I want to sort the rows into separate data frames depending on whether a player_id shows up in all three years (frames), in exactly two of the frames (for example batters_18 and batters_16 but not batters_17), or in only one of the three frames. That should give me seven data frames in total. How can I get this done? I've written a function that tries to separate them using %in% and then runs a calculation, but I've had no luck getting it to work: the output is just 3 columns of almost random numbers, and I regularly get warnings like the one below.

Warning message:
In if (playerid %in% b18$player_id == FALSE & playerid %in% 
b17$player_id ==  : the condition has length > 1 and only the first 
element will be used

This is the function I wrote, for reference.

# to combine batting stats from the 3 seasons in the appropriate categories
# but with a weighting of 45% in 2018, 35% in 2017, and 20% in 2016 for sake
# of favoring recent form and performance, but in each seasons all players have
# at least 50 events

 combine.batting.stats <- function(b18, b17, b16, playerID_map){

  #using the stats for each year along with the player ID map

  b18 = read.csv("~/HITS/batters_18.csv")
  b17 = read.csv("~/HITS/batters_17.csv")
  b16 = read.csv("~/HITS/batters_16.csv")
  playerID_map = read.csv("~/HITS/playerID_map.csv")
  playerid = playerID_map$MLBID
  average_launch_speed = 0
  average_launch_angle = 0

  # so first my weights, with the scenarios being:
  # exists in all 3 years, exists in exactly two, and finally exists in exactly one



  # the check for whether something is in a data frame is as below
  # SOMETHING %in% DATAFRAME$COLUMN
  # this should be used to code three different scenarios where I weight 
  # the value of season stats depending on how many seasons they qualify in

  if(playerid %in% b18$player_id == TRUE & playerid %in% b17$player_id == TRUE
     & playerid %in% b16$player_id == TRUE) {

    #calculation for case of 3 year player
    # 18 is 45%, 17 is 35%, and 16 is 20%

    average_launch_speed = (((b18$launch_speed * 0.45) + (b17$launch_speed * 0.35)
                             + (b16$launch_speed * 0.2)) / 3)

    average_launch_angle = (((b18$launch_angle * 0.45) + (b17$launch_angle * 0.35)
                             + (b16$launch_angle * 0.2)) / 3)

  }

  if(playerid %in% b18$player_id == TRUE & playerid %in% b17$player_id == TRUE
     & playerid %in% b16$player_id == FALSE) {

    #calculation for player in b18 and b17 but not b16....should be extended to
    #other 2 year player situations that is b17 and b16 but not b18 as well as
    #b18 and b16 but not b17 (which I would like to skew even more to b18 stats)
    #than players who have played the most recent 2 years to reflect potential 
    #post injury change

    average_launch_speed = (((b18$launch_speed * 0.6) + (b17$launch_speed * 0.4)) 
                            / 2)

    average_launch_angle = (((b18$launch_angle * 0.6) + (b17$launch_angle * 0.4)) 
                            / 2)

  }

  if(playerid %in% b18$player_id == TRUE & playerid %in% b17$player_id == FALSE & playerid %in% b16$player_id == TRUE) {

    #in b18 and b16 but not b17


    average_launch_speed = (((b18$launch_speed * 0.6) + (b16$launch_speed * 0.4)) 
                            / 2)

    average_launch_angle = (((b18$launch_angle * 0.6) + (b16$launch_angle * 0.4)) 
                            / 2)
    }

  if(playerid %in% b18$player_id == FALSE & playerid %in% b17$player_id == TRUE
     & playerid %in% b16$player_id == TRUE) {

    #in b17 and b16 but not b18


    average_launch_speed = (((b17$launch_speed * 0.6) + (b16$launch_speed * 0.4)) 
                            / 2)

    average_launch_angle = (((b17$launch_angle * 0.6) + (b16$launch_angle * 0.4)) 
                            / 2)

  }

  # next are those in only one single frame/year
  # this one is only in 18

  if(playerid %in% b18$player_id == TRUE & playerid %in% b17$player_id == FALSE
     & playerid %in% b16$player_id == FALSE){

    average_launch_speed = b18$launch_speed

    average_launch_angle = b18$launch_angle 

  }

  # only in b17

  if(playerid %in% b18$player_id == FALSE & playerid %in% b17$player_id == TRUE
     & playerid %in% b16$player_id == FALSE){

    average_launch_speed = b17$launch_speed

    average_launch_angle = b17$launch_angle 

  }

  #only in b16

  if(playerid %in% b18$player_id == FALSE & playerid %in% b17$player_id == FALSE
     & playerid %in% b16$player_id == TRUE){

    average_launch_speed = b16$launch_speed

    average_launch_angle = b16$launch_angle 

  }

  combined_stats = list(playerid, average_launch_speed, average_launch_angle)

  # returning a data frame from the function
  write.csv(combined_stats, "combined_stats_1.csv", col.names = TRUE, row.names = FALSE)

 }
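
The warning comes from passing a whole vector to if(): playerid holds every ID in the map, so playerid %in% b18$player_id returns one TRUE/FALSE per ID, while if() expects a single value (in R 4.2 and later this is an error, not a warning). A minimal sketch of the difference, using made-up IDs that are not from the original data:

```r
# Hypothetical IDs, for illustration only
playerid <- c(101, 102, 103)
b18_ids  <- c(101, 103)

# %in% is vectorised: one logical per element of playerid
in_18 <- playerid %in% b18_ids
in_18              # TRUE FALSE TRUE

# if (in_18) { ... } would be handed three values at once.
# Use the logical vector to subset instead of branching on it:
playerid[in_18]    # 101 103
playerid[!in_18]   # 102
```

Because of this, membership tests across whole columns are better expressed as subsetting or grouped filtering than as a chain of if() statements.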

Let's start by combining all your datasets into one tidy one:

batters_16$year<-2016
batters_17$year<-2017
batters_18$year<-2018
batters<-rbind(batters_16,batters_17,batters_18)
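
Once the seasons are stacked, a quick way to see which seasons each player appears in is a presence table. The sketch below uses a toy stand-in for the stacked batters frame (the IDs are made up) and base R only:

```r
# Toy stand-in for the stacked frame; IDs are hypothetical
batters <- data.frame(
  player_id = c(1, 2, 3,  1, 2, 4,  1, 3, 5),
  year      = rep(c(2016, 2017, 2018), each = 3)
)

# Rows are players, columns are seasons; TRUE where the player has a row
presence <- table(batters$player_id, batters$year) > 0
presence

# Number of seasons each player appears in
seasons_played <- rowSums(presence)
seasons_played
```

In this toy data player 1 appears in all three seasons, players 2 and 3 in two, and players 4 and 5 in one.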

Now it's easy to do what you wanted using `dplyr`:

library(dplyr)

# group by player so any()/all() are evaluated per player_id
batters <- batters %>% group_by(player_id)
filter(batters, any(year == 2016) & all(year != 2017 & year != 2018)) # only 2016
filter(batters, any(year == 2016) & any(year == 2017) & all(year != 2018)) # only 2016 and 2017
# etc...
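
Rather than writing one filter per combination, the combinations can be labelled and split in one pass. The following is a sketch under assumptions, not the answerer's code: the toy data, the seasons column, and the season_weights lookup are all introduced here for illustration, and it assumes at most one row per player per season. The weights are the ones stated in the question (45/35/20 for three seasons, 60/40 for two, most recent season first).

```r
library(dplyr)

# Toy stand-in for the stacked `batters` frame (values are made up)
batters <- data.frame(
  player_id    = c(1, 2, 3,  1, 2, 4,  1, 3, 5),
  launch_speed = c(90, 91, 92,  93, 94, 95,  96, 97, 98),
  launch_angle = c(10, 11, 12,  13, 14, 15,  16, 17, 18),
  year         = rep(c(2016, 2017, 2018), each = 3)
)

# Label each player with the set of seasons they appear in, e.g. "2016_2018"
batters <- batters %>%
  group_by(player_id) %>%
  mutate(seasons = paste(sort(unique(year)), collapse = "_")) %>%
  ungroup()

# One data frame per observed combination (up to seven with three seasons)
by_pattern <- split(batters, batters$seasons)
names(by_pattern)

# Hypothetical lookup: weights by number of seasons, most recent season first
season_weights <- list(`1` = 1, `2` = c(0.6, 0.4), `3` = c(0.45, 0.35, 0.2))

# Weighted averages per player; rows are ordered newest-first within each player
combined <- batters %>%
  group_by(player_id) %>%
  arrange(desc(year), .by_group = TRUE) %>%
  summarise(
    n_seasons        = n(),
    avg_launch_speed = sum(launch_speed * season_weights[[as.character(n())]]),
    avg_launch_angle = sum(launch_angle * season_weights[[as.character(n())]])
  )
combined
```

Since each set of weights already sums to 1, the weighted sum is the weighted average; dividing again by the number of seasons, as the function in the question does, would shrink the result.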
