
Using dplyr::lag to tidy data frame and fill variables

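I'm trying to clean up my data so that each row directly below a row containing "gamecentre-playbyplay-event" is labeled as the goal, the row directly below the goal row is labeled as the primary assist, and the row directly below the primary assist row is labeled as the secondary assist.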

The data looks like this:

mydata

# A tibble: 15 x 1
   value                                                                                 
   <chr>                                                                                 
 1 "<div class=\"gamecentre-playbyplay-event team-border--lhjmq-bat gamecentre-playby"   
 2 "<a href=\"/players/14695\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
 3 "<a href=\"/players/16639\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
 4 "<a href=\"/players/17027\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
 5 "<div class=\"gamecentre-playbyplay-event team-border--lhjmq-mon gamecentre-playby"   
 6 "<a href=\"/players/17453\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
 7 "<a href=\"/players/14639\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
 8 "<div class=\"gamecentre-playbyplay-event team-border--lhjmq-mon gamecentre-playby"   
 9 "<a href=\"/players/18061\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
10 "<a href=\"/players/14752\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
11 "<a href=\"/players/17522\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
12 "<div class=\"gamecentre-playbyplay-event team-border--lhjmq-mon gamecentre-playby"   
13 "<a href=\"/players/14752\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
14 "<a href=\"/players/14639\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
15 "<a href=\"/players/14757\" class=\"gamecentre__link gamecentre__link--goal\" data-re"

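But there are a few problems here:

  1. I need to set conditions so that the rows are labeled correctly.
  2. If there is no assist row, that value should be NA.
  3. If there is no primary assist row, that value should also be NA.

I'm trying to use dplyr::lag(), but handling the NA cases where there is no primary or secondary assist is confusing me.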

Here's the basis of what I have so far:

library(dplyr)
library(stringr)

# keep only the rows that sit directly below an event-header row
goals <- mydata %>%
  filter(dplyr::lag(str_detect(value, "gamecentre-playbyplay-event team-border"), 1))

goals

# A tibble: 4 x 1
  value                                                                                                                                
  <chr>                                                                                                                                
1 "<a href=\"/players/14695\" class=\"gamecentre__link gamecentre__link--goal\" data-re
2 "<a href=\"/players/17453\" class=\"gamecentre__link gamecentre__link--goal\" data-re
3 "<a href=\"/players/18061\" class=\"gamecentre__link gamecentre__link--goal\" data-re
4 "<a href=\"/players/14752\" class=\"gamecentre__link gamecentre__link--goal\" data-re
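Because lag() looks backward, filtering with it only recovers the goal rows. A minimal sketch of one way to fill all three columns uses the forward-looking counterpart, dplyr::lead(), on each event-header row instead. This is a swapped-in technique, not the approach from the answer below. It assumes a new event starts at every row matching "gamecentre-playbyplay-event", and it runs on a shortened, hypothetical `toy` tibble (trimmed-down versions of the rows above) so it is self-contained.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical, shortened stand-in for `mydata`: two events, the second
# of which has no secondary assist.
toy <- tibble(value = c(
  "<div class=\"gamecentre-playbyplay-event\">",
  "<a href=\"/players/14695\">",
  "<a href=\"/players/16639\">",
  "<a href=\"/players/17027\">",
  "<div class=\"gamecentre-playbyplay-event\">",
  "<a href=\"/players/17453\">",
  "<a href=\"/players/14639\">"
))

assists_lead <- toy %>%
  # flag the event-header rows
  mutate(ev = str_detect(value, "gamecentre-playbyplay-event")) %>%
  mutate(
    # is the row 1, 2 or 3 positions ahead an event header (or past the end)?
    ev1 = lead(ev, 1, default = TRUE),
    ev2 = lead(ev, 2, default = TRUE),
    ev3 = lead(ev, 3, default = TRUE),
    # a label is filled only if no event header interrupts the run of rows
    goal             = if_else(!ev1, lead(value, 1), NA_character_),
    primary_assist   = if_else(!ev1 & !ev2, lead(value, 2), NA_character_),
    secondary_assist = if_else(!ev1 & !ev2 & !ev3, lead(value, 3), NA_character_)
  ) %>%
  # keep one output row per event header
  filter(ev) %>%
  select(goal, primary_assist, secondary_assist)

assists_lead
```

The `default = TRUE` arguments treat "past the end of the data" the same as "next event header", which is what turns a missing assist into NA rather than pulling in a row from the following event.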

This is what I'd like my data to look like at the end of all this. I think an approach using dplyr::lag() is the way to go, but I'm not sure.

# A tibble: 4 x 3
  goal                                     primary_assist                                secondary_assist                              
  <chr>                                    <chr>                                         <chr>                                         
1 "<a href=\"/players/14695\" class=\"gam~ "<a href=\"/players/16639\" class=\"gamecent~ "<a href=\"/players/17027\" class=\"gamecentr~
2 "<a href=\"/players/17453\" class=\"gam~ "<a href=\"/players/14639\" class=\"gamecent~ NA                                            
3 "<a href=\"/players/18061\" class=\"gam~ "<a href=\"/players/14752\" class=\"gamecent~ "<a href=\"/players/17522\" class=\"gamecentr~
4 "<a href=\"/players/14752\" class=\"gam~ "<a href=\"/players/14639\" class=\"gamecent~ "<a href=\"/players/14757\" class=\"gamecentr~

Any ideas?

dput:

    mydata <- structure(list(value = c("<div class=\"gamecentre-playbyplay-event team-border--lhjmq-bat gamecentre-playby", 
"<a href=\"/players/14695\" class=\"gamecentre__link gamecentre__link--goal\" data-re", 
"<a href=\"/players/16639\" class=\"gamecentre__link gamecentre__link--goal\" data-re", 
"<a href=\"/players/17027\" class=\"gamecentre__link gamecentre__link--goal\" data-re", 
"<div class=\"gamecentre-playbyplay-event team-border--lhjmq-mon gamecentre-playby", 
"<a href=\"/players/17453\" class=\"gamecentre__link gamecentre__link--goal\" data-re", 
"<a href=\"/players/14639\" class=\"gamecentre__link gamecentre__link--goal\" data-re", 
"<div class=\"gamecentre-playbyplay-event team-border--lhjmq-mon gamecentre-playby", 
"<a href=\"/players/18061\" class=\"gamecentre__link gamecentre__link--goal\" data-re", 
"<a href=\"/players/14752\" class=\"gamecentre__link gamecentre__link--goal\" data-re", 
"<a href=\"/players/17522\" class=\"gamecentre__link gamecentre__link--goal\" data-re", 
"<div class=\"gamecentre-playbyplay-event team-border--lhjmq-mon gamecentre-playby", 
"<a href=\"/players/14752\" class=\"gamecentre__link gamecentre__link--goal\" data-re", 
"<a href=\"/players/14639\" class=\"gamecentre__link gamecentre__link--goal\" data-re", 
"<a href=\"/players/14757\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
)), .Names = "value", class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -15L))

One option is to create a grouping variable and then use spread():

library(tidyverse)
mydata %>%
   # start a new group at every row containing 'playby' (the event headers)
   group_by(grp = cumsum(str_detect(value, 'playby'))) %>%
   # drop the first row of each group, i.e. the event header itself
   filter(row_number() > 1) %>%
   # label the remaining rows of each group in order
   mutate(categ = c("goal", "primary_assist", "secondary_assist")[row_number()]) %>%
   # spread from long to wide; groups with fewer rows get NA
   spread(categ, value) %>%
   # remove the grouping column as part of clean-up
   ungroup() %>%
   select(-grp)
# A tibble: 4 x 3
#  goal                                   primary_assist                              secondary_assist                           
#  <chr>                                  <chr>                                       <chr>                                      
#1 "<a href=\"/players/14695\" class=\"g… "<a href=\"/players/16639\" class=\"gamece… "<a href=\"/players/17027\" class=\"gamece…
#2 "<a href=\"/players/17453\" class=\"g… "<a href=\"/players/14639\" class=\"gamece… <NA>                                       
#3 "<a href=\"/players/18061\" class=\"g… "<a href=\"/players/14752\" class=\"gamece… "<a href=\"/players/17522\" class=\"gamece…
#4 "<a href=\"/players/14752\" class=\"g… "<a href=\"/players/14639\" class=\"gamece… "<a href=\"/players/14757\" class=\"gamece…
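spread() still works, but tidyr documents it as superseded in favour of pivot_wider() (tidyr 1.0.0 and later). A minimal sketch of the same grouping idea with pivot_wider() follows, run on a shortened, hypothetical `toy` tibble rather than the full dput data. As an assumption, it matches the full "gamecentre-playbyplay-event" class instead of the shorter 'playby', so group boundaries are less likely to be triggered by unrelated rows.

```r
library(dplyr)
library(stringr)
library(tidyr)
library(tibble)

# Hypothetical, shortened stand-in for `mydata`
toy <- tibble(value = c(
  "<div class=\"gamecentre-playbyplay-event\">",
  "<a href=\"/players/14695\">",
  "<a href=\"/players/16639\">",
  "<a href=\"/players/17027\">",
  "<div class=\"gamecentre-playbyplay-event\">",
  "<a href=\"/players/17453\">",
  "<a href=\"/players/14639\">"
))

roles <- c("goal", "primary_assist", "secondary_assist")

assists_wide <- toy %>%
  # a new group starts at every event-header row
  group_by(grp = cumsum(str_detect(value, "gamecentre-playbyplay-event"))) %>%
  # drop the header row itself, leaving only the player rows
  filter(row_number() > 1) %>%
  # label the remaining rows in order (a 4th row would get an NA label)
  mutate(categ = roles[row_number()]) %>%
  ungroup() %>%
  # one row per group, one column per label; missing labels become NA
  pivot_wider(names_from = categ, values_from = value) %>%
  select(-grp)

assists_wide
```

pivot_wider() uses the remaining column (`grp`) as the row identifier by default, which is why the grouping column is only dropped after the reshape.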

