Using dplyr::lag to tidy data frame and fill variables
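I am trying to clean up my data so that each row directly below a row containing "gamecentre-playbyplay-event" is labelled as a goal, each row directly below a "goal" row is labelled as a primary assist, and each row directly below a "primary assist" row is labelled as a secondary assist.
The data looks like this: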
mydata
# A tibble: 15 x 1
value
<chr>
1 "<div class=\"gamecentre-playbyplay-event team-border--lhjmq-bat gamecentre-playby"
2 "<a href=\"/players/14695\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
3 "<a href=\"/players/16639\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
4 "<a href=\"/players/17027\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
5 "<div class=\"gamecentre-playbyplay-event team-border--lhjmq-mon gamecentre-playby"
6 "<a href=\"/players/17453\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
7 "<a href=\"/players/14639\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
8 "<div class=\"gamecentre-playbyplay-event team-border--lhjmq-mon gamecentre-playby"
9 "<a href=\"/players/18061\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
10 "<a href=\"/players/14752\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
11 "<a href=\"/players/17522\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
12 "<div class=\"gamecentre-playbyplay-event team-border--lhjmq-mon gamecentre-playby"
13 "<a href=\"/players/14752\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
14 "<a href=\"/players/14639\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
15 "<a href=\"/players/14757\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
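There is a complication, though: when a goal has no primary or secondary assist, I want the corresponding value to be NA. I am trying to use dplyr::lag(), but needing NA wherever an assist is missing makes this confusing.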
Here is the basis of what I have so far:
goals <- mydata %>%
filter(dplyr::lag(str_detect(value, "gamecentre-playbyplay-event team-border"), 1))
goals
# A tibble: 4 x 1
value
<chr>
1 "<a href=\"/players/14695\" class=\"gamecentre__link gamecentre__link--goal\" data-re
2 "<a href=\"/players/17453\" class=\"gamecentre__link gamecentre__link--goal\" data-re
3 "<a href=\"/players/18061\" class=\"gamecentre__link gamecentre__link--goal\" data-re
4 "<a href=\"/players/14752\" class=\"gamecentre__link gamecentre__link--goal\" data-re
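The lag-based filter above can, in principle, be extended with dplyr::lead() to look one and two rows ahead of each goal row. The following is only a sketch under assumptions: `toy` is a hypothetical, shortened stand-in for `mydata`, and the helper columns `is_event`, `ev1` and `ev2` are names made up for illustration. A row that starts a new event, or that lies past the end of the data, is treated as "no assist" and yields NA.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data: two events, the second without a secondary assist
toy <- tibble(value = c(
  "<div class=\"gamecentre-playbyplay-event a\"",
  "<a href=\"/players/1\"",
  "<a href=\"/players/2\"",
  "<a href=\"/players/3\"",
  "<div class=\"gamecentre-playbyplay-event b\"",
  "<a href=\"/players/4\"",
  "<a href=\"/players/5\""
))

res <- toy %>%
  mutate(
    is_event = str_detect(value, "gamecentre-playbyplay-event"),
    # Is the row one / two positions ahead an event header (or past the end)?
    ev1 = lead(is_event, 1, default = TRUE),
    ev2 = lead(is_event, 2, default = TRUE),
    # Take the following rows only while they still belong to the same event
    primary_assist   = if_else(ev1, NA_character_, lead(value, 1)),
    secondary_assist = if_else(ev1 | ev2, NA_character_, lead(value, 2))
  ) %>%
  # Keep goal rows: non-event rows directly below an event row
  filter(lag(is_event, default = FALSE), !is_event) %>%
  select(goal = value, primary_assist, secondary_assist)

res
```

This keeps everything in one pass over the rows, at the cost of hard-coding the maximum of two assists per goal.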
This is what I would like my data to look like at the end of all this. I think using dplyr::lag() is the way to go, but I am not sure.
# A tibble: 4 x 3
goal primary_assist secondary_assist
<chr> <chr> <chr>
1 "<a href=\"/players/14695\" class=\"gam~ "<a href=\"/players/16639\" class=\"gamecent~ "<a href=\"/players/17027\" class=\"gamecentr~
2 "<a href=\"/players/17453\" class=\"gam~ "<a href=\"/players/14639\" class=\"gamecent~ NA
3 "<a href=\"/players/18061\" class=\"gam~ "<a href=\"/players/14752\" class=\"gamecent~ "<a href=\"/players/17522\" class=\"gamecentr~
4 "<a href=\"/players/14752\" class=\"gam~ "<a href=\"/players/14639\" class=\"gamecent~ "<a href=\"/players/14757\" class=\"gamecentr~
Any ideas?
dput:
mydata <- structure(list(value = c("<div class=\"gamecentre-playbyplay-event team-border--lhjmq-bat gamecentre-playby",
"<a href=\"/players/14695\" class=\"gamecentre__link gamecentre__link--goal\" data-re",
"<a href=\"/players/16639\" class=\"gamecentre__link gamecentre__link--goal\" data-re",
"<a href=\"/players/17027\" class=\"gamecentre__link gamecentre__link--goal\" data-re",
"<div class=\"gamecentre-playbyplay-event team-border--lhjmq-mon gamecentre-playby",
"<a href=\"/players/17453\" class=\"gamecentre__link gamecentre__link--goal\" data-re",
"<a href=\"/players/14639\" class=\"gamecentre__link gamecentre__link--goal\" data-re",
"<div class=\"gamecentre-playbyplay-event team-border--lhjmq-mon gamecentre-playby",
"<a href=\"/players/18061\" class=\"gamecentre__link gamecentre__link--goal\" data-re",
"<a href=\"/players/14752\" class=\"gamecentre__link gamecentre__link--goal\" data-re",
"<a href=\"/players/17522\" class=\"gamecentre__link gamecentre__link--goal\" data-re",
"<div class=\"gamecentre-playbyplay-event team-border--lhjmq-mon gamecentre-playby",
"<a href=\"/players/14752\" class=\"gamecentre__link gamecentre__link--goal\" data-re",
"<a href=\"/players/14639\" class=\"gamecentre__link gamecentre__link--goal\" data-re",
"<a href=\"/players/14757\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
)), .Names = "value", class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -15L))
One option is to create a grouping variable and then spread:
library(tidyverse)
mydata %>%
#create a group based on the occurrence of 'playby'
group_by(grp = cumsum(str_detect(value, 'playby'))) %>%
# filter out the first row of the group that have playby
filter(row_number() > 1) %>%
# create a new category column
mutate(categ = c("goal", "primary_assist", "secondary_assist")[row_number()]) %>%
# spread from long to wide
spread(categ, value) %>%
# remove the grouping column as part of clean up
ungroup %>%
select(-grp)
# A tibble: 4 x 3
# goal primary_assist secondary_assist
# <chr> <chr> <chr>
#1 "<a href=\"/players/14695\" class=\"g… "<a href=\"/players/16639\" class=\"gamece… "<a href=\"/players/17027\" class=\"gamece…
#2 "<a href=\"/players/17453\" class=\"g… "<a href=\"/players/14639\" class=\"gamece… <NA>
#3 "<a href=\"/players/18061\" class=\"g… "<a href=\"/players/14752\" class=\"gamece… "<a href=\"/players/17522\" class=\"gamece…
#4 "<a href=\"/players/14752\" class=\"g… "<a href=\"/players/14639\" class=\"gamece… "<a href=\"/players/14757\" class=\"gamece…
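In recent versions of tidyr, spread() is superseded by pivot_wider(), so the same grouping idea might be written as below. This is a sketch rather than a drop-in replacement: it assumes tidyr 1.0.0 or later, and `toy` is a hypothetical, shortened stand-in for `mydata`. The `roles` vector is likewise a name introduced here for illustration.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical toy data: two events, the second without a secondary assist
toy <- tibble(value = c(
  "<div class=\"gamecentre-playbyplay-event a\"",
  "<a href=\"/players/1\"",
  "<a href=\"/players/2\"",
  "<a href=\"/players/3\"",
  "<div class=\"gamecentre-playbyplay-event b\"",
  "<a href=\"/players/4\"",
  "<a href=\"/players/5\""
))

roles <- c("goal", "primary_assist", "secondary_assist")

wide <- toy %>%
  # Each event header row starts a new group
  mutate(grp = cumsum(str_detect(value, "gamecentre-playbyplay-event"))) %>%
  # Drop the header rows themselves
  filter(!str_detect(value, "gamecentre-playbyplay-event")) %>%
  group_by(grp) %>%
  # Label the remaining rows by their position within the event
  mutate(categ = roles[row_number()]) %>%
  ungroup() %>%
  # One row per event; roles missing from an event become NA
  pivot_wider(names_from = categ, values_from = value) %>%
  select(-grp)

wide
```

Note that an event with more than three player rows would receive an NA role label here, so such data would need extra handling before pivoting.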