What's the most efficient way to extract some numbers from data points in R? (Plus other specific steps!)

Question

I've got quite a specific problem for which I can just about find a very hacky solution, but I'm hoping somebody could outline a slightly more elegant method.

I have a CSV file with one row per historical football match played. The fields I care about look something like this:

home_team <- c("Team A", "Team B", "Team B")
away_team <- c("Team C", "Team C", "Team D")
home_goals <- c(2, 0, 1)
away_goals <- c(1, 2, 0)
home_goal_mins <- c("5 60", "NA", "80")
away_goal_mins <- c("15", "20 40", "NA")

df <- data.frame(home_team, away_team, home_goals, away_goals, home_goal_mins, away_goal_mins,
                 stringsAsFactors = FALSE)

df
#>   home_team away_team home_goals away_goals home_goal_mins away_goal_mins
#> 1    Team A    Team C          2          1           5 60             15
#> 2    Team B    Team C          0          2             NA          20 40
#> 3    Team B    Team D          1          0             80             NA

Created on 2020-10-05 by the reprex package (v0.3.0)

My goal is to transform this data frame so that there is one row per goal scored, per game.

The main challenges, as I see them:

1. The *_goal_mins fields are read in as strings containing both numbers and NAs.
2. Replicating the rows so that each home/away team combination has as many rows as the total number of goals in that match.

With regard to (1), I've been using stringr::str_split(., " ") to extract the numbers, but I then struggle to transform them into a numeric vector. Taking the first row of df as an example, I'm struggling to transform "5 60" into c(5, 60), and it gets harder when I try to combine the home team's "5 60" with the away team's "15" to get the full goal sequence of c(5, 15, 60).
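For reference, the conversion described in (1) can be sketched in base R. This is only an illustration, not code from the original post: parse_goal_mins is a hypothetical helper name, and it assumes the minutes are separated by single spaces and that a side with no goals is stored as the literal string "NA".

```r
# Hypothetical helper: turn a string such as "5 60" into c(5, 60).
# Assumption: a side with no goals is the literal string "NA" (or a real NA),
# which is mapped to an empty numeric vector.
parse_goal_mins <- function(x) {
  if (is.na(x) || x == "NA") return(numeric(0))
  as.numeric(strsplit(x, " ", fixed = TRUE)[[1]])
}

home <- parse_goal_mins("5 60")  # first row of df, home side
away <- parse_goal_mins("15")    # first row of df, away side

# Combine both sides and sort to get the match's full goal sequence.
goal_sequence <- sort(c(home, away))
goal_sequence
#> [1]  5 15 60
```

Returning an empty vector for "NA" means c(home, away) needs no special casing when one side fails to score.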

As for (2), my current approach is to calculate the total_goals_scored per match and do the following:

expanded_df <- df[rep(seq_len(dim(df)[1]),
                      df$total_goals_scored), ]

but I sense that there may be a better method.
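A self-contained version of that replication step might look like the sketch below. The post does not show how total_goals_scored is computed, so defining it as home_goals + away_goals is an assumption made here for illustration.

```r
df <- data.frame(
  home_team  = c("Team A", "Team B", "Team B"),
  away_team  = c("Team C", "Team C", "Team D"),
  home_goals = c(2, 0, 1),
  away_goals = c(1, 2, 0),
  stringsAsFactors = FALSE
)

# Assumption: total goals per match is home goals plus away goals.
df$total_goals_scored <- df$home_goals + df$away_goals

# Repeat each row index once per goal, then subset the data frame.
expanded_df <- df[rep(seq_len(nrow(df)), times = df$total_goals_scored), ]
rownames(expanded_df) <- NULL  # drop the "1.1", "1.2" style row names

nrow(expanded_df)
#> [1] 6
```

If tidyr is available, tidyr::uncount(df, total_goals_scored) should give the same replication in a single call, although by default it drops the count column.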

Any help or tips will be appreciated! Thanks

Answer 1

Using the dplyr and tidyr libraries, you could do the following:

Bring home_goal_mins and away_goal_mins into the same column using pivot_longer.
Split the values on whitespace and put each goal in a separate row.
Drop NA values.
Arrange the data by goal minute.
Reshape the data back to wide format.

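library(dplyr)
library(tidyr)

df %>%
  # stack both minute columns into name/value pairs
  pivot_longer(cols = c(home_goal_mins, away_goal_mins)) %>%
  # one row per goal; convert = TRUE turns the minutes into integers and "NA" into NA
  separate_rows(value, sep = ' ', convert = TRUE) %>%
  # drop the sides that did not score
  filter(!is.na(value)) %>%
  # order the goals chronologically within each match
  arrange(home_team, away_team, value) %>%
  group_by(home_team, away_team) %>%
  # number the goals within each match
  mutate(row = row_number()) %>%
  # spread name/value back into home_goal_mins and away_goal_mins
  pivot_wider()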

#  home_team away_team home_goals away_goals   row home_goal_mins away_goal_mins
#  <chr>     <chr>          <dbl>      <dbl> <int>          <int>          <int>
#1 Team A    Team C             2          1     1              5             NA
#2 Team A    Team C             2          1     2             NA             15
#3 Team A    Team C             2          1     3             60             NA
#4 Team B    Team C             0          2     1             NA             20
#5 Team B    Team C             0          2     2             NA             40
#6 Team B    Team D             1          0     1             80             NA
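
If the tidyverse is not available, roughly the same steps can be sketched in base R. This is an illustrative alternative, not part of the answer above: split_mins and the scored_by and minute column names are made up for this example. It stops at the long format, giving one row per goal with a column saying which side scored.

```r
df <- data.frame(
  home_team = c("Team A", "Team B", "Team B"),
  away_team = c("Team C", "Team C", "Team D"),
  home_goal_mins = c("5 60", "NA", "80"),
  away_goal_mins = c("15", "20 40", "NA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each string on spaces and convert to integers.
# The literal "NA" (or a real NA) yields an empty vector, i.e. no goals.
split_mins <- function(x) {
  lapply(strsplit(x, " ", fixed = TRUE),
         function(p) as.integer(p[!is.na(p) & p != "NA"]))
}
home <- split_mins(df$home_goal_mins)
away <- split_mins(df$away_goal_mins)

# One row index per goal: all home goals first, then all away goals.
idx <- c(rep(seq_len(nrow(df)), lengths(home)),
         rep(seq_len(nrow(df)), lengths(away)))

goals <- data.frame(
  home_team = df$home_team[idx],
  away_team = df$away_team[idx],
  scored_by = rep(c("home", "away"), c(sum(lengths(home)), sum(lengths(away)))),
  minute    = c(unlist(home), unlist(away)),
  stringsAsFactors = FALSE
)

# Order the goals chronologically within each match.
goals <- goals[order(goals$home_team, goals$away_team, goals$minute), ]
rownames(goals) <- NULL
goals  # minutes in row order: 5, 15, 60, 20, 40, 80
```

Keeping a single minute column plus scored_by avoids the NA padding of the wide output, which may be more convenient if the next step is a running score per match.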

Answer 1 (accepted, 2020-10-05 09:58:20)
