Extracting different parts of a string

Question

I have a series of football match strings that I am trying to break into their component parts in R. For example,

"Jun 0103:00 PMTottenham0 - 2Liverpool(0 - 1)" should return

"Jun 01", "3:00PM", "Tottenham", "0", "2", "Liverpool", "0", "1"

and

"May 0803:00 PMAjax2 - 3Tottenham(2 - 0)" should return

"May 08", "3:00PM", "Ajax", "2", "3", "Tottenham", "2", "0"

The goal is to put the result into a data frame with the headers

c("Date", "Time", "Home team", "Home team score", 
    "Away team score", "Away team", "Home team HT score", "Away team HT score")

Answer 1

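x <- c("Jun 0103:00 PMTottenham0 - 2Liverpool(0 - 1)", "May 0803:00 PMAjax2 - 3Tottenham(2 - 0)")
# Capture the eight fields, join them with commas, then parse the result as CSV.
# Groups: date (6 chars), time (8 chars), home team, two scores, away team, two half-time scores.
read.csv(header = FALSE,
         text = gsub("(^.{6})(.{8})(\\D+)(\\d+)\\s-\\s(\\d+)(\\D+)\\((\\d+)\\s-\\s(\\d+).*",
                     "\\1,\\2,\\3,\\4,\\5,\\6,\\7,\\8",
                     x))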
#      V1       V2        V3 V4 V5        V6 V7 V8
#1 Jun 01 03:00 PM Tottenham  0  2 Liverpool  0  1
#2 May 08 03:00 PM      Ajax  2  3 Tottenham  2  0
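The output above still carries the default V1 to V8 names, while the question asks for specific headers. A minimal sketch of one way to attach them, assuming the same x and pattern as above; the names `pattern`, `headers`, and `matches` are introduced here only for illustration:

```r
x <- c("Jun 0103:00 PMTottenham0 - 2Liverpool(0 - 1)",
       "May 0803:00 PMAjax2 - 3Tottenham(2 - 0)")

# Same eight capture groups as in the answer above
pattern <- "(^.{6})(.{8})(\\D+)(\\d+)\\s-\\s(\\d+)(\\D+)\\((\\d+)\\s-\\s(\\d+).*"

# Headers taken from the question
headers <- c("Date", "Time", "Home team", "Home team score",
             "Away team score", "Away team",
             "Home team HT score", "Away team HT score")

# Parse as CSV, then rename the columns; setNames() keeps the spaces in the names
matches <- setNames(
  read.csv(header = FALSE,
           stringsAsFactors = FALSE,  # character columns even on R < 4.0
           text = gsub(pattern, "\\1,\\2,\\3,\\4,\\5,\\6,\\7,\\8", x)),
  headers
)
```

Because the names contain spaces, columns have to be accessed with `matches[["Home team"]]` or backticks rather than plain `$` syntax. This approach also assumes no team name contains a comma or a digit.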

Answer 2

The tidy way...

library(tidyverse)  # attaches dplyr, tibble, and stringr

strings <- tibble(full = c("Jun 0103:00 PMTottenham0 - 2Liverpool(0 - 1)", 
                           "May 0803:00 PMAjax2 - 3Tottenham(2 - 0)"))

# Extract each field with its own pattern; score columns are reduced to digits afterwards
strings %>% mutate(date = str_extract(full, "^.{6}"),
                   time = str_extract(full, "\\d{2}:\\d{2}\\s(AM|PM)"),
                   team_home = str_extract(full, "(AM|PM)[[:alpha:]]+"),
                   team_home = str_remove(team_home, "(AM|PM)"),
                   score_home = str_extract(full, "\\d+\\s-"),
                   score_away = str_extract(full, "-\\s\\d+"),
                   team_away = str_extract(full, "\\d[[:alpha:]]+"),
                   team_away = str_remove(team_away, "\\d"),
                   # \\d+ rather than . so multi-digit half-time scores are not truncated
                   score_ht_home = str_extract(full, "\\(\\d+"),
                   score_ht_away = str_extract(full, "\\d+\\)")) %>%
  mutate_at(vars(starts_with("score")), str_extract, pattern = "\\d+") %>%
  mutate_at(vars(starts_with("score")), as.numeric) %>%
  select(-full)

# A tibble: 2 x 8
  date   time     team_home score_home score_away team_away score_ht_home score_ht_away
  <chr>  <chr>    <chr>          <dbl>      <dbl> <chr>             <dbl>         <dbl>
1 Jun 01 03:00 PM Tottenham          0          2 Liverpool             0             1
2 May 08 03:00 PM Ajax               2          3 Tottenham             2             0
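The same split can also be done in one step with `tidyr::extract()`, which maps the capture groups of a single regular expression onto new columns. This is a sketch of an alternative and not part of the answer above. It reuses the pattern from Answer 1 and assumes every string follows the exact layout of the two examples; the name `parsed` is illustrative:

```r
library(tidyr)

strings <- data.frame(
  full = c("Jun 0103:00 PMTottenham0 - 2Liverpool(0 - 1)",
           "May 0803:00 PMAjax2 - 3Tottenham(2 - 0)"),
  stringsAsFactors = FALSE
)

# One capture group per output column; convert = TRUE turns the scores into integers
parsed <- tidyr::extract(
  strings, full,
  into = c("date", "time", "team_home", "score_home",
           "score_away", "team_away", "score_ht_home", "score_ht_away"),
  regex = "^(.{6})(.{8})(\\D+)(\\d+)\\s-\\s(\\d+)(\\D+)\\((\\d+)\\s-\\s(\\d+)\\)$",
  convert = TRUE
)
```

A row that does not match the pattern comes back as NA in every new column, which makes malformed input easy to spot.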

Answer 1
2 2020-03-13 23:26:42

Answer 2
0 Accepted 2020-03-13 23:53:25