Extracting different parts of a string
I have a series of football match strings that I am trying to split into their component parts in R. For example,
"Jun 0103:00 PMTottenham0 - 2Liverpool(0 - 1)" should return
"Jun 01", "3:00PM", "Tottenham", "0", "2", "Liverpool", "0", "1"
and
"May 0803:00 PMAjax2 - 3Tottenham(2 - 0)" should return
"May 08", "3:00PM", "Ajax", "2", "3", "Tottenham", "2", "0"
The goal is to put the result into a data frame with these headers:
c("Date", "Time", "Home team", "Home team score",
"Away team score", "Away team", "Home team HT score", "Away team HT score")
x = c("Jun 0103:00 PMTottenham0 - 2Liverpool(0 - 1)",
      "May 0803:00 PMAjax2 - 3Tottenham(2 - 0)")
# Capture the eight fields with one regex, join them with commas,
# and let read.csv() parse the result into columns.
read.csv(header = FALSE,
         text = gsub("(^.{6})(.{8})(\\D+)(\\d+)\\s-\\s(\\d+)(\\D+)\\((\\d+)\\s-\\s(\\d+).*",
                     "\\1,\\2,\\3,\\4,\\5,\\6,\\7,\\8",
                     x))
#      V1       V2        V3 V4 V5        V6 V7 V8
#1 Jun 01 03:00 PM Tottenham  0  2 Liverpool  0  1
#2 May 08 03:00 PM      Ajax  2  3 Tottenham  2  0
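The data frame above comes back with the default names V1 to V8, while the question asks for specific headers. A minimal base R sketch of one way to get there, assuming every string follows the same layout as the two examples (a non-matching string would break the `rbind()` step). The names `pattern`, `parts`, `m`, and `matches` are illustrative and not part of the original answer.

```r
x <- c("Jun 0103:00 PMTottenham0 - 2Liverpool(0 - 1)",
       "May 0803:00 PMAjax2 - 3Tottenham(2 - 0)")

# Same capture groups as the gsub() call, anchored at the start of the string
pattern <- "^(.{6})(.{8})(\\D+)(\\d+)\\s-\\s(\\d+)(\\D+)\\((\\d+)\\s-\\s(\\d+)\\)"

# regexec() + regmatches() return the full match followed by each capture group
parts <- regmatches(x, regexec(pattern, x))

# Drop the full match (first element) and stack the eight groups row-wise
m <- do.call(rbind, lapply(parts, function(p) p[-1]))

matches <- as.data.frame(m, stringsAsFactors = FALSE)
names(matches) <- c("Date", "Time", "Home team", "Home team score",
                    "Away team score", "Away team",
                    "Home team HT score", "Away team HT score")

# Convert the four score columns from character to integer
score_cols <- grep("score", names(matches))
matches[score_cols] <- lapply(matches[score_cols], as.integer)

print(matches)
```

Because the headers contain spaces, the columns have to be accessed with backticks or double brackets, for example `matches[["Home team"]]`.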
A tidyverse approach:
library(tidyverse)  # also attaches stringr and dplyr
strings <- tibble(full = c("Jun 0103:00 PMTottenham0 - 2Liverpool(0 - 1)",
                           "May 0803:00 PMAjax2 - 3Tottenham(2 - 0)"))
strings %>%
  mutate(date = str_extract(full, ".{6}"),
         time = str_extract(full, "\\d{2}:\\d{2}\\s(AM|PM)"),
         # team names: grab the letters glued to the AM/PM marker or to a digit,
         # then strip that leading marker
         team_home = str_extract(full, "(AM|PM)[[:alpha:]]+"),
         team_home = str_remove(team_home, "(AM|PM)"),
         score_home = str_extract(full, "\\d+\\s-"),
         score_away = str_extract(full, "-\\s\\d+"),
         team_away = str_extract(full, "\\d[[:alpha:]]+"),
         team_away = str_remove(team_away, "\\d"),
         # \\d+ rather than . so multi-digit half-time scores are kept whole
         score_ht_home = str_extract(full, "\\(\\d+"),
         score_ht_away = str_extract(full, "\\d+\\)")) %>%
  # keep only the digits in every score column, then convert to numeric
  mutate_at(vars(starts_with("score")), str_extract, pattern = "\\d+") %>%
  mutate_at(vars(starts_with("score")), as.numeric) %>%
  select(-full)
# A tibble: 2 x 8
  date   time     team_home score_home score_away team_away score_ht_home score_ht_away
  <chr>  <chr>    <chr>          <dbl>      <dbl> <chr>             <dbl>         <dbl>
1 Jun 01 03:00 PM Tottenham          0          2 Liverpool             0             1
2 May 08 03:00 PM Ajax               2          3 Tottenham             2             0
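The column-by-column extraction can probably be collapsed into a single step by reusing the regex from the first answer with `tidyr::extract()`, which splits one column into several using capture groups. This is only a sketch, assuming tidyr is installed and that every row matches the pattern (rows that do not match come back as `NA`). The name `result` is illustrative.

```r
library(tidyr)

strings <- data.frame(
  full = c("Jun 0103:00 PMTottenham0 - 2Liverpool(0 - 1)",
           "May 0803:00 PMAjax2 - 3Tottenham(2 - 0)"),
  stringsAsFactors = FALSE
)

# One capture group per output column; convert = TRUE turns the
# all-digit score columns into integers and leaves the rest as character.
result <- tidyr::extract(
  strings, full,
  into = c("date", "time", "team_home", "score_home",
           "score_away", "team_away", "score_ht_home", "score_ht_away"),
  regex = "^(.{6})(.{8})(\\D+)(\\d+)\\s-\\s(\\d+)(\\D+)\\((\\d+)\\s-\\s(\\d+)\\)",
  convert = TRUE
)

print(result)
```

Since `\\D+` matches any run of non-digits, this version should also cope with team names that contain spaces, which the `[[:alpha:]]+` patterns would cut short.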