Complete the names in a dataframe column using another dataframe's column in R
I have this table:
library(rvest)
library(tidyverse)
# Scrape the team colours page, keep the first table and drop its Away column
tables_team_pl <- read_html('https://www.win-or-lose.com/football-team-colours/')
color_table <- tables_team_pl %>% html_table() %>% pluck(1) %>% select(-Away)
And this one:
table_1 <- structure(list(Team = c("Arsenal", "Aston Villa", "Blackburn",
"Bolton", "Chelsea", "Everton", "Fulham", "Liverpool", "Manchester City",
"Manchester Utd", "Newcastle Utd", "Norwich City", "QPR", "Stoke City",
"Sunderland", "Swansea City", "Tottenham", "West Brom", "Wigan Athletic",
"Wolves")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-20L))
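As you can see, the names in the second table are incomplete. For example, Manchester Utd should be Manchester United, as it appears in the first table.
So I need to complete table_1 using the full names from the first table: Manchester Utd should become Manchester United, Blackburn should become Blackburn Rovers, and so on.
The second table also has QPR, which should be "Queens Park Rangers".
Any help?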
We can use a stringdist join from the fuzzyjoin package:
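library(fuzzyjoin)
library(dplyr)
# Join each short name to the names in color_table that share its Soundex code
stringdist_left_join(table_1, color_table, by = "Team", method = "soundex") %>%
  # Team.y is the matched full name (NA when there is no match); fall back to Team.x
  transmute(Team = coalesce(Team.y, Team.x)) %>%
  distinct()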
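To see what method = "soundex" actually compares, here is a small sketch using stringdist::phonetic() from the stringdist package, which fuzzyjoin builds on. With this method fuzzyjoin treats two names as a match when their Soundex codes are equal, and Soundex keeps only the first letter plus three consonant digits, so every "Manchester ..." name gets the same code. That is presumably why the pipeline above ends with distinct(). The sketch also shows that "QPR" and "Queens Park Rangers" get different codes, so this join leaves QPR unchanged. The teams vector is just an illustrative selection from the question.

```r
library(stringdist)

# Illustrative selection of names from the question
teams <- c("Manchester Utd", "Manchester United", "Manchester City",
           "QPR", "Queens Park Rangers")

# Soundex: first letter followed by three digits encoding the next consonants
codes <- phonetic(teams, method = "soundex")
names(codes) <- teams
print(codes)

# All three "Manchester ..." names share one code, so a soundex join
# cannot tell Manchester United apart from Manchester City
length(unique(codes[1:3]))
# 1

# "QPR" and "Queens Park Rangers" get different codes, so they never match
codes[["QPR"]] == codes[["Queens Park Rangers"]]
# FALSE
```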
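Here is a base R solution using agrep. It has the handy feature of letting you set the maximum number of insertions, deletions and substitutions allowed for a match. Note that tables_team_pl here is the character vector of full team names given at the end of this answer, not the HTML document from the question.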
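# Keep a copy so the result can be compared with the original
table_1_original <- table_1
table_1$Team <- sapply(table_1$Team, function(x) {
  # Approximate substring match: no insertions or deletions, up to 3 substitutions
  a <- agrep(x, tables_team_pl,
             max.distance = list(insertions = 0, deletions = 0, substitutions = 3))
  # Use the first full name found, otherwise keep the original name
  if (length(a) > 0) tables_team_pl[a[1]] else x
}, USE.NAMES = FALSE)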
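A slightly more defensive variant of the same loop, as a sketch: agrep(..., value = TRUE) returns the matching strings directly, and vapply() guarantees exactly one string per team. When agrep finds several candidates this version keeps the first one, which is my assumption rather than something the answer specifies; complete_names() and full_names are hypothetical names used only for illustration. If I read ?agrep correctly, when no cost is given the total number of edits is also capped at 10% of the pattern length (rounded up), which is why Manchester Utd (2 substitutions away from Manchester United) matches here.

```r
# Hypothetical helper: replace each short name with the first approximate
# match found in `full_names`, or keep it unchanged when nothing matches
complete_names <- function(short_names, full_names, subs = 3) {
  vapply(short_names, function(x) {
    # value = TRUE returns the matching strings instead of their indices
    hits <- agrep(x, full_names, value = TRUE,
                  max.distance = list(insertions = 0, deletions = 0,
                                      substitutions = subs))
    if (length(hits) > 0) hits[[1]] else x
  }, character(1), USE.NAMES = FALSE)
}

# A few names taken from the two lists in the question
full_names <- c("Blackburn Rovers", "Manchester United", "Tottenham Hotspur")
complete_names(c("Blackburn", "Manchester Utd", "Tottenham", "Wigan Athletic"),
               full_names)
# returns c("Blackburn Rovers", "Manchester United", "Tottenham Hotspur", "Wigan Athletic")
```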
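The result, shown next to the original (note that the original column in this output reads "Queens" in row 13, not "QPR" as in table_1):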
cbind(table_1_original, table_1)
Team Team
1 Arsenal Arsenal
2 Aston Villa Aston Villa
3 Blackburn Blackburn Rovers
4 Bolton Bolton
5 Chelsea Chelsea
6 Everton Everton
7 Fulham Fulham
8 Liverpool Liverpool
9 Manchester City Manchester City
10 Manchester Utd Manchester United
11 Newcastle Utd Newcastle United
12 Norwich City Norwich City
13 Queens Queens Park Rangers
14 Stoke City Stoke City
15 Sunderland Sunderland
16 Swansea City Swansea City
17 Tottenham Tottenham Hotspur
18 West Brom West Bromwich Albion
19 Wigan Athletic Wigan Athletic
20 Wolves Wolverhampton Wanderers
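Whichever approach you use, it is worth listing the names that were left as they were. A minimal sketch, assuming small hand-picked subsets of the lists above (reference, completed and manual are illustrative names): anything not found in the reference list was not completed. Swansea City stays unmatched because the scraped list spells it "Swansea", and QPR needs a manual mapping, shown here with a one-entry lookup table.

```r
# Illustrative subset of the full names scraped from win-or-lose.com
reference <- c("Blackburn Rovers", "Manchester United", "Queens Park Rangers",
               "Swansea", "Wolverhampton Wanderers")

# Names after an automatic completion step (some were left as they were)
completed <- c("Blackburn Rovers", "Manchester United", "QPR",
               "Swansea City", "Wolverhampton Wanderers")

# Names that still do not appear in the reference list
unmatched <- completed[!completed %in% reference]
unmatched
# c("QPR", "Swansea City")

# One simple option for the leftovers: a small manual lookup table
manual <- c(QPR = "Queens Park Rangers")
fixed <- ifelse(completed %in% names(manual), manual[completed], completed)
fixed
```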
The scraped HTML data, filtered down to the team names (without the colours):
tables_team_pl <- c("Aberdeen", "AFC Bournemouth", "AFC Wimbledon", "Arsenal",
"Aston Villa", "Birmingham City", "Blackburn Rovers", "Bradford City",
"Brentford", "Brighton & Hove Albion", "Bristol City", "Burnley",
"Cardiff City", "Celtic", "Chelsea", "Crystal Palace", "Derby County",
"Dundee", "Dundee United", "Everton", "Fulham", "Hamilton Academical",
"Heart of Midlothian", "Hibernian", "Huddersfield Town", "Hull City",
"Inverness Caledonian Thistle", "Kilmarnock", "Leeds United",
"Leicester City", "Liverpool", "Livingston", "Manchester City",
"Manchester United", "Middlesbrough", "Millwall", "Motherwell",
"Newcastle United", "Norwich City", "Nottingham Forest", "Partick Thistle",
"Portsmouth", "Preston North End", "Queens Park Rangers", "Rangers",
"Reading", "Ross County", "Rotherham", "Sheffield United", "Sheffield Wednesday",
"Southampton", "St Johnstone", "St Mirren", "Stoke City", "Sunderland",
"Swansea", "Tottenham Hotspur", "Watford", "West Bromwich Albion",
"West Ham United", "Wolverhampton Wanderers", "Wycombe Wanderers")