Complete column names with another dataframe column in R

I have this table:

library(rvest)
library(tidyverse)
tables_team_pl <- read_html('https://www.win-or-lose.com/football-team-colours/')
color_table <- tables_team_pl %>% html_table() %>% pluck(1) %>% select(-Away)

And this one:

table_1 <- structure(list(Team = c("Arsenal", "Aston Villa", "Blackburn", 
"Bolton", "Chelsea", "Everton", "Fulham", "Liverpool", "Manchester City", 
"Manchester Utd", "Newcastle Utd", "Norwich City", "QPR", "Stoke City", 
"Sunderland", "Swansea City", "Tottenham", "West Brom", "Wigan Athletic", 
"Wolves")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-20L))

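As you can see, the names in the second table are incomplete. For example, Manchester Utd should be Manchester United, as shown in the first table.

So I just need to complete the second table by pulling the matching names from the first table.

That is, I want to correct table_1: Manchester Utd should become Manchester United, Blackburn should become Blackburn Rovers, and so on. The complete names should come from the first table.

The second table also contains QPR, which should be "Queens Park Rangers".

Any help?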

We can use a stringdist join from the fuzzyjoin package:

library(fuzzyjoin)
library(dplyr)
stringdist_left_join(table_1, color_table, by = "Team", method = "soundex") %>%
     transmute(Team = coalesce(Team.y, Team.x)) %>%
     distinct
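
Phonetic matching will not necessarily resolve every name, so it may be worth listing the names that still do not appear in the reference table after the join. A minimal base R sketch, using made-up sample vectors (`reference` and `completed` are hypothetical stand-ins for `color_table$Team` and the joined result):

```r
# Hypothetical sample data standing in for the real tables
reference <- c("Arsenal", "Blackburn Rovers", "Manchester United")
completed <- c("Arsenal", "Blackburn Rovers", "Bolton")

# Names that have no exact counterpart in the reference list
unmatched <- setdiff(completed, reference)
print(unmatched)
```

Anything reported here would need a different matching method or a manual correction.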

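Here is a base R solution using agrep. It has the neat feature of letting you set the maximum number of insertions, deletions and substitutions allowed for a match.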

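table_1_original <- table_1

# tables_team_pl is the character vector of full names listed at the end of this answer
table_1$Team <- sapply(table_1$Team, function(x) {
  # allow up to 3 substitutions but no insertions or deletions
  a <- agrep(x, tables_team_pl,
             max.distance = list(insertions = 0, deletions = 0, substitutions = 3))
  # use the first match if there is one, otherwise keep the original name
  if (length(a) > 0) tables_team_pl[a[1]] else x
}, USE.NAMES = FALSE)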
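
The same idea can be wrapped in a small reusable function. The sketch below is an illustration rather than the answer's own code: `complete_names`, `short`, `full` and `max_subs` are hypothetical names, and the sample vectors are a small subset of the data in this post. It keeps the original name when nothing matches and takes the first hit when several names match.

```r
# Hypothetical helper: replace each short name with the first approximate
# match found in `full`, or keep it unchanged when nothing matches.
complete_names <- function(short, full, max_subs = 3) {
  vapply(short, function(x) {
    hits <- agrep(x, full,
                  max.distance = list(insertions = 0, deletions = 0,
                                      substitutions = max_subs))
    if (length(hits) > 0) full[hits[1]] else x
  }, character(1), USE.NAMES = FALSE)
}

# Small sample taken from the tables in this post
full  <- c("Arsenal", "Blackburn Rovers", "Manchester United")
short <- c("Arsenal", "Blackburn", "Bolton")

completed <- complete_names(short, full)
print(data.frame(short = short, completed = completed))
```

Using `vapply` with `character(1)` guarantees a plain character vector, so the result can be assigned straight back to a data frame column.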

The result, shown next to the original for comparison:

cbind(table_1_original, table_1)
              Team                    Team
1          Arsenal                 Arsenal
2      Aston Villa             Aston Villa
3        Blackburn        Blackburn Rovers
4           Bolton                  Bolton
5          Chelsea                 Chelsea
6          Everton                 Everton
7           Fulham                  Fulham
8        Liverpool               Liverpool
9  Manchester City         Manchester City
10  Manchester Utd       Manchester United
11   Newcastle Utd        Newcastle United
12    Norwich City            Norwich City
13          Queens     Queens Park Rangers
14      Stoke City              Stoke City
15      Sunderland              Sunderland
16    Swansea City            Swansea City
17       Tottenham       Tottenham Hotspur
18       West Brom    West Bromwich Albion
19  Wigan Athletic          Wigan Athletic
20          Wolves Wolverhampton Wanderers
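
Note that row 13 of the comparison shows Queens, whereas table_1 as posted in the question contains QPR. An abbreviation like QPR shares very few characters with "Queens Park Rangers", so approximate matching may not be a reliable fix for it. One option is a small manual lookup applied before the fuzzy step. The sketch below is only an illustration: `overrides` and `teams` are hypothetical names, not part of the original answer.

```r
# Hypothetical lookup of abbreviations and their full names
overrides <- c(QPR = "Queens Park Rangers")

teams <- c("Arsenal", "QPR", "Wolves")

# Replace only the entries that have an override; leave the rest untouched
is_abbrev <- teams %in% names(overrides)
teams[is_abbrev] <- unname(overrides[teams[is_abbrev]])

print(teams)
```

The remaining names can then go through the approximate matching step as before.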

The HTML data, filtered down to the team names without the colours:

tables_team_pl <- c("Aberdeen", "AFC Bournemouth", "AFC Wimbledon", "Arsenal", 
"Aston Villa", "Birmingham City", "Blackburn Rovers", "Bradford City", 
"Brentford", "Brighton & Hove Albion", "Bristol City", "Burnley", 
"Cardiff City", "Celtic", "Chelsea", "Crystal Palace", "Derby County", 
"Dundee", "Dundee United", "Everton", "Fulham", "Hamilton Academical", 
"Heart of Midlothian", "Hibernian", "Huddersfield Town", "Hull City", 
"Inverness Caledonian Thistle", "Kilmarnock", "Leeds United", 
"Leicester City", "Liverpool", "Livingston", "Manchester City", 
"Manchester United", "Middlesbrough", "Millwall", "Motherwell", 
"Newcastle United", "Norwich City", "Nottingham Forest", "Partick Thistle", 
"Portsmouth", "Preston North End", "Queens Park Rangers", "Rangers", 
"Reading", "Ross County", "Rotherham", "Sheffield United", "Sheffield Wednesday", 
"Southampton", "St Johnstone", "St Mirren", "Stoke City", "Sunderland", 
"Swansea", "Tottenham Hotspur", "Watford", "West Bromwich Albion", 
"West Ham United", "Wolverhampton Wanderers", "Wycombe Wanderers")

