How to fill a column in R based on the values in other columns

Question

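I am working with R to scrape and clean data for my work as a journalist. I can fetch the HTML table, read it as a data frame, and rename the columns. Now I am trying to create a new column whose value depends on the values in other columns.

This new column should take values such as "Avante", "DEM", "MDB", "Patriota", "PCdoB", and so on: the party of each deputy. For example, Avante has three deputies: "Adalberto Cavalcanti", "Cabo Sabino" and "Silvio Costa". The deputies' names always come below a full row that carries the party's name.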

url <- "http://www.camara.leg.br/internet/votacao/mostraVotacao.asp?ideVotacao=8559&numLegislatura=55&codCasa=1&numSessaoLegislativa=4&indTipoSessaoLegislativa=O&numSessao=225&indTipoSessao=E&tipo=partido"

library(xml2)
library(rvest)
file <- read_html(url)
tables <- html_nodes(file, "table")
table1 <- html_table(tables[3], fill = TRUE, header = T)

head(table1)

table1_df <- as.data.frame(table1)

colnames(table1_df) <- c("deputado", "uf", "voto")

This is what I have now: (image)

This is what I want: (image)

Answer 1

Here is a solution using only base R:

url <- "http://www.camara.leg.br/internet/votacao/mostraVotacao.asp?ideVotacao=8559&numLegislatura=55&codCasa=1&numSessaoLegislativa=4&indTipoSessaoLegislativa=O&numSessao=225&indTipoSessao=E&tipo=partido"

library(xml2)
library(rvest)
file <- read_html(url)
tables <- html_nodes(file, "table")
table1 <- html_table(tables[3], fill = TRUE, header = T)

head(table1)

table1_df <- as.data.frame(table1)

colnames(table1_df) <- c("deputado", "uf", "voto")

# create the new column for later
table1_df$new_column <- NA

# identify rows with the Total PARTY: NUM rows
idx <- grep("Total.*: \\d+", table1_df$deputado)

# Loop over these and assign the values
for (i in seq_along(idx)){
  # Extract the number of deputados
  n <- as.numeric(sub("^.*: ", "", table1_df$deputado[idx[i]]))
  # Extract the party
  partido <- sub("Total ", "", table1_df$deputado[idx[i]])
  partido <- sub(": .*", "", partido)
  # Assign the values
  table1_df$new_column[(idx[i] - n):(idx[i] - 1)] <- partido
}

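# Remove the unnecessary lines. Logical indexing is used instead of
# -grep()/-which(), which would drop every row if nothing matched.
table1_df <- table1_df[!grepl("Total .*:.*", table1_df$deputado), ]
# Rows where deputado equals uf are the party header rows
is_header <- table1_df$deputado == table1_df$uf
table1_df <- table1_df[!(is_header %in% TRUE), ]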
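Since the URL may no longer serve the same page, the loop above can be tried on a small stand-in table. This is only a sketch: the `toy` data frame and the `label_parties()` wrapper are hypothetical names, and the empty `uf` on the "Total" rows is an assumption about the scraped layout.

```r
# Hypothetical toy data with the layout described in the question:
# a party header row, the deputies, then a "Total PARTY: NUM" row.
toy <- data.frame(
  deputado = c("Avante", "Adalberto Cavalcanti", "Cabo Sabino", "Total Avante: 2",
               "DEM", "Alan Rick", "Total DEM: 1"),
  uf = c("Avante", "PE", "CE", "", "DEM", "AC", ""),  # "" on Total rows is assumed
  stringsAsFactors = FALSE
)

# Hypothetical wrapper around the same steps as the answer
label_parties <- function(df) {
  df$new_column <- NA_character_
  # Rows of the form "Total PARTY: NUM"
  idx <- grep("Total.*: \\d+", df$deputado)
  for (i in idx) {
    # Number of deputies listed above this Total row
    n <- as.numeric(sub("^.*: ", "", df$deputado[i]))
    # Party name: strip the "Total " prefix and the ": NUM" suffix
    partido <- sub(": .*", "", sub("Total ", "", df$deputado[i]))
    # Label the n rows directly above the Total row
    df$new_column[(i - n):(i - 1)] <- partido
  }
  # Drop the Total rows, then the party header rows
  df <- df[!grepl("Total .*:.*", df$deputado), ]
  df[!((df$deputado == df$uf) %in% TRUE), ]
}

result <- label_parties(toy)
print(result)
```

The approach relies on the count in each "Total" row matching the number of deputy rows directly above it; if the page ever lists a different number, the wrong rows would be labelled.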

Answer 2

Here is another option using zoo and dplyr.

1) Get the names of the parties.

parties <- sub(pattern = "Total\\s(.+):\\s\\d+", 
                replacement = "\\1", 
                x = table1_df$deputado[grepl("Total", x = table1_df$deputado)])

2) Add parties as a new column and carry the last observation forward, because parties[match(table1_df$deputado, parties)] contains many NAs.

# na.rm = FALSE keeps any leading NA so the result has one value per row
table1_df$new_col <- zoo::na.locf(parties[match(table1_df$deputado, parties)], na.rm = FALSE)
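To see what step 2 does, it may help to run it on a tiny vector. The sketch below uses hypothetical toy values and a base-R stand-in for the carry-forward (it is not how `zoo::na.locf` is implemented), and it assumes the first element is a party header, as appears to be the case in the scraped table.

```r
# Hypothetical stand-in for table1_df$deputado
deputado <- c("Avante", "Adalberto Cavalcanti", "Cabo Sabino", "Total Avante: 2",
              "DEM", "Alan Rick", "Total DEM: 1")
parties <- c("Avante", "DEM")

# match() finds a party only on the party header rows; every other row is NA
hits <- parties[match(deputado, parties)]

# Base-R carry-forward: count the non-NA values seen so far and use that
# count to index into the non-NA values (assumes hits[1] is not NA)
filled <- hits[!is.na(hits)][cumsum(!is.na(hits))]
print(filled)
```

Each header's party name is repeated down to the next header, which is the same result the `zoo::na.locf` call is used for in this step.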

3) Remove the rows that are not needed.

library(dplyr)
table1_df <- table1_df %>% 
  group_by(new_col) %>% 
  slice(2:(n()-1))
table1_df
# A tibble: 324 x 4
# Groups:   new_col [24]
#   deputado             uf    voto      new_col
#   <chr>                <chr> <chr>     <chr>  
# 1 Adalberto Cavalcanti PE    Não       Avante 
# 2 Cabo Sabino          CE    Abstenção Avante 
# 3 Silvio Costa         PE    Sim       Avante 
# 4 Alan Rick            AC    Sim       DEM    
# 5 Alberto Fraga        DF    Não       DEM    
# 6 Alexandre Leite      SP    Sim       DEM    
# 7 Arthur Oliveira Maia BA    Sim       DEM    
# 8 Carlos Melles        MG    Sim       DEM    
# 9 Efraim Filho         PB    Não       DEM    
#10 Eli Corrêa Filho     SP    Sim       DEM    
# ... with 314 more rows
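If dplyr is not available, step 3 can be approximated in base R. This is a sketch on hypothetical toy data, not the method used above: `duplicated()` flags every row except the first of its group, and with `fromLast = TRUE` every row except the last, so combining the two keeps only the rows strictly between the party header and the "Total" row.

```r
# Hypothetical toy data after step 2: every row already carries its party
toy <- data.frame(
  deputado = c("Avante", "Adalberto Cavalcanti", "Cabo Sabino", "Total Avante: 2",
               "DEM", "Alan Rick", "Total DEM: 1"),
  new_col = c(rep("Avante", 4), rep("DEM", 3)),
  stringsAsFactors = FALSE
)

# TRUE for rows that are neither the first nor the last of their party group
interior <- duplicated(toy$new_col) & duplicated(toy$new_col, fromLast = TRUE)
toy_clean <- toy[interior, ]
print(toy_clean)
```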

Answer 1
0 Accepted 2018-11-29 20:41:26

Answer 2
0 2018-11-29 20:43:36
