How do I gather columns based on the first part of the column name in R?

Question

Suppose I have a dataset like the following:

# State Ben.Carson.Number.of.Votes Ben.Carson.Party Ben.Carson.Percent Bernie.Sanders.Votes Bernie.Sanders.Percent Bernie.Sanders.Party
#  OH   305                        Republican       8.3                500                  12.30                  Democrat
#  FL   20                         Republican       3.0                700                  11.00                  Democrat
#  TX   400                        Republican       5.0                50                   1.00                   Democrat

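How can I create four unified columns (candidate name, votes, percent, and party) from all the separate columns currently in the dataset? That is, I want to gather all three types of columns together based on the candidate name embedded in the column names.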

I tried the following, to no avail:

tidyElectionData %>%
  gather(key, value, -c(County, Location.State, State)) %>%
  separate(key, into = c("Candidate", "Party"), sep = "(^[^.]+[.][^.]+)(.+$)") %>%
  spread(Party, value)
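
One likely reason the attempt above fails can be checked with a minimal base R sketch: the pattern passed to `sep` matches each key from start to end, so splitting on it leaves nothing on either side. The sample keys are taken from the column names in the question; `pat`, `key`, `whole_match`, `candidate`, and `rest` are names introduced here for illustration.

```r
# Pattern from the separate() call above
pat <- "(^[^.]+[.][^.]+)(.+$)"
key <- c("Ben.Carson.Party", "Bernie.Sanders.Votes")

# The match covers the whole string, so nothing is left around the "separator"
whole_match <- regmatches(key, regexpr(pat, key))
print(identical(whole_match, key))

# The capture groups do hold the wanted pieces (group 2 keeps the leading dot)
candidate <- sub(pat, "\\1", key)
rest <- sub(pat, "\\2", key)
print(candidate)
print(rest)
```

A pattern built from capture groups like this one fits tidyr's `extract()`, which takes a `regex` with groups, better than `separate()`, which treats the pattern as the delimiter.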

Answer 1

A tidyverse-based solution could look like this.

library(dplyr)
library(tidyr)
library(stringr)

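df %>%
  # Cast every column to character so numeric and text values can share one column
  mutate(across(everything(), as.character)) %>%
  # One row per State and original column name
  pivot_longer(-State) %>%
  # Split each original column name into a measure and a candidate
  mutate(names = str_extract(name, 'Votes|Party|Percent'),
         name = str_extract(name, 'Ben.Carson|Bernie.Sanders')) %>%
  # Spread the three measures back out into columns
  pivot_wider(names_from = names, values_from = value)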

#   State name           Votes Party      Percent
#   <chr> <chr>          <chr> <chr>      <chr>  
# 1 OH    Ben.Carson     305   Republican 8.3    
# 2 OH    Bernie.Sanders 500   Democrat   12.3   
# 3 FL    Ben.Carson     20    Republican 3      
# 4 FL    Bernie.Sanders 700   Democrat   11     
# 5 TX    Ben.Carson     400   Republican 5      
# 6 TX    Bernie.Sanders 50    Democrat   1
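
If the candidate names should not be hard-coded, one possible variant uses the `.value` sentinel of `pivot_longer()`, which also keeps Votes and Percent numeric. This is a sketch, not part of the original answer. It assumes every candidate name is exactly two dot-separated words and that the measure is the last dot-separated part of the column name; `long` is an illustrative name.

```r
library(tidyr)

# Same values as the answer's data, built as a plain data frame
df <- data.frame(
  State = c("OH", "FL", "TX"),
  Ben.Carson.Number.of.Votes = c(305, 20, 400),
  Ben.Carson.Party = "Republican",
  Ben.Carson.Percent = c(8.3, 3, 5),
  Bernie.Sanders.Votes = c(500, 700, 50),
  Bernie.Sanders.Percent = c(12.3, 11, 1),
  Bernie.Sanders.Party = "Democrat"
)

long <- pivot_longer(
  df,
  cols = -State,
  # Group 1 fills the Candidate column; group 2 (".value") names the output columns
  names_to = c("Candidate", ".value"),
  # First two dot-separated words, then (lazily) anything, then the last word
  names_pattern = "^([^.]+\\.[^.]+)\\..*?([^.]+)$"
)
print(long)
```

Because each measure gets its own column directly, no cast to character and no `pivot_wider()` step is needed in this variant.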

Data

df <- structure(list(State = c("OH", "FL", "TX"), Ben.Carson.Number.of.Votes = c(305, 
20, 400), Ben.Carson.Party = c("Republican", "Republican", "Republican"
), Ben.Carson.Percent = c(8.3, 3, 5), Bernie.Sanders.Votes = c(500, 
700, 50), Bernie.Sanders.Percent = c(12.3, 11, 1), Bernie.Sanders.Party = c("Democrat", 
"Democrat", "Democrat")), row.names = c(NA, -3L), class = c("tbl_df", 
"tbl", "data.frame"))

Answer 2

In base R, you could do the following:

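# Candidate names: the first two dot-separated parts of each column name
candidates <- unique(sub("(\\w+[.]\\w+).*","\\1",names(df)[-1]))

# Group the column names by their last dot-separated part (Party, Percent, Votes)
columns <- split(names(df[-1]),sub(".*[.]","",names(df)[-1]))

# as.data.frame() guards against df being a tibble, which reshape() may not handle
df1<-reshape(as.data.frame(df), columns, dir = "long", times = candidates, idvar = "State")

names(df1)[-1]<-c("candidate", names(columns))
rownames(df1) <- NULL
df1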
  State      candidate      Party Percent Votes
1    OH     Ben.Carson Republican     8.3   305
2    FL     Ben.Carson Republican     3.0    20
3    TX     Ben.Carson Republican     5.0   400
4    OH Bernie.Sanders   Democrat    12.3   500
5    FL Bernie.Sanders   Democrat    11.0   700
6    TX Bernie.Sanders   Democrat     1.0    50
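
To see how the base R answer lines the columns up, it can help to print the two intermediate objects. The snippet below is an illustrative sketch that works on the column names only (copied from the data in Answer 1); `nms` is a name introduced here.

```r
# Column names from the example data, without State
nms <- c("Ben.Carson.Number.of.Votes", "Ben.Carson.Party", "Ben.Carson.Percent",
         "Bernie.Sanders.Votes", "Bernie.Sanders.Percent", "Bernie.Sanders.Party")

# Keep only the first two dot-separated words: the candidate
candidates <- unique(sub("(\\w+[.]\\w+).*", "\\1", nms))
print(candidates)

# Strip everything up to the last dot, then group the names by what remains
columns <- split(nms, sub(".*[.]", "", nms))
print(names(columns))
print(columns$Votes)
```

`reshape()` pairs the i-th name in each group with the i-th entry of `times`, so this approach relies on every group listing the candidates in the same order as `candidates`.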

Answer 1
0 accepted 2020-11-10 19:20:13

Answer 2
0 2020-11-10 19:36:03
