Gather data from multiple columns in R

Question

I have some data that I am trying to reshape with the tidyverse pivot_longer() function in R to get the output shown below, but I am not able to do it.

I have data in this format (with many other columns):

Country State   Year 1  Population 1    Year 2  Population2
U.S.A   IL  2009    20000   2010    30000
U.S.A   VA  2009    30000   2010    40000

I want to get the data in this format:

Country State   Year    Population
U.S.A   IL  2009    20000
U.S.A   IL  2010    30000
U.S.A   VA  2009    30000
U.S.A   VA  2010    40000

I am only able to do it for one column; I cannot pass the other columns, such as Population.

My code is below:

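library(tidyr)

# Only columns containing "Year" are selected, so the Population columns are left out
file1 <- file %>%
  pivot_longer(
    cols = contains("Year"),
    names_sep = "_",
    names_to = c(".value", "repeat")
  )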

I was able to make it work with the tidyverse:

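library(tidyverse)
library(readxl)  # read_excel() comes from readxl, which library(tidyverse) does not attach

file <- read_excel("peps300.xlsx")

# Replace the space between each prefix and its number with "_" so names_sep can split them
names(file) <- str_replace_all(names(file), c("Year " = "Year_", "Num " = "Num_", "DRate " = "DRate_", "PRate " = "PRate_", "Denom " = "Denom_"))

# Stack every repeated column group; ".value" keeps each prefix as an output column name
file <- file %>%
  pivot_longer(
    cols = c(contains("Year"), contains("Num"), contains("DRate"), contains("PRate"), contains("Denom")),
    names_sep = "_",
    names_to = c(".value", "repeat")
  )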
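If the only difference between the column names is the separator, a sketch that skips the renaming step is to let names_pattern accept an optional space or underscore. This assumes every stacked column is a run of letters followed by a number (as in "Year 1", "Population 1" and the inconsistent "Population2"); the regular expression and the wide/long names are illustrative, not taken from the original post.

```r
library(tidyr)

# Sample input using the column names from the question, including "Population2"
# (check.names = FALSE keeps the spaces in the names)
wide <- data.frame(
  Country = c("U.S.A", "U.S.A"),
  State = c("IL", "VA"),
  `Year 1` = c(2009L, 2009L),
  `Population 1` = c(20000L, 30000L),
  `Year 2` = c(2010L, 2010L),
  Population2 = c(30000L, 40000L),
  check.names = FALSE
)

# Letters, then an optional space or underscore, then digits:
# capture group 1 becomes the output column (.value), group 2 the repeat index
long <- wide %>%
  pivot_longer(
    cols = -c(Country, State),
    names_to = c(".value", "group"),
    names_pattern = "^([A-Za-z]+)[ _]?([0-9]+)$"
  )

long
```

The same pattern would cover the Num, DRate, PRate and Denom prefixes, since they are also letters followed by a number, but that is an assumption about the real spreadsheet.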

Answer 1

One option is to specify the cols that starts_with "Population" or "Year":

library(tidyr)  # pivot_longer() is from tidyr, not dplyr
df1 %>%
  pivot_longer(cols = c(starts_with("Population"), starts_with("Year")),
               names_to = c(".value", "group"), names_pattern = "(.*)_(.*)")
# A tibble: 4 x 5
#  Country State group Population  Year
#  <chr>   <chr> <chr>      <int> <int>
#1 U.S.A   IL    1          20000  2009
#2 U.S.A   IL    2          30000  2010
#3 U.S.A   VA    1          30000  2009
#4 U.S.A   VA    2          40000  2010

Data:

df1 <- structure(list(Country = c("U.S.A", "U.S.A"), State = c("IL", 
"VA"), Year_1 = c(2009L, 2009L), Population_1 = c(20000L, 30000L
), Year_2 = c(2010L, 2010L), Population_2 = c(30000L, 40000L)), 
   class = "data.frame", row.names = c(NA, 
-2L))

Answer 2

df %>%
    pivot_longer(
        -c(Country,State),
        names_to = c(".value","group"),
        names_pattern = "(.+)_(.+)"
    )

# A tibble: 4 x 5
  Country State group Year  Population
  <chr>   <chr> <chr> <chr> <chr>     
1 U.S.A   IL    1     2009  20000     
2 U.S.A   IL    2     2010  30000     
3 U.S.A   VA    1     2009  30000     
4 U.S.A   VA    2     2010  40000

You can then drop the group column if you don't need it.
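For example, a minimal sketch that removes group with dplyr's select() right after pivoting, using a copy of the sample data with underscore-separated names (the df and long names here are illustrative):

```r
library(tidyr)
library(dplyr)

df <- data.frame(
  Country = c("U.S.A", "U.S.A"),
  State = c("IL", "VA"),
  Year_1 = c(2009L, 2009L),
  Population_1 = c(20000L, 30000L),
  Year_2 = c(2010L, 2010L),
  Population_2 = c(30000L, 40000L)
)

long <- df %>%
  pivot_longer(
    -c(Country, State),
    names_to = c(".value", "group"),
    names_pattern = "(.+)_(.+)"
  ) %>%
  select(-group)  # drop the repeat index; Year already tells the rows apart

long
```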

To do this, you will need to clean your column names first. Make sure they all follow the same pattern, with each name and its number joined by the same single separator (for example a single underscore, which is what names_pattern = "(.+)_(.+)" expects).
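One way to do that cleaning without listing every prefix by hand is a single regular-expression substitution over the names. This is a sketch assuming each stacked column is letters, an optional space or underscore, then digits; the pattern and the raw_names/clean_names vectors are illustrative, built from the names shown in the question.

```r
# Column names as they appear in the question: a space separator,
# and "Population2" with no separator at all
raw_names <- c("Country", "State", "Year 1", "Population 1", "Year 2", "Population2")

# Rewrite "<letters><optional space or _><digits>" as "<letters>_<digits>";
# names that do not match the pattern (Country, State) are returned unchanged
clean_names <- sub("^([A-Za-z]+)[ _]?([0-9]+)$", "\\1_\\2", raw_names)

clean_names
```

On a real data frame this would be names(df) <- sub("^([A-Za-z]+)[ _]?([0-9]+)$", "\\1_\\2", names(df)), after which names_pattern = "(.+)_(.+)" can split every stacked column.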

df <- structure(list(Country = c("U.S.A", "U.S.A"), State = c("IL", 
"VA"), Year_1 = c("2009", "2009"), Population_1 = c("20000", 
"30000"), Year_2 = c("2010", "2010"), Population_2 = c("30000", 
"40000")), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L))
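This df stores every value as character, which is why Year and Population come out as <chr> above. A minimal sketch, assuming dplyr 1.0 or later for across(), converts them back to integers after pivoting (the long name is illustrative):

```r
library(tidyr)
library(dplyr)

# Same shape as the df above: every value stored as character
df <- data.frame(
  Country = c("U.S.A", "U.S.A"),
  State = c("IL", "VA"),
  Year_1 = c("2009", "2009"),
  Population_1 = c("20000", "30000"),
  Year_2 = c("2010", "2010"),
  Population_2 = c("30000", "40000"),
  stringsAsFactors = FALSE  # keep characters, so as.integer() parses text, not factor codes
)

long <- df %>%
  pivot_longer(
    -c(Country, State),
    names_to = c(".value", "group"),
    names_pattern = "(.+)_(.+)"
  ) %>%
  # Parse the repeat index and the stacked value columns as integers
  mutate(across(c(group, Year, Population), as.integer))

long
```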

2 answers

Answer 1: score 2, accepted, 2019-10-04 15:36:45

Answer 2: score 1, 2019-10-04 15:47:05
