
Gather data in R with multiple columns

I have some data that I am trying to reshape with the tidyverse pivot_longer() function to get the output shown below, but I am not able to do it.

I have data in this format (with many other columns):

Country State   Year 1  Population 1    Year 2  Population2
U.S.A   IL  2009    20000   2010    30000
U.S.A   VA  2009    30000   2010    40000

I want to get the data into this format:

Country State   Year    Population
U.S.A   IL  2009    20000
U.S.A   IL  2010    30000
U.S.A   VA  2009    30000
U.S.A   VA  2010    40000

I am only able to do it for one column; I cannot get it to handle the other columns, such as Population.

My code is below:

file1 <- file %>%
  pivot_longer(
    cols = contains("Year"),
    names_sep = "_",
    names_to = c(".value", "repeat")
  )

I was able to make it work with the tidyverse:

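library(tidyverse)
library(readxl)  # read_excel() comes from readxl, which library(tidyverse) does not attach

file <- read_excel("peps300.xlsx")

# Turn "Year 1", "Num 1", ... into "Year_1", "Num_1", ... so that names_sep = "_" can split them
names(file) <- str_replace_all(names(file), c("Year " = "Year_", "Num " = "Num_", "DRate " = "DRate_", "PRate " = "PRate_", "Denom " = "Denom_"))

file <- file %>%
  pivot_longer(
    cols = c(contains("Year"), contains("Num"), contains("DRate"), contains("PRate"), contains("Denom")),
    names_sep = "_",                    # "Year_1" -> "Year" (.value) + "1" (repeat)
    names_to = c(".value", "repeat")
  )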
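Because the rename above only swaps the space for an underscore, a sketch that skips it is to split on the space directly with names_sep = " " and to select every numbered column with matches() instead of listing each prefix. This assumes every repeated header is exactly "<Name> <n>" with a single space, as the rename vector implies; the small tibble stands in for read_excel("peps300.xlsx") and its values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the Excel sheet: two "<Name> <n>" blocks
file <- tibble(
  Country  = c("U.S.A", "U.S.A"),
  State    = c("IL", "VA"),
  `Year 1` = c(2009, 2009),
  `Num 1`  = c(5, 7),
  `Year 2` = c(2010, 2010),
  `Num 2`  = c(6, 8)
)

long <- file %>%
  pivot_longer(
    cols = matches(" \\d+$"),          # every column whose name ends in " <number>"
    names_sep = " ",                   # "Year 1" -> "Year" (.value) + "1" (repeat)
    names_to = c(".value", "repeat")
  )

long
```

The same call should cover DRate, PRate and Denom without listing them, as long as no id column name ends in a space followed by digits.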

One option is to select the columns whose names start with "Population" or "Year" using starts_with():

library(dplyr)
library(tidyr)  # pivot_longer() comes from tidyr
df1 %>% 
    pivot_longer(cols = c(starts_with("Population"), starts_with("Year")), 
    names_to = c(".value", "group"), names_pattern = "(.*)_(.*)")
# A tibble: 4 x 5
#  Country State group Population  Year
#  <chr>   <chr> <chr>      <int> <int>
#1 U.S.A   IL    1          20000  2009
#2 U.S.A   IL    2          30000  2010
#3 U.S.A   VA    1          30000  2009
#4 U.S.A   VA    2          40000  2010

Data:

df1 <- structure(list(Country = c("U.S.A", "U.S.A"), State = c("IL", 
"VA"), Year_1 = c(2009L, 2009L), Population_1 = c(20000L, 30000L
), Year_2 = c(2010L, 2010L), Population_2 = c(30000L, 40000L)), 
   class = "data.frame", row.names = c(NA, 
-2L))
df %>%
    pivot_longer(
        -c(Country,State),
        names_to = c(".value","group"),
        names_pattern = "(.+)_(.+)"
    )
# A tibble: 4 x 5
  Country State group Year  Population
  <chr>   <chr> <chr> <chr> <chr>     
1 U.S.A   IL    1     2009  20000     
2 U.S.A   IL    2     2010  30000     
3 U.S.A   VA    1     2009  30000     
4 U.S.A   VA    2     2010  40000 

You can then drop the group column if you don't need it.

For this to work, you first need to clean your column names: make sure they all follow the same pattern, with the words joined by a single space or a single underscore.
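A sketch of that cleaning step in base R, assuming every repeated header ends in its block number and the id columns (Country, State) contain no digits. raw_names is a hypothetical copy of the question's headers, including the inconsistent "Population2" that has no space before its number.

```r
# Headers as they appear in the question
raw_names <- c("Country", "State", "Year 1", "Population 1", "Year 2", "Population2")

# Replace whatever spaces/underscores (possibly none) precede the trailing number with "_"
clean_names <- sub("[ _]*(\\d+)$", "_\\1", raw_names)

clean_names
```

On the real data this would be applied as names(file) <- sub("[ _]*(\\d+)$", "_\\1", names(file)), after which the names_pattern = "(.+)_(.+)" call in this answer can split every column.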

df <- structure(list(Country = c("U.S.A", "U.S.A"), State = c("IL", 
"VA"), Year_1 = c("2009", "2009"), Population_1 = c("20000", 
"30000"), Year_2 = c("2010", "2010"), Population_2 = c("30000", 
"40000")), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L))
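The output of this answer shows <chr> columns because this df stores the numbers as text. A sketch that pivots the same data, drops the group helper column and converts Year and Population to integers; the data is repeated as a plain tibble so the block runs on its own, and as.integer() is just one possible conversion.

```r
library(dplyr)
library(tidyr)

# Same values as this answer's df: the numbers are stored as text
df <- tibble(
  Country = c("U.S.A", "U.S.A"),
  State = c("IL", "VA"),
  Year_1 = c("2009", "2009"),
  Population_1 = c("20000", "30000"),
  Year_2 = c("2010", "2010"),
  Population_2 = c("30000", "40000")
)

long <- df %>%
  pivot_longer(
    -c(Country, State),
    names_to = c(".value", "group"),
    names_pattern = "(.+)_(.+)"
  ) %>%
  select(-group) %>%                                 # drop the helper column
  mutate(across(c(Year, Population), as.integer))   # text -> integer

long
```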
