Gather data in R with multiple columns
I have some data that I am trying to reshape with the tidyverse pivot_longer() function in R to get the output shown below, but I am not able to do it.
I have data in this format (with many other columns):
Country  State  Year 1  Population 1  Year 2  Population 2
U.S.A    IL     2009    20000         2010    30000
U.S.A    VA     2009    30000         2010    40000
I want to get data in this format:
Country  State  Year  Population
U.S.A    IL     2009  20000
U.S.A    IL     2010  30000
U.S.A    VA     2009  30000
U.S.A    VA     2010  40000
I am only able to do it for one column; I cannot get it to include the other columns, such as Population.
My code is below:
file1 <- file %>%
  pivot_longer(
    cols = contains("Year"),
    names_sep = "_",
    names_to = c(".value", "repeat")
  )
I was able to make it work with the tidyverse by renaming the columns first:
library(tidyverse)
library(readxl)  # read_excel() comes from readxl, which library(tidyverse) does not attach

file <- read_excel("peps300.xlsx")
# Replace the space before each number with an underscore, e.g. "Year 1" -> "Year_1"
names(file) <- str_replace_all(names(file), c("Year " = "Year_", "Num " = "Num_", "DRate " = "DRate_", "PRate " = "PRate_", "Denom " = "Denom_"))
file <- file %>%
  pivot_longer(
    cols = c(contains("Year"), contains("Num"), contains("DRate"), contains("PRate"), contains("Denom")),
    names_sep = "_",
    names_to = c(".value", "repeat")
  )
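Since the str_replace_all() call above assumes the raw names put a single space before the number, a minimal sketch is to split on that space directly with names_sep = " " and skip the renaming step; starts_with() also accepts several prefixes at once. The tibble raw and its Num columns are hypothetical stand-ins, not the real peps300.xlsx.

```r
library(tibble)
library(tidyr)

# Hypothetical data with space-separated names, like the original Excel sheet
raw <- tibble(
  Country  = c("U.S.A", "U.S.A"),
  State    = c("IL", "VA"),
  `Year 1` = c(2009L, 2009L),
  `Num 1`  = c(20000L, 30000L),
  `Year 2` = c(2010L, 2010L),
  `Num 2`  = c(30000L, 40000L)
)

long <- raw %>%
  pivot_longer(
    # starts_with() takes a vector of prefixes
    cols = starts_with(c("Year", "Num")),
    # split "Year 1" into "Year" (.value) and "1" (repeat)
    names_sep = " ",
    names_to = c(".value", "repeat")
  )
long
```

With the real file, the prefix vector would list every measure (Year, Num, DRate, PRate, Denom).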
An option would be to select the columns that start with "Population" or "Year" using starts_with():
library(tidyr)  # pivot_longer() is from tidyr; library(dplyr) alone does not provide it
df1 %>%
  pivot_longer(cols = c(starts_with("Population"), starts_with("Year")),
               names_to = c(".value", "group"), names_pattern = "(.*)_(.*)")
# A tibble: 4 x 5
# Country State group Population Year
# <chr> <chr> <chr> <int> <int>
#1 U.S.A IL 1 20000 2009
#2 U.S.A IL 2 30000 2010
#3 U.S.A VA 1 30000 2009
#4 U.S.A VA 2 40000 2010
# Data
df1 <- structure(list(Country = c("U.S.A", "U.S.A"), State = c("IL", "VA"),
                      Year_1 = c(2009L, 2009L), Population_1 = c(20000L, 30000L),
                      Year_2 = c(2010L, 2010L), Population_2 = c(30000L, 40000L)),
                 class = "data.frame", row.names = c(NA, -2L))
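Because each pivoted name in df1 contains exactly one underscore, names_sep = "_" should split it the same way as names_pattern = "(.*)_(.*)". The sketch below assumes a tidyr version recent enough to support names_transform, which also turns group into an integer instead of a character column; df1 is rebuilt with data.frame() so the block runs on its own.

```r
library(tidyr)

# Same values as df1 above
df1 <- data.frame(
  Country = c("U.S.A", "U.S.A"), State = c("IL", "VA"),
  Year_1 = c(2009L, 2009L), Population_1 = c(20000L, 30000L),
  Year_2 = c(2010L, 2010L), Population_2 = c(30000L, 40000L)
)

out <- df1 %>%
  pivot_longer(
    cols = c(starts_with("Population"), starts_with("Year")),
    names_to = c(".value", "group"),
    names_sep = "_",                            # split at the single underscore
    names_transform = list(group = as.integer)  # "1"/"2" -> 1L/2L
  )
out
```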
Another option is to pivot every column except Country and State:
library(tidyr)
df %>%
  pivot_longer(
    -c(Country, State),
    names_to = c(".value", "group"),
    names_pattern = "(.+)_(.+)"
  )
# A tibble: 4 x 5
#   Country State group Year  Population
#   <chr>   <chr> <chr> <chr> <chr>
# 1 U.S.A   IL    1     2009  20000
# 2 U.S.A   IL    2     2010  30000
# 3 U.S.A   VA    1     2009  30000
# 4 U.S.A   VA    2     2010  40000
You can then drop the group column if you don't need it.
To do this, you will need to clean your column names first. Make sure they all follow the same pattern, with words joined by a single space or a single underscore, and adjust names_pattern to match (the pattern above expects an underscore).
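A minimal sketch of that cleaning step, assuming dplyr 1.0 or later for rename_with(): a regular expression rewrites any name ending in digits, whether separated by a space, an underscore or nothing, into the name_number form. The data frame messy is hypothetical and only mixes the separators seen in the question.

```r
library(dplyr)

# Hypothetical messy names mixing "Year 1", "Population_1" and "Population2"
messy <- data.frame(
  Country = "U.S.A", State = "IL",
  `Year 1` = 2009L, Population_1 = 20000L,
  `Year 2` = 2010L, Population2 = 30000L,
  check.names = FALSE
)

clean <- messy %>%
  # Rewrite "<word><optional space/underscore><digits>" as "<word>_<digits>";
  # names without trailing digits (Country, State) are left untouched
  rename_with(~ sub("^(\\D+?)[ _]*(\\d+)$", "\\1_\\2", .x, perl = TRUE))

names(clean)
```

After this, names_pattern = "(.+)_(.+)" or names_sep = "_" can split every pivoted column consistently.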
# Data
df <- structure(list(Country = c("U.S.A", "U.S.A"), State = c("IL", "VA"),
                     Year_1 = c("2009", "2009"), Population_1 = c("20000", "30000"),
                     Year_2 = c("2010", "2010"), Population_2 = c("30000", "40000")),
                class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"),
                row.names = c(NA, -2L))
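Because df stores every value as character, the pivoted Year and Population columns are character too. A sketch, assuming dplyr 1.0 or later for across(), that drops group and converts both columns to integers; df is rebuilt as a plain data frame so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same shape as df above: every value stored as character
df <- data.frame(
  Country = c("U.S.A", "U.S.A"), State = c("IL", "VA"),
  Year_1 = c("2009", "2009"), Population_1 = c("20000", "30000"),
  Year_2 = c("2010", "2010"), Population_2 = c("30000", "40000"),
  stringsAsFactors = FALSE
)

out <- df %>%
  pivot_longer(
    -c(Country, State),
    names_to = c(".value", "group"),
    names_pattern = "(.+)_(.+)"
  ) %>%
  select(-group) %>%                               # drop the helper column
  mutate(across(c(Year, Population), as.integer))  # "2009" -> 2009L, etc.
out
```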
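Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.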