How do I create two groups when converting a data frame from wide to long in R?
I have a wide dataset that I would like to convert to long format while splitting the columns into two groups. In the example below, the variables named score n.1 (score1.1, score2.1, score3.1) would be group 1 and the variables named score n.2 would be group 2. How do I convert a wide data frame into a long data frame and assign a group to the respective variables?
age.1 <- c(23, 34, 52, 12, 23)
score1.1 <- c(44, 23, 62, 1, 0)
score2.1 <- c(3, 4, 2, 1, 3)
score3.1 <- c(230, 304, 502, 102, 203)
score1.2 <- c(2233, 32324, 5232, 1232, 2233)
score2.2 <- c(12323, 12334, 1352, 1312, 1323)
score3.2 <- c(21233, 33454, 53452, 12452, 23532523)
df <- data.frame(age.1, score1.1, score2.1, score3.1, score1.2, score2.2, score3.2)
We can use stringi::stri_extract_last_regex to extract the last number from each column name:
library(dplyr)
library(tidyr)
df %>%
  pivot_longer(-age.1) %>%
  mutate(group = stringi::stri_extract_last_regex(name, "[0-9]+"))
# # A tibble: 30 × 4
# age.1 name value group
# <dbl> <chr> <dbl> <chr>
# 1 23 score1.1 44 1
# 2 23 score2.1 3 1
# 3 23 score3.1 230 1
# 4 23 score1.2 2233 2
# 5 23 score2.2 12323 2
# 6 23 score3.2 21233 2
# 7 34 score1.1 23 1
# 8 34 score2.1 4 1
# 9 34 score3.1 304 1
# 10 34 score1.2 32324 2
# # … with 20 more rows
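For readers who do not have stringi installed, the same "take the last run of digits" idea can be sketched in base R. This is only an illustration on a few of the column names from the question. The helper name `last_number` is hypothetical, and the sketch assumes every name contains at least one digit.

```r
# A few column names taken from the question's data frame
nms <- c("score1.1", "score2.1", "score3.2")

# Hypothetical helper: return the last run of digits in each string.
# Assumes every string contains at least one digit.
last_number <- function(x) {
  # All runs of digits found in each string, as a list of character vectors
  all_matches <- regmatches(x, gregexpr("[0-9]+", x))
  # Keep only the final match for each string
  vapply(all_matches, function(m) m[length(m)], character(1))
}

group <- last_number(nms)
group
```

Inside the pipeline this could stand in for the stringi call, for example `mutate(group = last_number(name))`.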
If you would like to split the number after the period into a new column while pivoting, you can use names_pattern:
library(tidyverse)
df %>% pivot_longer(
  cols = -age.1,
  names_to = c("name", "group"),
  # Escape the period so it matches a literal "." rather than any character
  names_pattern = "(.+)\\.(.+)",
  values_to = "Value"
)
Output
# A tibble: 30 × 4
age.1 name group Value
<dbl> <chr> <chr> <dbl>
1 23 score1 1 44
2 23 score2 1 3
3 23 score3 1 230
4 23 score1 2 2233
5 23 score2 2 12323
6 23 score3 2 21233
7 34 score1 1 23
8 34 score2 1 4
9 34 score3 1 304
10 34 score1 2 32324
# … with 20 more rows
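Since the column names here are split by a single period, `names_sep` may be a simpler alternative to a capture-group pattern. The sketch below assumes tidyr is installed and uses `df_small`, a hypothetical cut-down copy of the question's data (two rows, two score columns), so the result is easy to check by eye.

```r
library(tidyr)

# Hypothetical cut-down copy of the question's data frame
df_small <- data.frame(
  age.1    = c(23, 34),
  score1.1 = c(44, 23),
  score1.2 = c(2233, 32324)
)

long <- pivot_longer(
  df_small,
  cols = -age.1,                  # age.1 is excluded, so its period is never split
  names_to = c("name", "group"),
  names_sep = "\\.",              # split each column name at the literal period
  values_to = "Value"
)
long
```

This only works when every pivoted column name contains exactly one separator; names with several periods would need `names_pattern` instead.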
However, if you need to retain those numbers in the name column, you can extract the group after pivoting instead:
df %>%
  pivot_longer(-age.1) %>%
  mutate(group = str_replace(name, '.*\\.', ""))
Output
# A tibble: 30 × 4
age.1 name value group
<dbl> <chr> <dbl> <chr>
1 23 score1.1 44 1
2 23 score2.1 3 1
3 23 score3.1 230 1
4 23 score1.2 2233 2
5 23 score2.2 12323 2
6 23 score3.2 21233 2
7 34 score1.1 23 1
8 34 score2.1 4 1
9 34 score3.1 304 1
10 34 score1.2 32324 2
# … with 20 more rows