Using R to split a string into multiple columns instead of a vector in one column
I want to split one column of my dataset into multiple columns based on the delimiter "|".
My dataset looks like this:
vname<-c("x1", "x2", "x3","x4")
label<-c("1,Eng |2,Man", "1,yes|2,no|3,dont know", "1,never|2,sometimes|3,usually|4,always", "1,yes|2,No|3,dont know")
df<-data.frame(vname, label)
So I want to split the column label into multiple columns at the symbol "|". I used stringr::str_split to do this; my code is as follows:
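library(dplyr)
library(stringr)
# `value` is the pipe-delimited column in the real data (see dput() below); in the sample df it is `label`
cd2<-df %>%
select(vname, everything())%>%
mutate(label=str_split(value, " \\| "))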
However, the result is a vector stored in each cell of the label column. It looks like this:
vname label
x1 c("1,Eng","2,Man")
x2 c("1,yes","2,no", "3,dont know")
....
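For context, this happens because str_split() (like base R strsplit()) returns a list with one character vector per input string, and assigning that list to a column makes it a list-column. A minimal base R sketch on two of the sample labels (the objects `x`, `parts` and `demo` are illustrative only):

```r
x <- c("1,Eng |2,Man", "1,yes|2,no|3,dont know")

# strsplit() returns a list: one character vector per input string
parts <- strsplit(x, "|", fixed = TRUE)
is.list(parts)  # TRUE
lengths(parts)  # 2 3

# Assigning that list to a data frame column creates a list-column,
# so each cell holds a whole vector rather than a single string
demo <- data.frame(vname = c("x1", "x2"))
demo$label <- parts
is.list(demo$label)  # TRUE
```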
My question is: how can I get the expected result below?
vname label1 label2 label3 label4
x1 1,Eng 2,Man
x2 1,yes 2,no 3,dont know
x3 1,never 2,sometimes 3,usually 4,always
...
Thanks a lot for any help!
dput(head(cd2, 10))
structure(list(variable = c("x2", "x8", "x9", "x10", "x13", "x14",
"x15", "x20", "x22", NA), vname = c("consenting_language", "county",
"respondent", "residence", "language", "int_q1", "int_q2", "int_q4",
"int_q5", "int_q6"), label = c("Consenting Language", "County",
"Respondent Type", "Residence", "Interview language ", "1. What was your sex at birth?",
"2. How would you describe your current sexual orientation?",
"4. What is the highest level of education you completed?", "5. What is your current marital status?",
"<div class=\"rich-text-field-label\"><p>6. Is <span style=\"color: #3598db;\">regular </span>your partner currently living with you now, or does s/he stay elsewhere?</p></div>"
), value = c("1, English | 2, Kiswahili", "1, County011 | 2, County014 | 3, County002| 4, County006 | 5, County010 | 6, County008 | 7, County005 | 8, County003 | 9, County012| 10, County004 | 11, County009 | 12, County001 | 13, County015 | 14, County007 | 15, County012",
"1, FSW | 2, MSM | 3, AGYW", "1, Urban | 2, Peri urban | 3, Rural",
"1, English | 2, Kiswahili", "1, Male | 2, Female", "1, Homosexual/Gay | 2, Bisexual | 3, Heterosexual/Straight | 4, Transgender Male | 5, Transgender Female | 96, Other | 98, Don't Know | 99, Decline to state",
"1,None | 2,Nursery/kindergarten | 3,Primary | 4,Secondary | 5,Tertiary/Vocational | 6,College/University | 7,Adult education | 96,Other",
"1, Single/Not married | 2, Married | 3, Cohabiting | 4, Divorced | 5, Separated | 6, Widowed | 7, In a relationship",
"1, Living with You | 2, Staying Elsewhere")), row.names = c(NA,
10L), class = "data.frame")
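With the code used, str_split() returns a list. (We may need to allow zero or more spaces around the |, because the example data does not have a space on both sides of every |.) We can then use unnest_wider to turn the list into new columns.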
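library(dplyr)
library(stringr)
library(tidyr)
df %>%
select(vname, everything())%>%
# split on "|" with zero or more spaces on either side; gives a list-column
mutate(label=str_split(label, "\\s*\\|\\s*")) %>%
# spread each list element into its own column: label1, label2, ...
unnest_wider(where(is.list), names_sep = "")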
-output
# A tibble: 4 × 5
vname label1 label2 label3 label4
<chr> <chr> <chr> <chr> <chr>
1 x1 1,Eng 2,Man <NA> <NA>
2 x2 1,yes 2,no 3,dont know <NA>
3 x3 1,never 2,sometimes 3,usually 4,always
4 x4 1,yes 2,No 3,dont know <NA>
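To see why the pattern matters: the question's " \\| " needs exactly one space on each side of |, which the sample strings don't have, while "\\s*\\|\\s*" allows any amount of whitespace (including none). A small sketch with base R strsplit(), used here instead of stringr::str_split() only so it runs without packages; both treat the pattern as a regular expression:

```r
label <- c("1,Eng |2,Man", "1,yes|2,no|3,dont know")

# Requires exactly one space on each side of "|", so nothing is split here
strict <- strsplit(label, " \\| ")

# Allows zero or more whitespace characters on either side of "|"
flexible <- strsplit(label, "\\s*\\|\\s*")

lengths(strict)    # 1 1 (each string stays whole)
lengths(flexible)  # 2 3
flexible[[1]]      # "1,Eng" "2,Man"
```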
This can also be done with separate:
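library(tidyr)
library(stringr)
df %>%
# one output column per piece: (max number of "|") + 1; short rows padded with NA
separate(label, into = str_c('label',
seq_len(max(str_count(.$label, fixed("|"))) + 1)),
sep = "\\s*\\|\\s*", fill = "right")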
-output
vname label1 label2 label3 label4
1 x1 1,Eng 2,Man <NA> <NA>
2 x2 1,yes 2,no 3,dont know <NA>
3 x3 1,never 2,sometimes 3,usually 4,always
4 x4 1,yes 2,No 3,dont know <NA>
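The `into` names come from counting delimiters: the widest row has max(number of |) + 1 pieces. A base R sketch of the same calculation on the sample labels, with gregexpr()/regmatches() standing in for str_count(fixed("|")) (the variable names are illustrative):

```r
label <- c("1,Eng |2,Man", "1,yes|2,no|3,dont know",
           "1,never|2,sometimes|3,usually|4,always", "1,yes|2,No|3,dont know")

# Count the "|" delimiters in each string (fixed = TRUE treats "|" literally)
n_delims <- lengths(regmatches(label, gregexpr("|", label, fixed = TRUE)))

# The widest row needs one more column than it has delimiters
n_cols <- max(n_delims) + 1
into <- paste0("label", seq_len(n_cols))
into  # "label1" "label2" "label3" "label4"
```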
Or, using the OP's data 'cd2', where there are spaces before and after the |:
cd2new <- cd2 %>%
separate(value, into = str_c('value',
seq_len(max(str_count(.$value, fixed("|"))) + 1)),
sep = "\\s*\\|\\s*", fill = "right")
-output
> head(cd2new, 2)
variable vname label value1 value2 value3 value4 value5
1 x2 consenting_language Consenting Language 1, English 2, Kiswahili <NA> <NA> <NA>
2 x8 county County 1, County011 2, County014 3, County002 4, County006 5, County010
value6 value7 value8 value9 value10 value11 value12 value13
1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 6, County008 7, County005 8, County003 9, County012 10, County004 11, County009 12, County001 13, County015
value14 value15
1 <NA> <NA>
2 14, County007 15, County012
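If you'd rather not depend on tidyr, the same split-and-pad idea can be sketched in base R. `split_wide()` is a hypothetical helper, not part of any package: it splits on | with optional surrounding spaces and pads shorter rows with NA, which is what fill = "right" does in separate():

```r
# Hypothetical helper: split a character vector on "|" (optional spaces
# around it) into columns prefix1, prefix2, ..., padding short rows with NA
split_wide <- function(x, prefix) {
  parts <- strsplit(x, "\\s*\\|\\s*")
  n <- max(lengths(parts))
  # pad every row to the length of the widest row (like fill = "right")
  padded <- lapply(parts, function(p) c(p, rep(NA_character_, n - length(p))))
  out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(out) <- paste0(prefix, seq_len(n))
  out
}

value <- c("1, English | 2, Kiswahili", "1, FSW | 2, MSM | 3, AGYW")
res <- split_wide(value, "value")
res
```

With the OP's data this would be used as cbind(cd2[c("variable", "vname", "label")], split_wide(cd2$value, "value")).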
You can do this simply with separate() from {tidyr}:
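library(tidyverse)
# cd2 is the OP's data from dput() above
cd2 %>% as_tibble() %>%
separate(value, sep = "\\s*\\|\\s*",
# number of pieces in the widest row = (max number of "|") + 1
into = paste0("value", seq_len(max(str_count(.$value, "\\|")) + 1)),
fill = "right")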