Splitting data with one column into more columns
I am a new R user and I can't find out how to properly split my data into 5 columns (name, surname, title, area_code and phone_number).
df = read.table("school.txt")
# The data looks like this:
df <- data.frame(
  stringsAsFactors = FALSE,
  V1 = c("Lebel, MarieStudent 1st year216 132-3789",
         "Lachance, PaulTeacher 2nd year567 990-345 ext 1811",
         "Smith, AnnieStudent 1st yearNot available")
)
I was able to separate the data into 2 columns to get the names by doing this:
library(stringr)  # provides str_split_fixed()
df1 <- data.frame(str_split_fixed(df$V1, ",", 2))
Thank you in advance.
You can use regex to separate out the data into different columns. Using tidyr::extract:
tidyr::extract(df, V1,
c("surname", "name", "title", "year","area_code", "phone_number"),
'(\\w+),\\s([A-Za-z]+)(Teacher|Student)\\s(\\w+\\syear)(\\d+)?\\s?(.*)?')
#   surname  name   title     year area_code     phone_number
#1    Lebel Marie Student 1st year       216         132-3789
#2 Lachance  Paul Teacher 2nd year       567 990-345 ext 1811
#3    Smith Annie Student 1st year              Not available
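To see what each capture group in that pattern picks up, here is a minimal base R sketch using regexec() and regmatches() instead of tidyr. It is only an illustration, not a replacement for the call above. The pattern is a slightly adapted variant (anchored with ^ and $, and using (\\d*) so the area-code group always takes part in the match), and the names pattern, m and parts are made up for this example.

```r
# Same sample data as in the question.
df <- data.frame(
  V1 = c("Lebel, MarieStudent 1st year216 132-3789",
         "Lachance, PaulTeacher 2nd year567 990-345 ext 1811",
         "Smith, AnnieStudent 1st yearNot available"),
  stringsAsFactors = FALSE
)

# One capture group per output column (assumed variant of the pattern above):
# surname, name, title, year, area code (may be empty), rest of the line.
pattern <- "^(\\w+),\\s([A-Za-z]+)(Teacher|Student)\\s(\\w+\\syear)(\\d*)\\s?(.*)$"

# For each row, regmatches() returns the full match followed by the six groups.
# Note: a row that does not match would give character(0) and break the rbind below.
m <- regmatches(df$V1, regexec(pattern, df$V1, perl = TRUE))

# Drop the full match (first element) and bind the groups into a data frame.
parts <- as.data.frame(do.call(rbind, lapply(m, function(x) x[-1])),
                       stringsAsFactors = FALSE)
names(parts) <- c("surname", "name", "title", "year", "area_code", "phone_number")
parts
```

In this variant a missing area code comes back as an empty string, so the third row has area_code == "" and keeps "Not available" in phone_number.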
It is most likely possible to do this in fewer steps...
library(dplyr)    # %>%, mutate(), case_when(), select()
library(tidyr)    # separate()
library(stringr)  # str_detect(), str_replace(), str_extract()

df %>%
  separate(V1, into = c("name", "step1"), sep = ",") %>%
  mutate(title = case_when(str_detect(step1, pattern = "Student") ~ "Student",
                           str_detect(step1, pattern = "Teacher") ~ "Teacher",
                           TRUE ~ NA_character_
  )) %>%
  mutate(step2 = str_replace(step1, title, replacement = "")) %>%
  separate(step2, into = c("surname", "step3"), "[0-9]+(st|nd|rd|th)+", remove = FALSE) %>%
  mutate(step3 = str_replace(step2, surname, "")) %>%
  mutate(year = str_extract(step3, "[0-9](st|nd|rd|th) year")) %>%
  mutate(step4 = str_replace(step3, year, "")) %>%
  mutate(area_code = str_extract(step4, "[0-9]+\\s")) %>%
  mutate(phone_number = str_replace(step4, area_code, "")) %>%
  dplyr::select(-step1, -step2, -step3, -step4)
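For fewer steps without extra packages, one option is utils::strcapture() from base R (available since R 3.4.0), which applies a single regex and fills a prototype data frame. This is only a sketch under assumptions: the pattern is my own guess at the layout of the three sample rows, and the names pattern, proto and out are made up for the example. Rows that do not match the pattern come back as all NA.

```r
# Same sample data as in the question.
df <- data.frame(
  V1 = c("Lebel, MarieStudent 1st year216 132-3789",
         "Lachance, PaulTeacher 2nd year567 990-345 ext 1811",
         "Smith, AnnieStudent 1st yearNot available"),
  stringsAsFactors = FALSE
)

# Assumed layout: "Surname, Name" + title + " Nth year" + optional digits + rest.
pattern <- "^(\\w+),\\s([A-Za-z]+)(Teacher|Student)\\s(\\w+\\syear)(\\d*)\\s?(.*)$"

# The prototype gives the column names and types of the result.
proto <- data.frame(surname = character(), name = character(),
                    title = character(), year = character(),
                    area_code = character(), phone_number = character(),
                    stringsAsFactors = FALSE)

out <- strcapture(pattern, df$V1, proto, perl = TRUE)

# An absent area code is captured as "", so turn it into a proper NA.
out$area_code[out$area_code == ""] <- NA
out
```

Unlike the step-by-step pipeline, this keeps "Not available" in phone_number for the third row and only sets its area_code to NA.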