

Splitting data with one column into more columns

I am a new R user and I can't figure out how to properly split my data into 5 columns (name, surname, title, area_code and phone_number).

df=read.table("school.txt")

df <- data.frame(
  stringsAsFactors = FALSE,
  V1 = c("Lebel, MarieStudent 1st year216 132-3789",
         "Lachance, PaulTeacher 2nd year567 990-345 ext 1811",
         "Smith, AnnieStudent 1st yearNot available")
)

I was able to separate the data into 2 columns to get the names by doing this:

library(stringr)
df1 <- data.frame(str_split_fixed(df$V1, ",", 2))

Thank you in advance

You can use a regular expression to separate the data into different columns, for example with tidyr::extract:

tidyr::extract(df, V1,
       c("surname", "name", "title", "year","area_code",  "phone_number"), 
       '(\\w+),\\s([A-Za-z]+)(Teacher|Student)\\s(\\w+\\syear)(\\d+)?\\s?(.*)?')

#   surname  name   title     year area_code     phone_number
#1    Lebel Marie Student 1st year       216         132-3789
#2 Lachance  Paul Teacher 2nd year       567 990-345 ext 1811
#3    Smith Annie Student 1st year              Not available
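For anyone who would rather not depend on tidyr, here is a minimal base-R sketch of the same idea using regexec() and regmatches(). It assumes the sample data from the question. Two small changes relative to the regex above: the optional groups (\\d+)? and (.*)? are written as (\\d*) and (.*), and the pattern is anchored with ^ and $. A missing area code is converted to NA at the end.

```r
# Sample data from the question
df <- data.frame(
  stringsAsFactors = FALSE,
  V1 = c("Lebel, MarieStudent 1st year216 132-3789",
         "Lachance, PaulTeacher 2nd year567 990-345 ext 1811",
         "Smith, AnnieStudent 1st yearNot available")
)

# The pattern is assembled piece by piece; each (...) is one capture group,
# i.e. one output column.
pattern <- paste0(
  "^(\\w+)",             # surname: word characters before the comma
  ",\\s",                # the comma and the space after it
  "([A-Za-z]+)",         # first name, glued to the title
  "(Teacher|Student)",   # title
  "\\s(\\w+\\syear)",    # year, e.g. "1st year"
  "(\\d*)",              # area code (may be empty)
  "\\s?(.*)$"            # rest of the line: phone number or "Not available"
)

m <- regmatches(df$V1, regexec(pattern, df$V1, perl = TRUE))

# A non-matching row gives character(0); fail loudly instead of dropping it
if (any(lengths(m) == 0)) stop("some rows did not match the pattern")

# Each element is c(full_match, group1, ..., group6); drop the full match
parts <- do.call(rbind, m)[, -1, drop = FALSE]
colnames(parts) <- c("surname", "name", "title", "year",
                     "area_code", "phone_number")

result <- as.data.frame(parts, stringsAsFactors = FALSE)
result$area_code[result$area_code == ""] <- NA  # no area code -> NA
result
```

Building the pattern with paste0() keeps one comment per group, which makes it easier to adjust when a new title (say, "Staff") shows up in the data.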

It's most likely possible to do this in fewer steps, but here is a step-by-step version:

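library(dplyr)
library(tidyr)
library(stringr)

df %>% 
  # "Lebel, Marie..." -> surname "Lebel"; everything after the comma goes to step1
  separate(V1, into = c("surname", "step1"), sep = ",") %>% 
  mutate(title = case_when(str_detect(step1, pattern = "Student") ~ "Student",
                           str_detect(step1, pattern = "Teacher") ~ "Teacher",
                           TRUE ~ NA_character_
                           )) %>% 
  # drop the title, leaving e.g. " Marie 1st year216 132-3789"
  mutate(step2 = str_replace(step1, title, replacement = "")) %>% 
  # the first name is everything before the ordinal ("1st", "2nd", ...)
  separate(step2, into = c("name", "step3"), sep = "[0-9]+(st|nd|rd|th)+", remove = FALSE) %>% 
  mutate(step3 = str_replace(step2, name, "")) %>% 
  mutate(year = str_extract(step3, "[0-9](st|nd|rd|th) year")) %>% 
  mutate(step4 = str_replace(step3, year, "")) %>% 
  # leading digits followed by a space are the area code (NA if absent)
  mutate(area_code = str_trim(str_extract(step4, "^[0-9]+\\s"))) %>% 
  # remove the area code if present, so "Not available" is kept instead of becoming NA
  mutate(phone_number = str_remove(step4, "^[0-9]+\\s")) %>% 
  mutate(name = str_trim(name)) %>% 
  dplyr::select(-step1, -step2, -step3, -step4)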
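Since tidyr 1.3.0, separate() and extract() are superseded by the separate_wider_*() family. As a sketch, assuming tidyr 1.3.0 or later is installed, separate_wider_regex() can do the whole split in one call: named elements of patterns become output columns, while unnamed elements are matched but discarded. The pieces below mirror the regex from the tidyr::extract answer, with the optional area code written as "\\d*".

```r
library(tidyr)  # separate_wider_regex() requires tidyr >= 1.3.0

# Sample data from the question
df <- data.frame(
  stringsAsFactors = FALSE,
  V1 = c("Lebel, MarieStudent 1st year216 132-3789",
         "Lachance, PaulTeacher 2nd year567 990-345 ext 1811",
         "Smith, AnnieStudent 1st yearNot available")
)

res <- separate_wider_regex(
  df,
  V1,
  patterns = c(
    surname      = "\\w+",
    ",\\s",                          # unnamed: matched, not kept
    name         = "[A-Za-z]+",
    title        = "Teacher|Student",
    "\\s",
    year         = "\\w+\\syear",
    area_code    = "\\d*",           # empty when there is no area code
    "\\s?",
    phone_number = ".*"
  )
)
res
```

By default (too_few = "error"), a row that does not fit the pattern raises an error instead of being silently filled with NA; setting too_few = "debug" is a convenient way to find such rows.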
