I have a data frame with a single column that I'd like to split in R. It contains dates, text and numbers. I want to keep the text in a single column, so I cannot simply separate by spaces. My idea was to add an underscore between words and then separate by spaces, but I don't know how to do that without removing the first and last letters of the words.
Does anyone have an idea?
This is the type of data frame I have:
tab <- data.frame(c1 = c("21.03.2016 This amasingly interesting text 2'000.50 3'000.60",
"22.03.2016 This other terrific text 5'000.54 6'000.90"))
#This is what I would like to obtain
tab1 <- data.frame(c1 = c("21.03.2016", "22.03.2016"),
c2 = c("This amasingly interesting text", "This other terrific text"),
c3 = c( "2'000.50", "5'000.54"),
c4 = c( "3'000.60", "6'000.90"))
#This is what I did to add the underscore
tab <- gsub("[A-z] [A-z]","_", tab$c1)
tab <- data.frame(tab)
library(stringr)
tab <- data.frame(str_split_fixed(tab$tab, " ", 4))
#This is pretty much what I want, except that some letters are missing
tab$X2 <- gsub("_"," ",tab$X2)
You can try the `tidyr::extract` function on the original `tab` and supply a `regex` argument with one capture group per output column.
One such attempt:
library(tidyverse)
tab %>% extract(col = c1, into = c("C1","C2","C3","C4"),
regex = "([0-9.]+)\\s([A-Za-z ]+)\\s([0-9.']+)\\s(.*)")
# C1 C2 C3 C4
# 1 21.03.2016 This amasingly interesting text 2'000.50 3'000.60
# 2 22.03.2016 This other terrific text 5'000.54 6'000.90
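If loading the tidyverse is not an option, the same pattern can probably be applied with base R alone. The following is a minimal sketch, not part of the original answer: it assumes every row matches the pattern, and the names `pattern`, `m` and `out` are only illustrative.

```r
# Same example data as in the question
tab <- data.frame(c1 = c("21.03.2016 This amasingly interesting text 2'000.50 3'000.60",
                         "22.03.2016 This other terrific text 5'000.54 6'000.90"),
                  stringsAsFactors = FALSE)

# Same regex as in the extract() call: four capture groups separated by whitespace
pattern <- "([0-9.]+)\\s([A-Za-z ]+)\\s([0-9.']+)\\s(.*)"

# regexec() finds the full match and the groups; regmatches() extracts them.
# Each list element holds the full match followed by the four groups.
m <- regmatches(tab$c1, regexec(pattern, tab$c1, perl = TRUE))

# Drop the full match (first element) and bind the groups into columns
out <- as.data.frame(do.call(rbind, lapply(m, function(x) x[-1])),
                     stringsAsFactors = FALSE)
names(out) <- c("C1", "C2", "C3", "C4")
out
```

A row that does not match the pattern would yield an empty element in `m`, so real data may need a check before the `rbind` step.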
Regex explanation:
- `([0-9.]+)`: one or more characters from `0-9` or `.`; 1st group, for the 1st column
- `\\s`: a single whitespace character separating the groups (not captured)
- `([A-Za-z ]+)`: one or more alphabetic or space characters; 2nd group, for the 2nd column
- `\\s`: a single whitespace character (not captured)
- `([0-9.']+)`: one or more characters from `0-9`, `.` or `'`; 3rd group, for the 3rd column
- `\\s`: a single whitespace character (not captured)
- `(.*)`: anything left at the end; 4th group, for the 4th column
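As for the approach tried in the question: letters go missing because `gsub` replaces the whole match, and the pattern `[A-z] [A-z]` matches the letter on each side of the space as well as the space itself. One way around this, sketched here under the assumption that only spaces between two letters should be joined, is to use zero-width lookarounds (which need `perl = TRUE`) so that only the space is replaced. The variable names `joined`, `parts` and `res` are illustrative.

```r
# Same example data as in the question
tab <- data.frame(c1 = c("21.03.2016 This amasingly interesting text 2'000.50 3'000.60",
                         "22.03.2016 This other terrific text 5'000.54 6'000.90"),
                  stringsAsFactors = FALSE)

# Replace only the space that sits between two letters.
# (?<=...) and (?=...) check the neighbours without consuming them,
# so no letters are removed.
joined <- gsub("(?<=[A-Za-z]) (?=[A-Za-z])", "_", tab$c1, perl = TRUE)

# Now the remaining spaces separate exactly four fields per row
parts <- strsplit(joined, " ", fixed = TRUE)
res <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(res) <- c("c1", "c2", "c3", "c4")

# Restore the spaces inside the text column
res$c2 <- gsub("_", " ", res$c2, fixed = TRUE)
res
```

This relies on the text itself containing no underscores and no digits; if it might, the `extract` approach with explicit groups is likely the safer choice.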