I'm trying to split a one-line text in R and store the pieces in a data frame.
For instance, a text like the following:
hello-world;1|(good)night world;2|...
Is expected to become:
V1 V2
hello-world 1
(good)night world 2
To achieve this, I start by splitting the initial text on '|', using separate from tidyr:
library(tidyr)
as.data.frame(str) %>% separate(str, into=c("V1"), sep='\\|')
1 hello-world;1
#Warning message:
#Too many values at 1 locations: 1
I suspect the issue in the first split arises from the '-'. How can I solve this?
How about this?
library(tidyverse)
text <- c("hello-world;1|(good)night world;2")
df_text <- data.frame(a = unlist(strsplit(text, "|", fixed = TRUE)))
df_split_text <- separate(df_text, a, c("V1", "V2"), sep = ";")
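The fixed = TRUE argument above matters because '|' is a regular-expression metacharacter. The following is a minimal base R sketch of the difference, using the sample text from the question; the variable names are only illustrative. It also shows that the split yields one piece per record, so a single name in into (as in the question's attempt) is fewer than the number of pieces, which is the more likely source of the "Too many values" warning than the '-'.

```r
text <- "hello-world;1|(good)night world;2"

# As a regular expression, a bare "|" is an empty alternation that matches
# the empty string, so the text is cut into many small pieces.
as_regex <- strsplit(text, "|")[[1]]

# Treated literally (fixed = TRUE) or escaped ("\\|"), the split happens
# only at the pipe character.
as_fixed   <- strsplit(text, "|", fixed = TRUE)[[1]]
as_escaped <- strsplit(text, "\\|")[[1]]

length(as_regex)                 # far more than two pieces
print(as_fixed)                  # one element per record
identical(as_fixed, as_escaped)  # both literal forms agree
```

The hyphen and the parentheses survive untouched in both literal forms, so they need no special handling here.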
If you want to do this via tidyverse, then you need to use unnest to make it long and then separate the values, i.e.
library(tidyverse)
data.frame(v1 = 'hello-world;1|(good)night world;2|') %>%
mutate(v1 = strsplit(as.character(v1), '\\|')) %>%
unnest(v1) %>%
separate(v1, into = c('v1', 'v2'), sep = ';')
# A tibble: 2 x 2
# v1 v2
#* <chr> <chr>
#1 hello-world 1
#2 (good)night world 2
@udden2903 has already given the best answer with tidyverse, but this base R approach should work too. Replace each | with a newline (\n), and then read the result using read.table:
read.table(text=gsub("[|]", "\n", text), header = FALSE, sep=";", stringsAsFactors= FALSE)
# V1 V2
#1 hello-world 1
#2 (good)night world 2
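The call above relies on a text variable defined in an earlier answer. Here is a self-contained sketch of the same idea, assuming the input may also end with a trailing '|' as in the tidyverse example; the name df is only illustrative, and gsub is used with fixed = TRUE instead of the "[|]" character class.

```r
# Sample input, with a trailing pipe after the last record.
text <- "hello-world;1|(good)night world;2|"

# Turn each pipe into a newline so every record sits on its own line.
lines_text <- gsub("|", "\n", text, fixed = TRUE)

# Parse the records, splitting the two fields on ";".
df <- read.table(text = lines_text, header = FALSE, sep = ";",
                 stringsAsFactors = FALSE)

print(df)
str(df)  # V1 is character; V2 is converted to integer
```

Note that read.table converts column types automatically, so V2 comes back as an integer column rather than as text, unlike the separate-based answers.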