
Split String to data frame

I'm trying to split a one-line text in R and store the pieces in a data frame.

For instance, a text like the following:

hello-world;1|(good)night world;2|...

is expected to become:

V1    V2
hello-world    1
(good)night world    2

To achieve this, I start by splitting the initial text on | (escaped as '\\|'). For that, I use separate from tidyr.

library(tidyr)
as.data.frame(str) %>% separate(str, into=c("V1"), sep='\\|')
1 hello-world;1
#Warning message:
#Too many values at 1 locations: 1

I suspect the problem in the first split arises from the - . How can I solve this?

How about this?

library(tidyverse)

text <- c("hello-world;1|(good)night world;2")

df_text <- data.frame(a = unlist(strsplit(text, "|", fixed = TRUE)))

df_split_text <- separate(df_text, a, c("V1", "V2"), sep = ";")
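
The hyphen is not what triggers the warning. separate() splits one column into several columns and never creates rows, so with into=c("V1") everything after the first | has nowhere to go and is discarded. A minimal sketch of the two stages used above, assuming the same sample text (the names pieces and df are only for illustration):

```r
library(tidyr)

text <- "hello-world;1|(good)night world;2"

# Stage 1: one vector element (later one row) per record.
# fixed = TRUE treats "|" literally instead of as regex alternation.
pieces <- strsplit(text, "|", fixed = TRUE)[[1]]

# Stage 2: split each record into its two fields (columns).
df <- separate(data.frame(a = pieces, stringsAsFactors = FALSE),
               a, into = c("V1", "V2"), sep = ";")
```

Both columns come back as character; passing convert = TRUE to separate() should turn V2 into an integer column if that is what you need.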

If you want to do this via tidyverse, you first need to split the string and use unnest to make the data long (one row per record), and then separate the values, i.e.

library(tidyverse)

data.frame(v1 = 'hello-world;1|(good)night world;2|') %>% 
       mutate(v1 = strsplit(as.character(v1), '\\|')) %>% 
       unnest(v1) %>% 
       separate(v1, into = c('v1', 'v2'), sep = ';')

# A tibble: 2 x 2
#                 v1    v2
#*             <chr> <chr>
#1       hello-world     1
#2 (good)night world     2
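
A possibly shorter variant of the same idea uses tidyr's separate_rows(), which does the split and the unnest in one step. This is only a sketch, not part of the answer above: the filter() call is a defensive assumption in case the trailing | leaves an empty row behind, and the names raw and result are illustrative.

```r
library(dplyr)
library(tidyr)

raw <- tibble(v1 = "hello-world;1|(good)night world;2|")

result <- raw %>%
  # One row per "|"-delimited record (sep is a regular expression).
  separate_rows(v1, sep = "\\|") %>%
  # Drop an empty record, if the trailing "|" produced one.
  filter(v1 != "") %>%
  # Split each record into two columns; convert = TRUE makes V2 an integer.
  separate(v1, into = c("V1", "V2"), sep = ";", convert = TRUE)
```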

@udden2903 has given the best answer with tidyverse, but base R works too. Replace each | with a newline (\n), and then read the result using read.table:

read.table(text=gsub("[|]", "\n", text), header = FALSE, sep=";", stringsAsFactors= FALSE)
#                 V1 V2
#1       hello-world  1
#2 (good)night world  2
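
If you would rather not route the text through read.table (which also interprets quotes and comment characters by default), a plain strsplit version is another option. A minimal sketch, assuming every record has exactly two ;-separated fields; the names records, fields and df are illustrative:

```r
text <- "hello-world;1|(good)night world;2|"

# Split into records and drop any empty one left by a trailing "|".
records <- strsplit(text, "|", fixed = TRUE)[[1]]
records <- records[nzchar(records)]

# Split each record into its fields and stack them row by row.
fields <- strsplit(records, ";", fixed = TRUE)
df <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
names(df) <- c("V1", "V2")

# Everything is character at this point; convert the numeric column.
df$V2 <- as.integer(df$V2)
```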
