
Split long string into multiple dataframe columns while not splitting across a word

I can split a long string into columns of 40 characters each using the following:

temp_df <- data.frame(
  long_string_column = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Whatever ornare nunc tellus, nec convallis enim viverra sit amet."
)


library(tidyr)
temp_df_new <- separate(temp_df, 
         long_string_column, 
         into = c("split1", "split2", "split3", "split4", "split5"), 
         sep = c(40, 80, 120, 160),
         remove = FALSE) 

However, this splits in the middle of words, so half of a word can end up in one column and the other half in the next.


Is there any way to ensure that words are not split across columns?

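You can use str_wrap() from stringr to insert newlines at word boundaries and then separate on the newline characters. This avoids breaking up words, and each new column should contain at most 40 characters. There may be exceptions depending on the nature of the original strings: for example, a single word longer than 40 characters is not broken, so the column holding it will exceed the limit.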

library(stringr)
library(dplyr)
library(tidyr)

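# Insert newlines at word boundaries, targeting at most 40 characters per line
temp_df <- temp_df %>%
  mutate(tmp = str_wrap(long_string_column, 40))

# One column per wrapped line, sized for the row with the most lines
cols <- seq_len(max(str_count(temp_df$tmp, "\n")) + 1)

# Split on the newlines; fill = "right" pads shorter rows with NA without a warning.
# separate() drops tmp by default, so the original column is kept and tmp is not.
temp_df_new <- temp_df %>%
  separate(tmp,
           into = paste0("split_", cols),
           sep = "\n",
           fill = "right")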
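
If you want to see exactly where the pieces break, or need to verify the 40-character limit without extra packages, the same idea can be sketched in base R. This is only an illustration under simple assumptions (words are separated by spaces, the input is a single non-missing string); split_at_words() is a hypothetical helper written for this example, not a function from stringr or tidyr.

```r
# Hypothetical helper: greedily pack whole words into pieces of at most
# `width` characters. A single word longer than `width` is kept intact
# and ends up in a piece of its own that exceeds the limit.
split_at_words <- function(x, width = 40) {
  words <- strsplit(x, " +")[[1]]
  words <- words[nzchar(words)]          # drop empty tokens from leading spaces
  pieces <- character(0)
  current <- ""
  for (w in words) {
    if (!nzchar(current)) {
      current <- w                       # first word of a new piece
    } else if (nchar(current) + 1 + nchar(w) <= width) {
      current <- paste(current, w)       # word still fits, including the space
    } else {
      pieces <- c(pieces, current)       # piece is full, start the next one
      current <- w
    }
  }
  if (nzchar(current)) pieces <- c(pieces, current)
  pieces
}

long_string <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Whatever ornare nunc tellus, nec convallis enim viverra sit amet."

pieces <- split_at_words(long_string, 40)

# One column per piece, named in the same style as the answer above
split_df <- as.data.frame(
  as.list(setNames(pieces, paste0("split_", seq_along(pieces)))),
  stringsAsFactors = FALSE
)

# Check the limit and confirm that no text was lost
all(nchar(pieces) <= 40)
identical(paste(pieces, collapse = " "), long_string)
```

For data frames with many rows, the str_wrap() approach above is likely the more convenient choice; the helper is mainly useful for checking the result or for adapting the breaking rule.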

