Split a string in R while keeping whitespace

Question

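I want to use readr::read_fwf to build a table from raw text. Its col_positions argument sets the column widths, which can vary in my case. The table always has 4 columns, defined by the first 4 words of a header string, for example: category variable description value sth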

> text_for_column_width = "category    variable   description      value      sth"
> nchar("category    ")
[1] 12
> nchar("variable   ")
[1] 11
> nchar("description      ")
[1] 17
> nchar("value      ")
[1] 11

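I want to get the first 4 words but keep their trailing spaces, so that category has 8 [letters] + 4 [spaces] = 12 characters, and finally build a vector with the character count of each of the four names: c(12,11,17,11). I tried strsplit with a space as the split argument and then counting the resulting empty strings, but I believe there is a faster way with the right regular expression.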
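For context, a minimal sketch of where the widths end up: readr::fwf_widths() turns a vector of widths into the col_positions specification that read_fwf() expects. The two data rows are made up for illustration (the names `rows` and `raw` are hypothetical), and reading literal text via I() assumes readr 2.0 or later.

```r
library(readr)

widths    <- c(12, 11, 17, 11)   # target widths from the question
col_names <- c("category", "variable", "description", "value")

# Hypothetical raw text: the question's header line plus two made-up
# data rows, each left-justified into the same fixed-width columns.
fmt  <- paste0("%-", widths, "s", collapse = "")   # "%-12s%-11s%-17s%-11s"
rows <- sprintf(fmt, c("a", "b"), c("x1", "x2"),
                c("first variable", "second variable"), c("1", "2"))
raw  <- paste(c("category    variable   description      value      sth", rows),
              collapse = "\n")

# fwf_widths() converts the widths into the col_positions spec;
# skip = 1 drops the header line because names come from col_names.
tbl <- read_fwf(I(raw), col_positions = fwf_widths(widths, col_names), skip = 1)
tbl
```

Padding inside each field is removed on read because read_fwf() trims whitespace by default (trim_ws = TRUE).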

Answer 1

A possible solution, using stringr:

library(tidyverse)

text_for_column_width = "category    variable   description      value      sth"

strings <- text_for_column_width %>% 
  str_remove("sth$") %>% 
  str_split("(?<=\\s)(?=\\S)") %>% 
  unlist

strings

#> [1] "category    "      "variable   "       "description      "
#> [4] "value      "

strings %>% str_count

#> [1] 12 11 17 11
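If you would rather avoid the tidyverse dependency, a base-R sketch can match instead of split: the pattern \\S+\\s* captures each word together with the whitespace that follows it, so no lookarounds are needed.

```r
x <- "category    variable   description      value      sth"

# Each match is a run of non-whitespace followed by any trailing whitespace,
# so the padding stays attached to the word before it.
pieces <- regmatches(x, gregexpr("\\S+\\s*", x))[[1]]

# Keep the first four columns and count characters, spaces included.
widths <- nchar(pieces[1:4])
widths
# [1] 12 11 17 11
```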

Answer 2

You can use utils::strcapture:

text_for_column_width = "category    variable   description      value      sth"
pattern <- "^(\\S+\\s+)(\\S+\\s+)(\\S+\\s+)(\\S+\\s*)"
result <- utils::strcapture(pattern, text_for_column_width, list(f1 = character(), f2 = character(), f3 = character(), f4 = character()))
nchar(as.character(as.vector(result[1,])))
## => [1] 12 11 17 11

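See the regex demo. ^(\S+\s+)(\S+\s+)(\S+\s+)(\S+\s*) matches:

^ - start of the string
(\S+\s+) - Group 1: one or more non-whitespace characters, then one or more whitespace characters
(\S+\s+) - Group 2: one or more non-whitespace characters, then one or more whitespace characters
(\S+\s+) - Group 3: one or more non-whitespace characters, then one or more whitespace characters
(\S+\s*) - Group 4: one or more non-whitespace characters, then zero or more whitespace characters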
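The same pattern also works with base regexec() plus regmatches(), which return the whole match followed by the capture groups as a plain character vector, so no prototype list is needed. A minimal sketch:

```r
x <- "category    variable   description      value      sth"
pattern <- "^(\\S+\\s+)(\\S+\\s+)(\\S+\\s+)(\\S+\\s*)"

# regexec() records the full match and each capture group;
# regmatches() extracts them as c(full_match, group1, ..., group4).
m <- regmatches(x, regexec(pattern, x))[[1]]

# Drop the full match (first element) and count the group widths.
widths <- nchar(m[-1])
widths
# [1] 12 11 17 11
```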

Answer 3

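You can also use this pattern:

library(stringr)  # also re-exports the %>% pipe

# Note: splitting on "\\s+" drops the whitespace, so this returns the word
# lengths 8 8 11 5 3 (including "sth"), not the padded widths 12 11 17 11.
stringr::str_split("category    variable   description      value      sth", "\\s+") %>%
  unlist() %>%
  purrr::map_int(nchar)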
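Because splitting on \\s+ drops the padding, a position-based variant can recover the widths from the same word boundaries: take each word's start position with gregexpr() and difference consecutive starts. This sketch assumes every column, including the trailing "sth", begins with a non-space character.

```r
x <- "category    variable   description      value      sth"

# Start position of every run of non-whitespace (each word).
starts <- as.integer(gregexpr("\\S+", x)[[1]])
starts
# [1]  1 13 24 41 52

# A column's width is the distance from its start to the next column's
# start, so the first four widths come from the five starts.
widths <- diff(starts)[1:4]
widths
# [1] 12 11 17 11
```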

3 answers

Answer 1
Score 4, accepted, 2022-01-18 14:37:44

Answer 2
Score 0, 2022-01-18 19:30:43

Answer 3
Score 0, 2022-01-20 23:38:38
