Create two numeric columns from one character column in R

Question

The confidence interval column is of type character.

confidence_interval
(245.0 - 345.2)
(434.1 - 432.1)
(123.5 - 1,120.2)

I want to create two numeric columns: Upper Interval, which holds the first value in the parentheses, and Lower Interval, which holds the second value.

Upper Interval	Lower Interval
245.0	345.2
434.1	432.1
123.5	1120.2

How can this be done using R?

Thanks

Answer 1

extract() from tidyr fits your case.

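library(tidyr)

# The regex assumes the two values are separated by a comma, e.g. "(245.0,345.2)".
# Each (.+) group captures one value; convert = TRUE turns them into numbers.
df %>%
  extract(confidence_interval, into = c("Upper", "Lower"),
          regex = "\\((.+),(.+)\\)", convert = TRUE)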

# # A tibble: 3 × 2
#   Upper Lower
#   <dbl> <dbl>
# 1  245   345.
# 2  434.  432.
# 3  124.  901.

Answer 2

This is one approach using sapply with strsplit and gsub.

setNames(data.frame(t(sapply(strsplit(df$confidence_interval, " - "), function(x)
  gsub("\\(|\\)", "", x)))), c("Upper Interval", "Lower Interval"))
  Upper Interval Lower Interval
1          245.0          345.2
2          434.1          432.1
3          123.5        1,901.2

Data

df <- structure(list(confidence_interval = c("(245.0 - 345.2)", "(434.1 - 432.1)",
"(123.5 - 1,901.2)")), class = "data.frame", row.names = c(NA,
-3L))
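
The output above still holds text (note the "1,901.2" in the last row), while the question asks for numeric columns. A minimal base R sketch of one way to finish the conversion follows; the helper name to_number and the column names are assumptions made for illustration, and it treats every comma inside a value as a thousands separator.

```r
# Sample data in the " - " separated form, including a thousands separator
df <- data.frame(
  confidence_interval = c("(245.0 - 345.2)", "(434.1 - 432.1)", "(123.5 - 1,901.2)")
)

# Drop the parentheses, then split each string on the literal " - " separator
parts <- strsplit(gsub("[()]", "", df$confidence_interval), " - ", fixed = TRUE)

# Hypothetical helper: remove thousands commas before converting to numeric
to_number <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))

# Take the first and second piece of every split as the two new columns
result <- data.frame(
  upper_interval = to_number(vapply(parts, `[`, character(1), 1)),
  lower_interval = to_number(vapply(parts, `[`, character(1), 2))
)

str(result)
```

Without the comma removal, as.numeric() would return NA for "1,901.2" and emit a warning.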

Answer 3

Here is a solution.

ci <- c('(245.0,345.2)', '(434.1,432.1)', '(123.5,901.2)')

values <- strsplit(gsub('\\(|\\)', '', ci), split = ",")

upper <- sapply(values, function(x) as.numeric(x[[1]]))
lower <- sapply(values, function(x) as.numeric(x[[2]]))

upper
#> [1] 245.0 434.1 123.5
lower
#> [1] 345.2 432.1 901.2

I use gsub to remove the parentheses, and then strsplit to split the values on each side of the comma. Then I use sapply to return a vector, because strsplit returns a list of character vectors.

The OP's question was edited

If the separator between the values is ' - ', then use: values <- strsplit(gsub('\\(|\\)', '', ci), split = " - ")

The split parameter in strsplit is the pattern the function uses to split each string into parts.
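
To illustrate why the choice of split matters for the edited data, here is a small sketch using one value from the question. The variable names are assumptions made for illustration, and fixed = TRUE is used so the separator is matched literally rather than as a regular expression.

```r
x <- "123.5 - 1,120.2"

# Splitting on "," cuts inside the second number:
# the pieces are "123.5 - 1" and "120.2"
by_comma <- strsplit(x, split = ",", fixed = TRUE)[[1]]

# Splitting on " - " keeps each number whole:
# the pieces are "123.5" and "1,120.2"
by_dash <- strsplit(x, split = " - ", fixed = TRUE)[[1]]

by_comma
by_dash
```

The second piece still contains a thousands comma, so it has to be removed before as.numeric() can parse the value.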

Answer 4

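library(dplyr)
library(stringr)
library(tidyr)

# Keep only digits, commas and dots, then split on the comma
df %>%
  mutate(across(confidence_interval, ~ str_remove_all(.x, "[^0-9,\\.]"))) %>%
  separate(col = confidence_interval,
           into = c("higher", "lower"),
           sep = ",", convert = TRUE)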

# A tibble: 3 × 2
  higher lower
   <dbl> <dbl>
1   245   345.
2   434.  432.
3   124.  901.

Answer 5

library(tidyverse)

ci <- c('(245.0,345.2)', '(434.1,432.1)', '(123.5,901.2)')      
data.frame(ci) |> 
  mutate(ci2 = stringr::str_replace_all(ci, "\\(|\\)", "")) |> 
  separate(ci2, c('upper', 'lower'), sep =",", convert = TRUE)
#>              ci upper lower
#> 1 (245.0,345.2) 245.0 345.2
#> 2 (434.1,432.1) 434.1 432.1
#> 3 (123.5,901.2) 123.5 901.2

Answer 6

Using strcapture:

ci <- c('(245.0,345.2)', '(434.1,432.1)', '(123.5,901.2)')

pattern <- "\\(([-.0-9]+),([-.0-9]+)\\)"
strcapture(pattern, ci, data.frame(upper.interval=numeric(), lower.interval=numeric()))

  upper.interval lower.interval
1          245.0          345.2
2          434.1          432.1
3          123.5          901.2
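
The pattern above expects comma-separated values without thousands separators. For the " - " form shown in the edited question, one possible adaptation is sketched below; the adjusted pattern and the two-step conversion are assumptions, not part of the original answer. The values are captured as text first because "1,120.2" cannot be converted to a number directly.

```r
ci <- c("(245.0 - 345.2)", "(434.1 - 432.1)", "(123.5 - 1,120.2)")

# Capture both values as text; the character class also allows commas
pattern <- "\\(([0-9.,]+) - ([0-9.,]+)\\)"
captured <- strcapture(pattern, ci,
                       data.frame(upper.interval = character(),
                                  lower.interval = character()))

# Strip thousands commas, then convert every column to numeric
captured[] <- lapply(captured, function(col) {
  as.numeric(gsub(",", "", col, fixed = TRUE))
})

captured
```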

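Answers by score and posting time

Answer 1: score 4, posted 2023-01-17 11:51:51
Answer 2: score 2, posted 2023-01-17 11:37:20
Answer 3: score 1, accepted, posted 2023-01-17 11:35:38
Answer 4: score 1, posted 2023-01-17 11:44:26
Answer 5: score 1, posted 2023-01-17 11:44:34
Answer 6: score 1, posted 2023-01-17 11:46:02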
