Create two numeric columns from one character column in R

The confidence interval column is of type character.

confidence_interval
(245.0 - 345.2)
(434.1 - 432.1)
(123.5 - 1,120.2)

I want to create two numeric columns: Upper Interval, which holds the first value in the parentheses, and Lower Interval, which holds the second value.

Upper Interval  Lower Interval
245.0           345.2
434.1           432.1
123.5           1120.2

How can this be done using R?

Thanks

extract() from tidyr fits your case.

library(tidyr)

# The regex below assumes comma-separated values such as "(245.0,345.2)"
df %>%
  extract(confidence_interval, into = c("Upper", "Lower"),
          regex = "\\((.+),(.+)\\)", convert = TRUE)

# # A tibble: 3 × 2
#   Upper Lower
#   <dbl> <dbl>
# 1  245   345.
# 2  434.  432.
# 3  124.  901.
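
If the column uses " - " as the separator and may contain thousands separators (as in the Data section of this thread), the same extract() idea can be adapted. The following is only a sketch under those assumptions; the result name `res` is made up for illustration, and the commas are stripped by hand because convert = TRUE would not turn a string like "1,901.2" into a number.

```r
library(tidyr)
library(dplyr)

# Sample data in the " - " format, with a thousands separator in row 3
df <- data.frame(
  confidence_interval = c("(245.0 - 345.2)", "(434.1 - 432.1)", "(123.5 - 1,901.2)")
)

res <- df %>%
  # Capture the text on each side of " - " inside the parentheses
  extract(confidence_interval, into = c("Upper", "Lower"),
          regex = "\\((.+) - (.+)\\)") %>%
  # Drop thousands separators, then convert both columns to numeric
  mutate(across(c(Upper, Lower), ~ as.numeric(gsub(",", "", .x))))

res
```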

This is one approach using sapply with strsplit and gsub:

setNames(data.frame(t(sapply(strsplit(df$confidence_interval, " - "), function(x)
  gsub("\\(|\\)", "", x)))), c("Upper Interval", "Lower Interval"))
  Upper Interval Lower Interval
1          245.0          345.2
2          434.1          432.1
3          123.5        1,901.2
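
Note that the columns printed above are still character, and "1,901.2" keeps its comma. A possible base R variant that returns numeric columns is sketched below; it assumes every row has exactly two values separated by " - ", and the names `cleaned`, `parts`, `mat` and `out` are illustrative only.

```r
df <- data.frame(
  confidence_interval = c("(245.0 - 345.2)", "(434.1 - 432.1)", "(123.5 - 1,901.2)")
)

# Remove parentheses and thousands separators in one pass
cleaned <- gsub("[(),]", "", df$confidence_interval)

# Split each string on the literal " - "
parts <- strsplit(cleaned, " - ", fixed = TRUE)

# vapply errors out if any row does not yield exactly two numbers
mat <- t(vapply(parts, as.numeric, numeric(2)))

out <- setNames(as.data.frame(mat), c("Upper Interval", "Lower Interval"))
out
```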

Data

dput(df)
structure(list(confidence_interval = c("(245.0 - 345.2)", "(434.1 - 432.1)",
"(123.5 - 1,901.2)")), class = "data.frame", row.names = c(NA,
-3L))

Here is a solution.

ci <- c('(245.0,345.2)', '(434.1,432.1)', '(123.5,901.2)')

values <- strsplit(gsub('\\(|\\)', '', ci), split = ",")

upper <- sapply(values, function(x) as.numeric(x[[1]]))
lower <- sapply(values, function(x) as.numeric(x[[2]]))

upper
#> [1] 245.0 434.1 123.5
lower
#> [1] 345.2 432.1 901.2

I use gsub to remove the parentheses, and then strsplit to split the values on each side of the ','. Then I use sapply to return a vector, because strsplit returns a list (one character vector per input string).

The OP's question was edited.

If the separator between the values is ' - ', then you should use values <- strsplit(gsub('\\(|\\)', '', ci), split = " - ")

The split parameter in strsplit is what the function uses to split each string into two parts.
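
One detail worth knowing: by default strsplit treats split as a regular expression, and fixed = TRUE makes it a literal string. For " - " both behave the same, but for a separator like "." they do not. A small sketch (the variable names are illustrative):

```r
ci <- c("(245.0 - 345.2)", "(434.1 - 432.1)")
stripped <- gsub("\\(|\\)", "", ci)

# Default behaviour: split is interpreted as a regular expression
by_regex <- strsplit(stripped, split = " - ")

# fixed = TRUE: split is matched literally
by_literal <- strsplit(stripped, split = " - ", fixed = TRUE)

# " - " contains no regex metacharacters, so both calls agree
same <- identical(by_regex, by_literal)

# With "." the literal match is what you usually want,
# because as a regex "." matches any character
dot_parts <- strsplit("245.0", split = ".", fixed = TRUE)[[1]]
```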

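library(tidyverse)

# Keep only digits, commas and periods, then split on the comma.
# This assumes comma-separated input such as "(245.0,345.2)".
df %>%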
  mutate(across(confidence_interval, ~ str_remove_all(.x, "[^0-9,\\.]"))) %>%
  separate(col = confidence_interval,
           into = c("higher", "lower"),
           sep = ",", convert = TRUE)

# A tibble: 3 × 2
  higher lower
   <dbl> <dbl>
1   245   345.
2   434.  432.
3   124.  901.

library(tidyverse)

ci <- c('(245.0,345.2)', '(434.1,432.1)', '(123.5,901.2)')
data.frame(ci) |> 
  mutate(ci2 = stringr::str_replace_all(ci, "\\(|\\)", "")) |> 
  separate(ci2, c('upper', 'lower'), sep =",", convert = TRUE)
#>              ci upper lower
#> 1 (245.0,345.2) 245.0 345.2
#> 2 (434.1,432.1) 434.1 432.1
#> 3 (123.5,901.2) 123.5 901.2

Using strcapture:

ci <- c('(245.0,345.2)', '(434.1,432.1)', '(123.5,901.2)')

pattern <- "\\(([-.0-9]+),([-.0-9]+)\\)"
strcapture(pattern, ci, data.frame(upper.interval=numeric(), lower.interval=numeric()))

  upper.interval lower.interval
1          245.0          345.2
2          434.1          432.1
3          123.5          901.2
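
For the " - " separated format with thousands separators, one option is to capture both sides as character and convert afterwards. This is a sketch under that assumption, not a tested drop-in; the names `raw` and `res` are illustrative.

```r
ci <- c("(245.0 - 345.2)", "(434.1 - 432.1)", "(123.5 - 1,901.2)")

# Allow digits, periods and commas on each side of " - "
pattern <- "\\(([0-9.,]+) - ([0-9.,]+)\\)"

# Capture as character first, since "1,901.2" is not directly numeric
proto <- data.frame(upper.interval = character(),
                    lower.interval = character(),
                    stringsAsFactors = FALSE)
raw <- strcapture(pattern, ci, proto)

# Strip thousands separators, then convert each column to numeric
res <- as.data.frame(lapply(raw, function(col) {
  as.numeric(gsub(",", "", col, fixed = TRUE))
}))
res
```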
