Create two numeric columns from one character column in R
The `confidence_interval` column is of type character:
| confidence_interval |
|---|
| (245.0 - 345.2) |
| (434.1 - 432.1) |
| (123.5 - 1,120.2) |
I want to create two numeric columns: `Upper Interval`, containing the first value in the parentheses, and `Lower Interval`, containing the second value:
| Upper Interval | Lower Interval |
|---|---|
| 245.0 | 345.2 |
| 434.1 | 432.1 |
| 123.5 | 1120.2 |
How can this be done in R?

Thanks
`extract()` from tidyr fits your case.
```r
library(tidyr)

df %>%
  # capture the text on each side of the comma inside the parentheses
  extract(confidence_interval, into = c("Upper", "Lower"),
          regex = "\\((.+),(.+)\\)", convert = TRUE)
# # A tibble: 3 × 2
#   Upper Lower
#   <dbl> <dbl>
# 1  245   345.
# 2  434.  432.
# 3  124.  901.
```
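The regex above captures the text on either side of a comma, i.e. the `(245.0,345.2)` layout. For the ` - ` separator and the thousands separator shown in the question's table, a minimal sketch (assuming tidyr is installed, with `df` rebuilt from the question's table) could capture around ` - ` instead and strip the commas before converting:

```r
library(tidyr)

# Rebuilt from the question's table (assumption: a plain data.frame)
df <- data.frame(confidence_interval = c("(245.0 - 345.2)", "(434.1 - 432.1)",
                                         "(123.5 - 1,120.2)"))

# Capture the text on each side of " - " inside the parentheses;
# the pieces stay character for now (convert = FALSE is the default)
res <- extract(df, confidence_interval, into = c("Upper", "Lower"),
               regex = "\\((.+) - (.+)\\)")

# Drop the thousands separators, then convert both columns to numeric
res[] <- lapply(res, function(x) as.numeric(gsub(",", "", x, fixed = TRUE)))
res
```

Converting after `extract()` rather than passing `convert = TRUE` matters here: `type.convert()` would leave `"1,120.2"` as a character value, so the whole `Lower` column would stay character.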
This is one approach using `sapply` with `strsplit` and `gsub`:
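```r
# Split each interval on " - ", strip the parentheses and the thousands
# separator, then convert the pieces to numeric
setNames(data.frame(t(sapply(strsplit(df$confidence_interval, " - "), function(x)
  as.numeric(gsub("\\(|\\)|,", "", x))))), c("Upper Interval", "Lower Interval"))
#   Upper Interval Lower Interval
# 1          245.0          345.2
# 2          434.1          432.1
# 3          123.5         1901.2
```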
Data (`df`):

```r
df <- structure(list(confidence_interval = c("(245.0 - 345.2)", "(434.1 - 432.1)",
"(123.5 - 1,901.2)")), class = "data.frame", row.names = c(NA,
-3L))
```
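A variant of the same split-and-strip idea, sketched under the assumption that numeric output is wanted: `vapply()` with `FUN.VALUE = numeric(2)` fails loudly if any row does not split into exactly two numbers, and removing the comma first lets `as.numeric()` parse values such as `1,901.2`.

```r
df <- structure(list(confidence_interval = c("(245.0 - 345.2)", "(434.1 - 432.1)",
"(123.5 - 1,901.2)")), class = "data.frame", row.names = c(NA, -3L))

# Remove the parentheses and the thousands separator, then split on " - "
parts <- strsplit(gsub("[(),]", "", df$confidence_interval), " - ", fixed = TRUE)

# One matrix column per row of df: row 1 = first value, row 2 = second value
m <- vapply(parts, as.numeric, FUN.VALUE = numeric(2))

df$`Upper Interval` <- m[1, ]
df$`Lower Interval` <- m[2, ]
df
```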
Here is a solution.
```r
ci <- c('(245.0,345.2)', '(434.1,432.1)', '(123.5,901.2)')
values <- strsplit(gsub('\\(|\\)', '', ci), split = ",")
upper <- sapply(values, function(x) as.numeric(x[[1]]))
lower <- sapply(values, function(x) as.numeric(x[[2]]))
upper
#> [1] 245.0 434.1 123.5
lower
#> [1] 345.2 432.1 901.2
```
I use `gsub` to remove the parentheses, and then `strsplit` to split the values on each side of the `,`. Then I use `sapply` to get a vector back, because `strsplit` returns a list with one character vector per input string.
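Because every element of the `strsplit()` result holds two strings, the two `sapply()` calls can also be collapsed into one: `sapply()` simplifies equal-length results into a matrix, with the first values in row 1 and the second values in row 2. A small sketch of that alternative:

```r
ci <- c('(245.0,345.2)', '(434.1,432.1)', '(123.5,901.2)')
values <- strsplit(gsub('\\(|\\)', '', ci), split = ",")

# Each element is a character vector of length 2, e.g. "245.0" "345.2"
str(values[[1]])

# as.numeric() on each element returns a length-2 numeric vector, which
# sapply() simplifies into a 2 x 3 matrix
m <- sapply(values, as.numeric)
upper <- m[1, ]
lower <- m[2, ]
upper
#> [1] 245.0 434.1 123.5
lower
#> [1] 345.2 432.1 901.2
```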
The OP's question was edited. If the separator between the values is `' - '`, then you should use:

```r
# Also strip the thousands separator so as.numeric() can parse values like 1,120.2
values <- strsplit(gsub('\\(|\\)|,', '', ci), split = " - ")
```
The `split` argument of `strsplit` is the pattern the function uses to split each string into parts; by default it is treated as a regular expression.
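Since `split` is a regular expression by default, separators containing metacharacters such as `.` or `|` need escaping, or `fixed = TRUE` to match literally. A short illustration, assuming the question's ` - ` layout; it also shows why the thousands separator has to be removed before `as.numeric()`:

```r
x <- "(123.5 - 1,120.2)"
inner <- gsub("\\(|\\)", "", x)

# " - " contains no regex metacharacters, so both calls split the same way
a <- strsplit(inner, split = " - ")[[1]]
b <- strsplit(inner, split = " - ", fixed = TRUE)[[1]]
identical(a, b)
#> [1] TRUE

# A "." separator, by contrast, must be matched literally:
# as a regex it would match every character
strsplit("1.5", split = ".", fixed = TRUE)[[1]]
#> [1] "1" "5"

# "1,120.2" is not a valid number for as.numeric(), so drop the comma first
nums <- as.numeric(gsub(",", "", a, fixed = TRUE))
nums
#> [1]  123.5 1120.2
```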
```r
library(tidyverse)

df %>%
  # keep only digits, commas and dots
  mutate(across(confidence_interval, ~ str_remove_all(.x, "[^0-9,\\.]"))) %>%
  separate(col = confidence_interval,
           into = c("higher", "lower"),
           sep = ",", convert = TRUE)
# # A tibble: 3 × 2
#   higher lower
#    <dbl> <dbl>
# 1   245   345.
# 2   434.  432.
# 3   124.  901.
```
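Keeping only digits, commas and dots works for the comma-separated `(245.0,345.2)` layout, but with the question's ` - ` separator and `1,120.2` it would glue the two numbers together. A sketch for that layout, assuming dplyr, stringr and tidyr are installed and `df` is rebuilt from the question's table: remove the parentheses and commas, then split on ` - `.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Rebuilt from the question's table
df <- tibble(confidence_interval = c("(245.0 - 345.2)", "(434.1 - 432.1)",
                                     "(123.5 - 1,120.2)"))

res <- df %>%
  # Drop the parentheses and the thousands separator, keep " - "
  mutate(across(confidence_interval, ~ str_remove_all(.x, "[(),]"))) %>%
  # Split on " - "; convert = TRUE turns both pieces into doubles
  separate(col = confidence_interval, into = c("higher", "lower"),
           sep = " - ", convert = TRUE)
res
```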
```r
library(tidyverse)
ci <- c('(245.0,345.2)', '(434.1,432.1)', '(123.5,901.2)')
data.frame(ci) |>
  mutate(ci2 = stringr::str_replace_all(ci, "\\(|\\)", "")) |>
  separate(ci2, c('upper', 'lower'), sep = ",", convert = TRUE)
#>              ci upper lower
#> 1 (245.0,345.2) 245.0 345.2
#> 2 (434.1,432.1) 434.1 432.1
#> 3 (123.5,901.2) 123.5 901.2
```
Using `strcapture`:
```r
ci <- c('(245.0,345.2)', '(434.1,432.1)', '(123.5,901.2)')
pattern <- "\\(([-.0-9]+),([-.0-9]+)\\)"
strcapture(pattern, ci, data.frame(upper.interval = numeric(), lower.interval = numeric()))
#>   upper.interval lower.interval
#> 1          245.0          345.2
#> 2          434.1          432.1
#> 3          123.5          901.2
```
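The pattern above expects a comma between the two numbers. For the question's ` - ` layout, a hedged sketch: capture around ` - ` instead, and strip the thousands separators beforehand, since `strcapture()` converts each captured string to the type of the matching `proto` column and values like `1,120.2` would otherwise come back as `NA`.

```r
ci <- c("(245.0 - 345.2)", "(434.1 - 432.1)", "(123.5 - 1,120.2)")

# Same idea as above, but the two groups are separated by " - "
pattern <- "\\(([-.0-9]+) - ([-.0-9]+)\\)"

# Remove thousands separators first so "1,120.2" matches [-.0-9]+
res <- strcapture(pattern, gsub(",", "", ci, fixed = TRUE),
                  proto = data.frame(upper.interval = numeric(),
                                     lower.interval = numeric()))
res
#>   upper.interval lower.interval
#> 1          245.0          345.2
#> 2          434.1          432.1
#> 3          123.5         1120.2
```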
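Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.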