Issue with piping stringr str_detect into str_extract - extract is only pulling text from 1st row: argument is not an atomic vector; coercing

I'm trying to create a new column which just contains certain numeric data from an expression.

Here's my data: https://pastebin.com/hYg3zqYz

I just need the numbers that come after Bipolar in column 12.

Here's what works:

p <- df %>% 
      select(where(~ any(stringr::str_detect(.x, "Bipolar")))) #returns correct column

When I then try to make a new column that pulls out just the number, it only ever returns the value from the first row. I'm not sure what I'm doing wrong.

p %>%
      mutate(group = "sr_bipol",
             sr_bipol = as.numeric(stringr::str_extract(., "[0-9].[0-9]+"))) %>% 
       select(group, sr_bipol)

# A tibble: 20 × 2
   group    sr_bipol
   <chr>       <dbl>
 1 sr_bipol     7.83
 2 sr_bipol     7.83
 3 sr_bipol     7.83
 4 sr_bipol     7.83
 5 sr_bipol     7.83
.....................

I also get this warning:

 argument is not an atomic vector; coercing 

The dot (.) refers to the whole dataset, but str_extract needs a vector as input, not a data.frame. According to ?str_extract:

string - Input vector. Either a character vector, or something coercible to one.

We need to apply str_extract to column 12 itself. Because its name, ...12, starts with dots and is an unusual column name, wrap it in backticks to access the column values:

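library(dplyr)
library(stringr)
df %>% 
  transmute(group = 'sr_bipol', 
    # lookbehind: take the number that directly follows "Bipolar "
    # ([0-9]+ also allows values of 10 or more before the decimal point)
    sr_bipol = as.numeric(str_extract(`...12`, "(?<=Bipolar\\s)[0-9]+\\.[0-9]+")))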

Output:

# A tibble: 20 × 2
   group    sr_bipol
   <chr>       <dbl>
 1 sr_bipol     7.83
 2 sr_bipol     2.34
 3 sr_bipol     1.97
 4 sr_bipol     1.94
 5 sr_bipol     2.85
 6 sr_bipol     2.92
 7 sr_bipol     3.05
 8 sr_bipol     2.80
 9 sr_bipol     3.43
10 sr_bipol     2.11
11 sr_bipol     2.80
12 sr_bipol     1.81
13 sr_bipol     1.84
14 sr_bipol     3.87
15 sr_bipol     1.68
16 sr_bipol     2.21
17 sr_bipol     2.97
18 sr_bipol     3.09
19 sr_bipol     2.84
20 sr_bipol     3.48
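
To see what the lookbehind adds, here is a minimal sketch run on a plain character vector instead of the data frame. The four strings are copied from the data shown in this thread; the variable names are mine, and using [0-9]+ for the integer part is my own variation so that values of 10 or more would also match.

```r
library(stringr)

# Sample values copied from the data in this thread
x <- c("Bipolar 7.827 / Unipolar 16.911 / LAT -9.0",
       "Bipolar 2.34 / Unipolar 9.09 / LAT -10.0",
       "Bipolar 1.974 / Unipolar 9.219 / LAT -11.0",
       "Bipolar 1.938 / Unipolar 10.572 / LAT -9.0")

# Without a lookbehind, str_extract returns the first number in each string.
# That is the Bipolar value only because Bipolar happens to come first.
first_number <- str_extract(x, "[0-9]+\\.[0-9]+")

# With a lookbehind, the match must start right after the given label,
# so the same idea works for a label that is not first in the string.
bipolar  <- as.numeric(str_extract(x, "(?<=Bipolar\\s)[0-9]+\\.[0-9]+"))
unipolar <- as.numeric(str_extract(x, "(?<=Unipolar\\s)[0-9]+\\.[0-9]+"))

bipolar   # 7.827 2.340 1.974 1.938
unipolar  # 16.911  9.090  9.219 10.572
```

Because the input is a vector with one element per row, the result also has one element per row, which is what mutate/transmute expects.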

p is a single-column tibble/data.frame. When we pass the dot (.), str_extract receives the whole data.frame as such, i.e.

> str(p)
tibble [20 × 1] (S3: tbl_df/tbl/data.frame)
 $ ...12: chr [1:20] "Bipolar 7.827 / Unipolar 16.911 / LAT -9.0" "Bipolar 2.34 / Unipolar 9.09 / LAT -10.0" "Bipolar 1.974 / Unipolar 9.219 / LAT -11.0" "Bipolar 1.938 / Unipolar 10.572 / LAT -9.0" ...
> str_extract(p, "[0-9].[0-9]+")
[1] "7.827"
Warning message:
In stri_extract_first_regex(string, pattern, opts_regex = opts(pattern)) :
  argument is not an atomic vector; coercing

The whole data.frame is coerced to a single string, so only the value from the first match is extracted, and that one value is then recycled to fill the entire column with 7.83.
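
The coercion step can be reproduced roughly in base R. This is only a sketch: I'm assuming the coercion behaves like as.character() on the data frame, which is consistent with the single value returned above. p_toy and col12 are made-up names standing in for p and its column.

```r
# Toy one-column data frame standing in for p (values copied from the question)
p_toy <- data.frame(
  col12 = c("Bipolar 7.827 / Unipolar 16.911 / LAT -9.0",
            "Bipolar 2.34 / Unipolar 9.09 / LAT -10.0"),
  stringsAsFactors = FALSE
)

# Coercing the data frame to character turns each COLUMN into one string,
# so a one-column data frame becomes a character vector of length 1
as_text <- as.character(p_toy)
length(as_text)  # 1

# A first-match search on that single string can only return one value
first_hit <- regmatches(as_text, regexpr("[0-9]\\.[0-9]+", as_text))
first_hit        # "7.827"

# Passing the column itself keeps one element (and one match) per row
col_hits <- regmatches(p_toy$col12, regexpr("[0-9]\\.[0-9]+", p_toy$col12))
col_hits         # "7.827" "2.34"
```

Inside mutate, a length-1 result is recycled to every row, which is why the question's output shows 7.83 everywhere.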


If more than one column contains 'Bipolar', we can loop over them with across (change transmute to mutate to keep all the other columns from the original data):

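df %>% 
  transmute(across(where(~ any(stringr::str_detect(.x, "Bipolar"))), 
   # take the number that directly follows "Bipolar " in each selected column
   ~ as.numeric(str_extract(.x, "(?<=Bipolar\\s)[0-9]+\\.[0-9]+")), 
     # name each new column sr_bipol plus the old name without its leading dots
     .names = "sr_bipol{str_remove(.col, '[.]+')}"))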
# A tibble: 20 × 1
   sr_bipol12
        <dbl>
 1       7.83
 2       2.34
 3       1.97
 4       1.94
 5       2.85
 6       2.92
 7       3.05
 8       2.80
 9       3.43
10       2.11
11       2.80
12       1.81
13       1.84
14       3.87
15       1.68
16       2.21
17       2.97
18       3.09
19       2.84
20       3.48
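
For illustration, here is a small sketch of the mutate variant with two matching columns. The data frame toy, its column names (id, site_a, site_b), and the site_b values are hypothetical; only the site_a strings come from the question. I also added is.character() and na.rm = TRUE to the selector as a precaution, which is my assumption about what a messier dataset might need.

```r
library(dplyr)
library(stringr)

# Hypothetical data: id, site_a and site_b are made-up column names,
# and the site_b values are invented for the example
toy <- tibble(
  id     = 1:2,
  site_a = c("Bipolar 7.827 / Unipolar 16.911 / LAT -9.0",
             "Bipolar 2.34 / Unipolar 9.09 / LAT -10.0"),
  site_b = c("Bipolar 1.5 / Unipolar 8.0 / LAT -3.0",
             "Bipolar 0.75 / Unipolar 6.25 / LAT -4.0")
)

out <- toy %>%
  mutate(across(
    # pick every character column that mentions "Bipolar" somewhere
    where(~ is.character(.x) && any(str_detect(.x, "Bipolar"), na.rm = TRUE)),
    # extract the number that follows "Bipolar " in each of those columns
    ~ as.numeric(str_extract(.x, "(?<=Bipolar\\s)[0-9]+\\.[0-9]+")),
    # keep the originals and add new columns with a prefix
    .names = "sr_bipol_{.col}"
  ))

names(out)
# "id" "site_a" "site_b" "sr_bipol_site_a" "sr_bipol_site_b"
```

With mutate the original columns stay in place and the extracted values are appended, while transmute would return only the new columns.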

Here is an alternative approach:

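library(tidyverse)

df %>% 
  select(...12) %>% 
  # keep only the first "/"-separated piece, e.g. "Bipolar 7.827 ";
  # extra = "drop" discards the remaining pieces without a warning
  separate(...12, into = "group", sep = "/", extra = "drop") %>%
  # parse_number() pulls out the number; str_extract() keeps the label
  mutate(sr_bipol = parse_number(group),
         group = str_extract(group, '[A-Za-z]+'))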

   group   sr_bipol
   <chr>      <dbl>
 1 Bipolar     7.83
 2 Bipolar     2.34
 3 Bipolar     1.97
 4 Bipolar     1.94
 5 Bipolar     2.85
 6 Bipolar     2.92
 7 Bipolar     3.05
 8 Bipolar     2.80
 9 Bipolar     3.43
10 Bipolar     2.11
11 Bipolar     2.80
12 Bipolar     1.81
13 Bipolar     1.84
14 Bipolar     3.87
15 Bipolar     1.68
16 Bipolar     2.21
17 Bipolar     2.97
18 Bipolar     3.09
19 Bipolar     2.84
20 Bipolar     3.48
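
As a quick check of what parse_number does in this approach, here is a minimal sketch on two of the strings from the question. The variable names are mine.

```r
library(readr)

# First piece of each row after splitting on "/" (values from the question)
pieces <- c("Bipolar 7.827 ", "Bipolar 2.34 ")

# parse_number() drops the non-numeric text around the first number
nums <- parse_number(pieces)
nums   # 7.827 2.340

# On the unsplit string it still returns only the first number,
# which here is the Bipolar value because Bipolar comes first
whole <- parse_number("Bipolar 7.827 / Unipolar 16.911 / LAT -9.0")
whole  # 7.827
```

This relies on Bipolar always being the first entry in the string; if the order can vary, the lookbehind pattern shown earlier is the safer choice.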
