Issue with piping stringr str_detect into str_extract - extract is only pulling text from 1st row: argument is not an atomic vector; coercing

Question

I'm trying to create a new column that contains only certain numeric data extracted from a text expression.

Here's my data: https://pastebin.com/hYg3zqYz

I just need the numbers that come after Bipolar in column 12.

Here's what works:

library(dplyr)

p <- df %>% 
  select(where(~ any(stringr::str_detect(.x, "Bipolar")))) # returns the correct column (...12)

When I then try to make a new column that pulls out just the number, it only ever returns the value from the first row. I'm not sure what I'm doing wrong.

p %>%
      mutate(group = "sr_bipol",
             sr_bipol = as.numeric(stringr::str_extract(., "[0-9].[0-9]+"))) %>% 
       select(group, sr_bipol)

# A tibble: 20 × 2
   group    sr_bipol
   <chr>       <dbl>
 1 sr_bipol     7.83
 2 sr_bipol     7.83
 3 sr_bipol     7.83
 4 sr_bipol     7.83
 5 sr_bipol     7.83
.....................

I also get this warning:

 argument is not an atomic vector; coercing

Answer 1

The . refers to the whole dataset, but str_extract needs a vector as input, not a data.frame. According to ?str_extract:

string - Input vector. Either a character vector, or something coercible to one.

Instead, apply str_extract to column 12 itself. As the name of column 12 is prefixed with ... (i.e. ...12), which is unusual, use backticks to access the column values:

library(dplyr)
library(stringr)
df %>% 
  transmute(group = 'sr_bipol', 
    # (?<=Bipolar\\s) is a lookbehind: only match the number that follows "Bipolar "
    sr_bipol = as.numeric(str_extract(`...12`, "(?<=Bipolar\\s)[0-9]\\.[0-9]+")))

Output:

# A tibble: 20 × 2
   group    sr_bipol
   <chr>       <dbl>
 1 sr_bipol     7.83
 2 sr_bipol     2.34
 3 sr_bipol     1.97
 4 sr_bipol     1.94
 5 sr_bipol     2.85
 6 sr_bipol     2.92
 7 sr_bipol     3.05
 8 sr_bipol     2.80
 9 sr_bipol     3.43
10 sr_bipol     2.11
11 sr_bipol     2.80
12 sr_bipol     1.81
13 sr_bipol     1.84
14 sr_bipol     3.87
15 sr_bipol     1.68
16 sr_bipol     2.21
17 sr_bipol     2.97
18 sr_bipol     3.09
19 sr_bipol     2.84
20 sr_bipol     3.48
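The lookbehind (?<=Bipolar\\s) and the escaped dot are what tie this pattern to the Bipolar value. A minimal sketch comparing the question's pattern with the answer's, assuming stringr is installed: the first string is copied from the question's data, while the other two are hypothetical rows invented for illustration (a reordered row and a two-digit Bipolar value). The third pattern is an assumed variant, not part of the original answer.

```r
library(stringr)

# First string copied from the question's data; the other two are
# hypothetical rows invented only to show where the patterns differ
x <- c("Bipolar 7.827 / Unipolar 16.911 / LAT -9.0",
       "Unipolar 16.911 / Bipolar 7.827 / LAT -9.0",
       "Bipolar 12.5 / Unipolar 9.09 / LAT -10.0")

# Question's pattern: the unescaped "." matches any character and nothing
# ties the match to "Bipolar", so the first digit-anything-digits run wins
loose <- str_extract(x, "[0-9].[0-9]+")

# Answer's pattern: the lookbehind requires "Bipolar " right before the
# match, and "\\." only matches a literal dot (exactly one leading digit)
strict <- str_extract(x, "(?<=Bipolar\\s)[0-9]\\.[0-9]+")

# Assumed variant (not from the answer): one or more leading digits and
# an optional decimal part
general <- str_extract(x, "(?<=Bipolar\\s)[0-9]+(?:\\.[0-9]+)?")

loose    # "7.827" "6.911" "2.5"
strict   # "7.827" "7.827" NA
general  # "7.827" "7.827" "12.5"
```

On the four rows visible in str(p), all three patterns return the same values; they only diverge on rows like the hypothetical ones above.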

The p data is a single-column tibble/data.frame. When we use ., the whole data.frame is passed to str_extract, i.e.

> str(p)
tibble [20 × 1] (S3: tbl_df/tbl/data.frame)
 $ ...12: chr [1:20] "Bipolar 7.827 / Unipolar 16.911 / LAT -9.0" "Bipolar 2.34 / Unipolar 9.09 / LAT -10.0" "Bipolar 1.974 / Unipolar 9.219 / LAT -11.0" "Bipolar 1.938 / Unipolar 10.572 / LAT -9.0" ...
> str_extract(p, "[0-9].[0-9]+")
[1] "7.827"
Warning message:
In stri_extract_first_regex(string, pattern, opts_regex = opts(pattern)) :
  argument is not an atomic vector; coercing

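str_extract coerces the data.frame to a character vector with one element per column (hence the warning), so it returns a single match, 7.827, taken from the first row's value. mutate then recycles that length-1 result to fill the whole column, which is why every row shows 7.83.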
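A minimal sketch of that coercion and recycling, assuming dplyr and stringr are installed. The four strings are the first values of ...12 shown by str(p) above, and p_small is a hypothetical stand-in for p. It calls as.character() explicitly, which is presumably what happens behind the "coercing" warning, and then extracts from the column itself for comparison.

```r
library(dplyr)
library(stringr)

# First four values of ...12, copied from str(p); p_small is a stand-in for p
p_small <- tibble(`...12` = c(
  "Bipolar 7.827 / Unipolar 16.911 / LAT -9.0",
  "Bipolar 2.34 / Unipolar 9.09 / LAT -10.0",
  "Bipolar 1.974 / Unipolar 9.219 / LAT -11.0",
  "Bipolar 1.938 / Unipolar 10.572 / LAT -9.0"
))

# A data frame is a list of columns, so coercing it gives ONE string per
# column: the whole column deparsed as 'c("Bipolar 7.827 / ...", ...)'
whole <- as.character(p_small)
length(whole)                             # 1

# Extracting from that single string yields a single match ...
str_extract(whole, "[0-9].[0-9]+")        # "7.827"

# ... whereas the column itself has one element per row
str_extract(p_small$`...12`, "[0-9].[0-9]+")
# "7.827" "2.34"  "1.974" "1.938"

# mutate() recycles a length-1 result to every row, hence the repeated 7.83
recycled <- p_small %>%
  mutate(sr_bipol = as.numeric(str_extract(whole, "[0-9].[0-9]+")))
recycled$sr_bipol                         # 7.827 7.827 7.827 7.827
```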

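If more than one column contains 'Bipolar', we can loop over them with across (change transmute to mutate to keep all the other columns from the original data):

df %>% 
  # loop over every column that contains "Bipolar" anywhere
  transmute(across(where(~ any(stringr::str_detect(.x, "Bipolar"))), 
    # pull the number right after "Bipolar " and convert it to numeric
    ~ as.numeric(str_extract(.x, "(?<=Bipolar\\s)[0-9]\\.[0-9]+")), 
    # name each result "sr_bipol" + the column name without its dots, e.g. sr_bipol12
    .names = "sr_bipol{str_remove(.col, '[.]+')}"))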
# A tibble: 20 × 1
   sr_bipol12
        <dbl>
 1       7.83
 2       2.34
 3       1.97
 4       1.94
 5       2.85
 6       2.92
 7       3.05
 8       2.80
 9       3.43
10       2.11
11       2.80
12       1.81
13       1.84
14       3.87
15       1.68
16       2.21
17       2.97
18       3.09
19       2.84
20       3.48
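To see the .names template at work when several columns match, here is a sketch on a hypothetical toy tibble, assuming dplyr and stringr are installed. The column names ...12 and ...13 and the id column are invented for illustration; the cell values are borrowed from the question's data.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data: the column names and the id column are invented;
# the cell values are borrowed from the question's data
toy <- tibble(
  id = c("a", "b"),
  `...12` = c("Bipolar 7.827 / Unipolar 16.911 / LAT -9.0",
              "Bipolar 2.34 / Unipolar 9.09 / LAT -10.0"),
  `...13` = c("Bipolar 1.974 / Unipolar 9.219 / LAT -11.0",
              "Bipolar 1.938 / Unipolar 10.572 / LAT -9.0")
)

res <- toy %>%
  transmute(across(
    # select every column that contains "Bipolar" somewhere (id is skipped)
    where(~ any(str_detect(.x, "Bipolar"))),
    # extract the number after "Bipolar " and convert it to numeric
    ~ as.numeric(str_extract(.x, "(?<=Bipolar\\s)[0-9]\\.[0-9]+")),
    # "...12" -> "12" -> "sr_bipol12"; {.col} is the source column name
    .names = "sr_bipol{str_remove(.col, '[.]+')}"
  ))

names(res)   # "sr_bipol12" "sr_bipol13"
```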

Answer 2

Here is an alternative approach:

library(tidyverse)

df %>% 
  select(...12) %>% 
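  # keep only the first "/"-separated piece; extra = "drop" discards the rest without a warning
  separate(...12, into = "group", sep = "\\/", extra = "drop") %>%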
  mutate(sr_bipol = parse_number(group),
         group= str_extract(group, '[A-Za-z]+'))

   group   sr_bipol
   <chr>      <dbl>
 1 Bipolar     7.83
 2 Bipolar     2.34
 3 Bipolar     1.97
 4 Bipolar     1.94
 5 Bipolar     2.85
 6 Bipolar     2.92
 7 Bipolar     3.05
 8 Bipolar     2.80
 9 Bipolar     3.43
10 Bipolar     2.11
11 Bipolar     2.80
12 Bipolar     1.81
13 Bipolar     1.84
14 Bipolar     3.87
15 Bipolar     1.68
16 Bipolar     2.21
17 Bipolar     2.97
18 Bipolar     3.09
19 Bipolar     2.84
20 Bipolar     3.48
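A step-by-step sketch of what this pipeline does to each cell, assuming readr and stringr are installed. The strings are the four values visible in str(p) in the first answer, and str_remove() stands in for separate() here: separate() with a single into name keeps only the first piece (with the default extra = "warn" it also warns that the extra pieces were discarded).

```r
library(readr)
library(stringr)

# First four values of ...12, copied from str(p)
x <- c("Bipolar 7.827 / Unipolar 16.911 / LAT -9.0",
       "Bipolar 2.34 / Unipolar 9.09 / LAT -10.0",
       "Bipolar 1.974 / Unipolar 9.219 / LAT -11.0",
       "Bipolar 1.938 / Unipolar 10.572 / LAT -9.0")

# Keep the text before the first "/", like separate(into = "group")
first_piece <- str_remove(x, "/.*$")            # "Bipolar 7.827 " ...

# parse_number() drops non-numeric characters before and after the
# first number it finds
sr_bipol <- parse_number(first_piece)           # 7.827 2.34 1.974 1.938

# The group label is the first run of letters
group <- str_extract(first_piece, "[A-Za-z]+")  # "Bipolar" for every row
```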

2 answers

solution1 3 ACCEPTED 2022-08-08 15:50:39

solution2 1 2022-08-08 16:13:18
