
Using tidyverse/dplyr to create columns from other column substring

Say we have this data frame in R:

start = data.frame(
  Title = c("name_year0","name_year1","name_year2"),
  value = c(4,5,6)
)

I would like to mutate it such that the year information from Title is present in a year column:

       Title value  year
        name     4     0
        name     5     1
        name     6     2

This code almost works:

result1 = start %>% 
  mutate(year = str_match(Title, "year[0-9]+"))

But it results in this, which keeps the "year" prefix in the year column:

       Title value  year
  name_year0     4 year0
  name_year1     5 year1
  name_year2     6 year2

It seems that I should be able to use a capture group in the regex to pull out just the number part after year, like so:

result2 = start %>% 
  mutate(year = str_match(Title, "year([0-9]+)")[1,2])

But for some reason, that seems to always return the same year value:

       Title value year
  name_year0     4    0
  name_year1     5    0
  name_year2     6    0

What (likely simple) thing am I missing? Why does str_match("name_year0","year([0-9]+)")[2] work for a single string, but not when I put it in mutate?

Thanks

This looks like an indexing slip. In str_match(Title, "year([0-9]+)")[1,2], the [1,2] selects the single value at row 1, column 2, which is "0" for the first title, and mutate then recycles that one value to every row. To get the whole second column (the capture group for every row), use [, 2] instead.

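library(dplyr)    # provides %>% and mutate
library(stringr)  # provides str_match

start = data.frame(
  Title = c("name_year0","name_year1","name_year2"),
  value = c(4,5,6)
)

# [,2] keeps every row of column 2, i.e. the capture group for each Title
start %>% 
  mutate(year = str_match(Title, "year([0-9]+)")[,2])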
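If the goal is exactly the table in the question (a numeric year and Title reduced to name), something like the following sketch might work. It assumes the "_yearN" suffix always sits at the end of Title; the str_remove step and the as.integer conversion are additions for illustration, not part of the original answer.

```r
library(dplyr)
library(stringr)

start <- data.frame(
  Title = c("name_year0", "name_year1", "name_year2"),
  value = c(4, 5, 6)
)

result <- start %>%
  mutate(
    # Column 2 of the match matrix holds the capture group, one entry per row.
    year = as.integer(str_match(Title, "year([0-9]+)")[, 2]),
    # Assumption: the "_yearN" suffix should be dropped, as in the desired output.
    Title = str_remove(Title, "_year[0-9]+$")
  )

print(result)
```

Because mutate evaluates its arguments in order, year is computed from the original Title before the suffix is stripped.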

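Edit: A correction about [2]. str_match returns a matrix here, and a matrix is stored like a vector in column-major order, so a single index counts down the columns starting from the top left. [2] is therefore the second value in the matrix, not the second column, and [20] is the 20th value, as shown in this example. For a single string the result has only one row, so [2] happens to be the capture group, which is why it worked outside mutate.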

> a=matrix(1:100, ncol=10)

> a
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1   11   21   31   41   51   61   71   81    91
 [2,]    2   12   22   32   42   52   62   72   82    92
 [3,]    3   13   23   33   43   53   63   73   83    93
 [4,]    4   14   24   34   44   54   64   74   84    94
 [5,]    5   15   25   35   45   55   65   75   85    95
 [6,]    6   16   26   36   46   56   66   76   86    96
 [7,]    7   17   27   37   47   57   67   77   87    97
 [8,]    8   18   28   38   48   58   68   78   88    98
 [9,]    9   19   29   39   49   59   69   79   89    99
[10,]   10   20   30   40   50   60   70   80   90   100

> a[2]
[1] 2

> a[20]
[1] 20
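The same indexing rules can be checked on the str_match result itself. This is a small sketch using the titles from the question; the variable names m and single are only for illustration.

```r
library(stringr)

titles <- c("name_year0", "name_year1", "name_year2")
m <- str_match(titles, "year([0-9]+)")

dim(m)     # 3 rows (one per string), 2 columns (full match, capture group)
m[1, 2]    # row 1, column 2: the single value "0"
m[, 2]     # the whole second column: "0" "1" "2"
m[2]       # second element in column-major order: "year1"

# With one input string the matrix has a single row,
# so the second element is the capture group.
single <- str_match("name_year0", "year([0-9]+)")
single[2]  # "0"
```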
