Extracting a string from one column into another column in R

Question

I have an example data frame like the one below.

ID	File
1	11_213.csv
2	13_256.csv
3	11_223.csv
4	12_389.csv
5	14_456.csv
6	12_345.csv

And I want to add another column based on the string between the underscore and the period, to get a data frame that looks something like this.

ID	File	Group
1	11_213.csv	213
2	13_256.csv	256
3	11_223.csv	223
4	12_389.csv	389
5	14_456.csv	456
6	12_345.csv	345

I think I need to use the str_extract function from stringr, but I am not sure what notation to use for my pattern. For example, when I use:

df <- df %>%
    mutate("Group" = str_extract(File, "[^_]+"))

I get all the information before the underscore, like this:

ID	File	Group
1	11_213.csv	11
2	13_256.csv	13
3	11_223.csv	11
4	12_389.csv	12
5	14_456.csv	14
6	12_345.csv	12

But that is not what I want. What should I use instead of "[^_]+" to get just the stuff between the underscore and the period? Thanks!

Answer 1

We can use regex lookarounds with str_extract to extract the digits (\\d+) that follow a _ and precede a literal . (escaped as \\.):

library(dplyr)
library(stringr)
df <- df %>%
    mutate(Group = str_extract(File, "(?<=_)(\\d+)(?=\\.)"))
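A self-contained way to try the lookaround pattern is sketched below. It assumes the sample data from the question, typed in by hand as a plain data.frame, and annotates each part of the pattern:

```r
library(dplyr)
library(stringr)

# Sample data from the question, typed in by hand
df <- data.frame(
  ID = 1:6,
  File = c("11_213.csv", "13_256.csv", "11_223.csv",
           "12_389.csv", "14_456.csv", "12_345.csv"),
  stringsAsFactors = FALSE
)

df <- df %>%
  # (?<=_)  lookbehind: the match must be preceded by "_" (not included)
  # (\\d+)  one or more digits: the part that is returned
  # (?=\\.) lookahead: the match must be followed by a literal "." (not included)
  mutate(Group = str_extract(File, "(?<=_)(\\d+)(?=\\.)"))

print(df)
```

Note that str_extract returns a character column; wrapping the call in as.integer() would give a numeric Group if that is what is needed.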

Or another option is to remove the unwanted parts with str_remove_all: match any characters (.*) up to and including the _, or (|) everything from the . onwards. Since . matches any character in regex mode (the default), we escape it with \\ to match a literal period:

df <- df %>%
        mutate(Group = str_remove_all(File, ".*_|\\..*"))
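Using str_remove_all rather than str_remove matters here: the pattern has two alternatives that match at different places in the string, and str_remove drops only the first match. A small sketch on two of the file names from the question shows the difference:

```r
library(stringr)

files <- c("11_213.csv", "13_256.csv")

# str_remove() deletes only the first match: the leading "11_" / "13_"
first_only <- str_remove(files, ".*_|\\..*")
print(first_only)   # [1] "213.csv" "256.csv"

# str_remove_all() deletes every match: the prefix and the ".csv" suffix
all_matches <- str_remove_all(files, ".*_|\\..*")
print(all_matches)  # [1] "213" "256"
```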

Answer 2

A base R option using gsub:

transform(
  df,
  Group = gsub(".*_(\\d+)\\..*", "\\1", File)
)

gives

  ID       File Group
1  1 11_213.csv   213
2  2 13_256.csv   256
3  3 11_223.csv   223
4  4 12_389.csv   389
5  5 14_456.csv   456
6  6 12_345.csv   345
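One behavioural difference between the two answers may be worth knowing. When a string does not match the pattern, gsub() returns it unchanged, whereas str_extract() returns NA. The sketch below uses "readme.txt" as a made-up, non-matching file name to illustrate this:

```r
files <- c("11_213.csv", "readme.txt")  # second name is a hypothetical non-matching example

# gsub() with a capture group: the whole string is replaced by group 1 (\\1).
# If the pattern does not match, the input is returned unchanged.
via_gsub <- gsub(".*_(\\d+)\\..*", "\\1", files)
print(via_gsub)     # "213" and "readme.txt" (unchanged)

# str_extract() returns NA when there is no match
via_extract <- stringr::str_extract(files, "(?<=_)(\\d+)(?=\\.)")
print(via_extract)  # "213" and NA
```

If the File column could contain names without the _digits. shape, the NA from str_extract makes those rows easier to spot.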

2 answers

Answer 1: score 4, accepted, 2021-03-08 17:21:16

Answer 2: score 2, 2021-03-08 22:13:29
