Regex: extracting a decimal number that follows a pattern in R

Question

I'm not sure what I'm doing wrong. I have lines in a text file, and the target lines look like this:

Nsource.Inhibitor 3 81.63 27.21 1.84 0.008

Nsource.Inhibitor 3 90.31 17.21 0.84 <0.001

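I want to extract 0.008 and <0.001 from the end of each line.

However, there are several other lines with the same layout, so the first part of the line has to be used as part of the pattern.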

Nsource 1 1238.10 1238.10 40.29 <.001

Inhibitor 3 1484.41 494.80 16.10 <.001

My attempt:

reline <- "+ Nsource.Inhibitor   3   81.63   27.21   1.84    0.008"
decnum <- "[[:digit:]]+\\.*[[:digit:]]*"
chk <- paste0("+ Nsource.Inhibitor[:blank:]+", decnum, "[:blank:]+", decnum, "[:blank:]+", decnum, "[:blank:]+", decnum,
       "[:blank:]+", "([[:digit:]]+\\.*[[:digit:]]*)")
gsub(chk, "\\1",reline)

This returns:

"+ Nsource.Inhibitor\t3\t81.63\t27.21\t1.84\t0.008"

Thanks for your help.

Matt

Answer 1

Something like this?

library(stringr)
strings <- c("Nsource.Inhibitor 3 81.63 27.21 1.84 0.008", "Nsource.Inhibitor 3 90.31 17.21 0.84 <0.001", 
             "Nsource 1 1238.10 1238.10 40.29 <.001", "Inhibitor 3 1484.41 494.80 16.10 <.001")

str_match(strings, "(?=^Nsource.Inhibitor).*?(<?\\d+\\.\\d+)$")[,2]

This yields

[1] "0.008"  "<0.001" NA       NA

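The lookahead makes sure that Nsource.Inhibitor is at the beginning of the string; only then does the rest of the pattern match the final \\d+\\.\\d+ on the line (optionally preceded by <).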
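For anyone who would rather not load stringr, the same idea can be sketched in base R. This is only a sketch, not part of the answer above: it assumes the prefix is followed by whitespace, it escapes the dot in the prefix, and it relaxes the number pattern so that a value written as <.001 would also be accepted. The names pattern, m and pvals are made up for the illustration.

```r
# Same sample lines as in the answer above.
strings <- c("Nsource.Inhibitor 3 81.63 27.21 1.84 0.008",
             "Nsource.Inhibitor 3 90.31 17.21 0.84 <0.001",
             "Nsource 1 1238.10 1238.10 40.29 <.001",
             "Inhibitor 3 1484.41 494.80 16.10 <.001")

# Anchor the prefix at the start, skip the middle columns lazily, and capture
# the last field: optional "<", optional leading digits, a dot, then digits.
pattern <- "^Nsource\\.Inhibitor\\s.*?(<?\\d*\\.\\d+)$"

# regexec() locates the whole match plus each capture group;
# regmatches() returns character(0) for lines that do not match.
m <- regmatches(strings, regexec(pattern, strings, perl = TRUE))

# Take the capture group where there was a match, NA otherwise.
pvals <- vapply(m, function(x) if (length(x) == 2) x[2] else NA_character_,
                character(1), USE.NAMES = FALSE)
pvals
```

As with str_match, lines that do not start with Nsource.Inhibitor come back as NA, so the result stays aligned with the input vector.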

Answer 2

If the target line contains "Nsource.Inhibitor", its last character is a digit, and you want everything after the last whitespace character, try:

gsub(".*Nsource\\.Inhibitor.*\\s(.*[0-9])$", "\\1", reline)

If Nsource or Inhibitor may not be capitalized, you can add ignore.case = T.

Examples:

> reline <- "+ Nsource.Inhibitor   3   81.63   27.21   1.84    <0.008"
> output <- gsub(".*Nsource\\.Inhibitor.*\\s(.*[0-9])$", "\\1", reline, ignore.case = T)
> output
[1] "<0.008"

> reline <- "+ Nsource.Inhibitor   3   81.11  27  1232   23  123111  55.5555  0.38"
> output <- gsub(".*Nsource\\.inhibitor.*\\s(.*[0-9])$", "\\1", reline, ignore.case = T)
> output
[1] "0.38"
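One thing to keep in mind with this approach: gsub returns a line unchanged when the pattern does not match, so applying it to the whole file would hand back the non-target lines as they are. A minimal sketch of filtering first with grepl, assuming the lines are already in a character vector (the vector lines and the variables hits, unfiltered and extracted are invented for this illustration):

```r
lines <- c("+ Nsource.Inhibitor   3   81.63   27.21   1.84    <0.008",
           "+ Nsource   1   1238.10   1238.10   40.29   <.001")
pattern <- ".*Nsource\\.Inhibitor.*\\s(.*[0-9])$"

# Without filtering, the second line does not match and is returned as is.
unfiltered <- gsub(pattern, "\\1", lines)

# Keep only the lines that match, then extract the last field from those.
hits <- grepl(pattern, lines)
extracted <- sub(pattern, "\\1", lines[hits])
extracted
```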

Answer 3

strings <- c("Nsource.Inhibitor 3 81.63 27.21 1.84 0.008", "Nsource.Inhibitor 3 90.31 17.21 0.84 <0.001",  "Nsource 1 1238.10 1238.10 40.29 <.001", "Inhibitor 3 1484.41 494.80 16.10 <.001")

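The expression below uses grep to pick out the strings that contain the substring 'Nsource.Inhibitor', splits each of them on ' ', and returns the 6th element of each split string.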

sapply(strsplit(strings[grep('Nsource.Inhibitor', strings)], ' '), '[[',6)
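Splitting on a single ' ' and taking element 6 relies on exactly one space between fields and nothing in front of the first field. If the real file uses tabs or repeated spaces, as the line in the question appears to, a variant along these lines may be more forgiving. It is a sketch under that assumption, with made-up sample spacing and variable names:

```r
strings <- c("Nsource.Inhibitor   3\t81.63  27.21 1.84    0.008",
             "Nsource.Inhibitor 3 90.31 17.21 0.84 <0.001",
             "Nsource 1 1238.10 1238.10 40.29 <.001")

# fixed = TRUE treats the dot literally instead of as "any character".
keep <- grepl("Nsource.Inhibitor", strings, fixed = TRUE)

# Split on runs of whitespace so repeated spaces or tabs do not create
# empty fields, then take the last field of each line rather than the 6th.
fields <- strsplit(strings[keep], "[[:space:]]+")
last_field <- vapply(fields, function(x) x[length(x)], character(1))
last_field
```

Taking the last field instead of a fixed position also keeps working if a line carries an extra leading token such as the "+" in the question.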

Answer 4

There is no reason to use a regex here. Just read the file in as a data.frame and do a simple subset:

DF <- read.table(text = "Nsource.Inhibitor 3 81.63 27.21 1.84 0.008
           Nsource.Inhibitor 3 90.31 17.21 0.84 <0.001
           Nsource 1 1238.10 1238.10 40.29 <.001
           Inhibitor 3 1484.41 494.80 16.10 <.001", stringsAsFactors = FALSE) # you can also read directly from a file

DF[DF$V1 == "Nsource.Inhibitor", ncol(DF)]
#[1] "0.008"  "<0.001"
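Reading straight from a file might look roughly like the following. The file here is a temporary one written by the sketch itself so that it runs on its own, and it assumes every line has the same six whitespace-separated fields; a file with ragged lines would need extra arguments such as fill = TRUE.

```r
# Hypothetical input file, written to a temporary path for the illustration.
path <- tempfile(fileext = ".txt")
writeLines(c("Nsource.Inhibitor 3 81.63 27.21 1.84 0.008",
             "Nsource.Inhibitor 3 90.31 17.21 0.84 <0.001",
             "Nsource 1 1238.10 1238.10 40.29 <.001",
             "Inhibitor 3 1484.41 494.80 16.10 <.001"), path)

# read.table() splits on any run of whitespace by default; the last column
# is read as character because values such as "<0.001" are not numeric.
DF <- read.table(path, stringsAsFactors = FALSE)

# Keep the rows whose first column is the target label, return the last column.
pvals <- DF[DF$V1 == "Nsource.Inhibitor", ncol(DF)]
pvals
```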

4 answers

Answer 1  1  2017-10-16 06:22:35
Answer 2  1  2017-10-16 06:23:20
Answer 3  1  2017-10-16 12:52:14
Answer 4  1  2017-10-17 06:25:17
