
How to extract the numeric information from a list within a data frame in R?

I have entries of the following type in the first column of a data frame called dfgModsPepFiltered_subset:

A640-P641 = 456.123x

Trying to extract the numeric information from this with the following R script:

dfgModsPepFiltered_subset$AA <- regmatches(dfgModsPepFiltered_subset$Peptide,
        gregexpr("[[:digit:]]+", dfgModsPepFiltered_subset$Peptide))

Gives me:

c("640", "641", "456", "123")

However, what I really need is a new column for each of "640", "641" and "456.123".

I've tried various combinations of unlisting but can't seem to get the format right.

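You could modify the regmatches pattern so that it also matches the decimal point, then convert each set of matches to numeric and bind them row-wise:

 as.data.frame(do.call(`rbind`,
         lapply(regmatches(dfgModsPepFiltered_subset$Peptide,
             gregexpr("[[:digit:].]+", dfgModsPepFiltered_subset$Peptide)),
                                                        as.numeric)))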

  #   V1  V2      V3
  #1 640 641 456.123
  #2 620 625 285.400
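Since the question asks for new columns on the existing data frame, the numeric matrix can also be bound back onto it. The following is only a minimal sketch: it assumes every row yields exactly three numbers (otherwise rbind would recycle values), and the column names Start, End and Value are hypothetical, not taken from the original post.

```r
# Example data in the same shape as the post
df <- data.frame(Peptide = c("A640-P641 = 456.123x", "A620-B625 = 285.400x"),
                 stringsAsFactors = FALSE)

# Match runs of digits and dots, so "456.123" stays in one piece
m <- regmatches(df$Peptide, gregexpr("[[:digit:].]+", df$Peptide))

# One numeric row per input string (assumes three matches per row)
nums <- do.call(rbind, lapply(m, as.numeric))
colnames(nums) <- c("Start", "End", "Value")  # hypothetical column names

# Append the three numeric columns to the original data frame
df <- cbind(df, nums)
```

This keeps the original Peptide column next to the extracted values.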

Or using extract from tidyr, with one capture group per new column:

library(tidyr)
res <-  extract(dfgModsPepFiltered_subset, Peptide, c('Col1', 'Col2', 'Col3'),
               '[A-Z](\\d+)-[A-Z](\\d+) += +(\\d+\\.\\d+).+', convert=TRUE) 


res
#  Col1 Col2    Col3
#1  640  641 456.123
#2  620  625 285.400

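Or you could use a more general regex that does not depend on the letter prefixes (add convert=TRUE if you want numeric rather than character columns):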

extract(dfgModsPepFiltered_subset, Peptide, c('Col1', 'Col2', 'Col3'),
        '[^0-9]+([0-9]+)[^0-9]+([0-9]+)[^0-9]+([0-9.]+)[^0-9]+')
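If adding tidyr is not an option, a similar capture-group approach can be sketched in base R with utils::strcapture (available since R 3.4.0). This is an illustration, not part of the original answer: the prototype data frame proto and the vector name peptides are assumptions made here, and the column names simply mirror the Col1/Col2/Col3 used above.

```r
# Example strings in the same shape as the post
peptides <- c("A640-P641 = 456.123x", "A620-B625 = 285.400x")

# Prototype: gives the name and type of each capture group's column
proto <- data.frame(Col1 = integer(), Col2 = integer(), Col3 = numeric())

# One capture group per column; non-matching rows would come back as NA
res <- utils::strcapture(
  "[A-Z]([0-9]+)-[A-Z]([0-9]+) += +([0-9]+\\.[0-9]+)",
  peptides, proto
)
```

The prototype plays the role of convert=TRUE here, since each captured string is coerced to the type of the matching prototype column.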

Or with cSplit from splitstackshape, dropping the columns that contain NA:

library(splitstackshape)
res1 <-  cSplit(dfgModsPepFiltered_subset, 'Peptide', '[^0-9.]', fixed=FALSE)
res1[,names(res1)[!colSums(is.na(res1))], with=FALSE]
#   Peptide_2 Peptide_4 Peptide_7
#1:       640       641   456.123
#2:       620       625   285.400

Or using strsplit

 as.data.frame(t(sapply(strsplit(dfgModsPepFiltered_subset$Peptide,
                       '[^0-9.]'), function(x) na.omit(as.numeric(x)))))

 #   V1  V2      V3
 #1 640 641 456.123
 #2 620 625 285.400
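The rbind and sapply approaches rely on every string containing the same number of numeric pieces. Where that may not hold, one possible defensive variant pads short rows with NA. This is a sketch under assumptions: the input vector x below is made up for illustration, and the stricter pattern (digits with an optional decimal part) is a substitution for the character class used above.

```r
# Hypothetical inputs: the second and third strings have fewer numbers
x <- c("A640-P641 = 456.123x", "A620-B625", "no digits")

# Digits, optionally followed by a dot and more digits
parts <- regmatches(x, gregexpr("[0-9]+(\\.[0-9]+)?", x))

# Widest row decides the number of columns
width <- max(lengths(parts))

# Indexing past the end of a vector yields NA, which pads short rows
padded <- t(vapply(parts,
                   function(p) as.numeric(p)[seq_len(width)],
                   numeric(width)))
```

Each row of padded then has the same length, so it can be passed to as.data.frame as before.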

data

dfgModsPepFiltered_subset <- data.frame(Peptide= c('A640-P641 = 456.123x',
       'A620-B625 = 285.400x'), stringsAsFactors=FALSE)
