
How do you extract the number of each element in a chemical formula, with parentheses, and create them into columns?

Hi, I'm trying to extract some information from chemical formulas and add it to a pre-existing table in R. Currently I have a column containing chemical formulas such as C4H8O2. I have no problem extracting each element and its corresponding number. However, I run into a problem when brackets are involved in the formula, such as C3[13]C1H8O2. I want the column title to be [13]C and the value to be 1, but my code doesn't recognize '[13]C1', so it gives me an error.

Any suggestions would be great.

# First manipulation: extract information from the "Composition" column
# into separate columns for each element

data2 <- dataframe %>%
  mutate(Composition = gsub("\\b([A-Za-z]+)\\b", "\\11", Composition),
         name = str_extract_all(Composition, "[A-Za-z]+"),
         value = str_extract_all(Composition, "\\d+")) %>%
  unnest() %>%
  spread(name, value, fill = 0)

I already have a pre-made CSV file with the table organized, and I've read it into a data frame, so now I'm just trying to parse out the elements into a 'C' column and a '[13]C' column, each with its corresponding number.

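The following regular expression should extract the isotope number, the element, and the number of atoms. The optional first group captures a bracketed isotope label such as [13], the second group captures the element letters, and the third captures the count.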

library(stringr)
str_match_all( "C3[13]C1H8O2", "(\\[[0-9]+\\])?([A-Za-z]+)([0-9]+)" )
## [[1]]
##      [,1]     [,2]   [,3] [,4]
## [1,] "C3"     NA     "C"  "3" 
## [2,] "[13]C1" "[13]" "C"  "1" 
## [3,] "H8"     NA     "H"  "8" 
## [4,] "O2"     NA     "O"  "2" 
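
The pattern above expects a count after every element, so it relies on formulas being written like C1H4 rather than CH4. As a rough sketch only, a stricter variant could assume that element symbols are one uppercase letter followed by at most one lowercase letter, and treat a missing count as 1. The helper name `parse_formula` is hypothetical and not part of stringr.

```r
library(stringr)

# Hypothetical helper: parse one formula into a named integer vector.
# Assumes conventional capitalization of element symbols (C, H, Cl, ...)
# and treats a missing count as 1.
parse_formula <- function(formula) {
  pattern <- "(\\[[0-9]+\\])?([A-Z][a-z]?)([0-9]*)"
  m <- str_match_all(formula, pattern)[[1]]

  # Isotope label is NA when the optional group did not match.
  isotope <- ifelse(is.na(m[, 2]), "", m[, 2])

  # An empty count converts to NA, which is then replaced by 1.
  count <- as.integer(m[, 4])
  count[is.na(count)] <- 1L

  setNames(count, paste0(isotope, m[, 3]))
}

parse_formula("C3[13]CH8O2")
```

Repeated elements (for example CH3CH2OH) would come back as separate entries with the same name; this sketch does not sum them.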

With a data.frame:

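library(tidyr)
library(dplyr)
d <- data.frame( Composition = c( "H2O1", "C3[13]C1H8O2" ) )
pattern <- "(\\[[0-9]+\\])?([A-Za-z]+)([0-9]+)"
d %>%
  # One data.frame of matches (columns V1..V4) per formula
  mutate( Details = lapply( str_match_all( Composition, pattern ), as.data.frame ) ) %>%
  # Name the list column explicitly; recent tidyr versions warn otherwise
  unnest( Details ) %>%
  transmute(
    Composition,
    # Prefix the element with its isotope label when there is one
    element = paste0( ifelse(is.na(V2),"",V2), V3 ),
    number = V4
  ) %>% 
  # One column per element; elements absent from a formula become 0
  spread(key="element", value="number") %>%
  replace(., is.na(.), 0)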

##    Composition [13]C C H O
## 1 C3[13]C1H8O2     1 3 8 2
## 2         H2O1     0 0 2 1
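
The counts in that result are character strings, because they come straight from the regex match. A possible variant, sketched here under the assumption that tidyr 1.1 or later is installed (for `pivot_wider()` with a scalar `values_fill`), converts the counts to integers before reshaping. The name `wide` is only illustrative.

```r
library(dplyr)
library(tidyr)
library(stringr)

d <- data.frame(Composition = c("H2O1", "C3[13]C1H8O2"))
pattern <- "(\\[[0-9]+\\])?([A-Za-z]+)([0-9]+)"

wide <- d %>%
  mutate(Details = lapply(str_match_all(Composition, pattern), function(m) {
    data.frame(
      # Prefix the element with its isotope label when one was matched
      element = paste0(ifelse(is.na(m[, 2]), "", m[, 2]), m[, 3]),
      # Store counts as integers rather than strings
      number = as.integer(m[, 4])
    )
  })) %>%
  unnest(Details) %>%
  # One column per element; elements absent from a formula are filled with 0
  pivot_wider(names_from = element, values_from = number, values_fill = 0L)

wide
```

Columns appear in the order the elements are first encountered, and the sketch assumes each element occurs at most once per formula.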
