Extracting information between special characters in a column in R

I'm sorry, because I feel like versions of this question have been asked many times, but I simply cannot find code from other examples that works in this case. I have a column where all the information I want is stored between two sets of "%%", and I want to extract the information between those two delimiters and put it into a new column, in this case called df$empty.

This is a long column, but in all cases I just want the information between the two sets of "%%". Is there a way to code this across the whole column?

To be specific, in this example I want a new column that looks like "information", "wanted".


empty <- c('NA', 'NA')
information <- c('notimportant%%information%%morenotimportant', 'ignorethis%%wanted%%notthiseither')

df <- data.frame(information, empty)

In this case you can do:

df$empty <- sapply(strsplit(df$information, '%%'), '[', 2)

#                                   information       empty
# 1 notimportant%%information%%morenotimportant information
# 2           ignorethis%%wanted%%notthiseither      wanted

That is, split the text on '%%' and take the second element of each resulting vector.
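To make the intermediate step visible, here is a minimal sketch of what strsplit() returns before the second element is picked. The names `pieces` and `middle` are made up for illustration, and `fixed = TRUE` and vapply() are my additions rather than part of the answer; `fixed = TRUE` only tells strsplit() to treat '%%' literally instead of as a regular expression.

```r
# Same example strings as in the question
information <- c('notimportant%%information%%morenotimportant',
                 'ignorethis%%wanted%%notthiseither')

# strsplit() returns a list with one character vector per input string
# (fixed = TRUE is an assumption added here: match '%%' literally)
pieces <- strsplit(information, '%%', fixed = TRUE)
# pieces[[1]] is c("notimportant", "information", "morenotimportant")

# vapply() is a stricter alternative to sapply(): it checks that
# every result is a single character string
middle <- vapply(pieces, function(p) p[2], character(1))
```

For this data `middle` should hold the same values as the sapply() call above; vapply() simply fails loudly if a result ever has an unexpected type or length.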

Or you can get the same result using sub():

df$empty <- sub('.*%%(.+)%%.*', '\\1', df$information)
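The two approaches agree on the example data, but they may behave differently on rows that contain no '%%' at all. The sketch below uses a hypothetical vector `x` with one such row added (not from the question) to show the difference, plus one possible way to get NA from the regex version; treat it as an illustration, not as part of the original answer.

```r
# Hypothetical input: the third element has no '%%' delimiters
x <- c('notimportant%%information%%morenotimportant',
       'ignorethis%%wanted%%notthiseither',
       'nodelimiters')

# Split approach: a string without '%%' has only one piece,
# so asking for the second piece gives NA
by_split <- sapply(strsplit(x, '%%', fixed = TRUE), '[', 2)

# sub() approach: when the pattern does not match,
# sub() returns the input string unchanged
by_sub <- sub('.*%%(.+)%%.*', '\\1', x)

# One possible way to get NA for non-matching rows with the regex approach
by_sub_na <- ifelse(grepl('%%.+%%', x), by_sub, NA_character_)
```

If every row is guaranteed to contain exactly one pair of '%%' delimiters, either one-liner is enough and this extra check is unnecessary.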
