Regular Expression to replace certain text in R
I am working with a data.csv file and need to process a certain pattern of data. Currently, the class column in my data.csv file looks like:
org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java
org.apache.camel.bam.rules.TemporalRule.processExchange(TemporalRule.java
org.apache.camel.bam.rules.ActivityRules.processExchange(ActivityRules.java
org.apache.camel.bam.rules.ProcessRules.processExchange(ProcessRules.java
org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java
org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java
Now I need to replace the method name before the bracket "(" and everything after it with ".java". In this case, my desired output is:
org.apache.camel.bam.TimeExpression.java
org.apache.camel.bam.rules.TemporalRule.java
org.apache.camel.bam.rules.ActivityRules.java
org.apache.camel.bam.rules.ProcessRules.java
org.apache.camel.bam.processor.JpaBamProcessor.java
org.apache.camel.bam.processor.JpaBamProcessor.java
Currently, I am trying the following code:
dscls<-gsub("\\.[^.]+($", "java", data$class)
So basically I am trying to match the text up to "(" and replace it with ".java", but it does not produce the correct output. Can someone help me get the regular expression right?
We can use sub to match the word (\\w+) followed by ( followed by another word (\\w+) and a dot (\\.), and replace it with blank ("").
sub("\\w+\\(\\w+\\.", "", data$class)
#[1] "org.apache.camel.bam.TimeExpression.java"
#[2] "org.apache.camel.bam.rules.TemporalRule.java"
#[3] "org.apache.camel.bam.rules.ActivityRules.java"
#[4] "org.apache.camel.bam.rules.ProcessRules.java"
#[5] "org.apache.camel.bam.processor.JpaBamProcessor.java"
#[6] "org.apache.camel.bam.processor.JpaBamProcessor.java"
data <- structure(list(class =
c("org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java",
"org.apache.camel.bam.rules.TemporalRule.processExchange(TemporalRule.java",
"org.apache.camel.bam.rules.ActivityRules.processExchange(ActivityRules.java",
"org.apache.camel.bam.rules.ProcessRules.processExchange(ProcessRules.java",
"org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java",
"org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java"
)), .Names = "class", row.names = c(NA, -6L), class = "data.frame")
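As a side note on why the attempt in the question fails: the "(" in that pattern is not escaped, so the regex engine reads it as the start of a group that is never closed and rejects the pattern. The sketch below is only an illustration under that reading. The vector x, the flag attempt_failed and the result fixed are names made up for this example, and the corrected pattern is one possible repair, not the only one.

```r
# Two sample values copied from the question
x <- c("org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java",
       "org.apache.camel.bam.rules.TemporalRule.processExchange(TemporalRule.java")

# The original pattern leaves "(" unescaped, so it opens a group that is never
# closed; try() captures the resulting error instead of stopping the script
attempt <- suppressWarnings(try(gsub("\\.[^.]+($", "java", x), silent = TRUE))
attempt_failed <- inherits(attempt, "try-error")

# One possible repair: escape "(" and consume the rest of the string with ".*$".
# The replacement carries the leading dot because the pattern removes one.
fixed <- sub("\\.[^.]+\\(.*$", ".java", x)
print(fixed)
```

Since each string contains a single "(", sub and gsub behave the same here; sub just makes the "one replacement per string" intent explicit.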
Here df$x holds the data you shared:
gsub("\\w+\\(.*", "java", df$x)
[1] "org.apache.camel.bam.TimeExpression.java" "org.apache.camel.bam.rules.TemporalRule.java"
[3] "org.apache.camel.bam.rules.ActivityRules.java" "org.apache.camel.bam.rules.ProcessRules.java"
[5] "org.apache.camel.bam.processor.JpaBamProcessor.java" "org.apache.camel.bam.processor.JpaBamProcessor.java"
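To tie this back to the CSV workflow described in the question, here is a minimal sketch of reading the file and storing the cleaned values in a new column. It assumes the column is literally named class. The temporary file stands in for data.csv so the example runs on its own, and csv_path is a name invented for this sketch.

```r
# Write a tiny stand-in for data.csv to a temporary file (assumption: the real
# file has a header row with a column named "class")
csv_path <- tempfile(fileext = ".csv")
writeLines(c("class",
             "org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java",
             "org.apache.camel.bam.rules.ProcessRules.processExchange(ProcessRules.java"),
           csv_path)

# stringsAsFactors = FALSE keeps the column as character on older R versions
data <- read.csv(csv_path, stringsAsFactors = FALSE)

# Same pattern as above, stored in a new column (dscls is the name used in the question)
data$dscls <- sub("\\w+\\(.*", "java", data$class)
print(data$dscls)
```

Keeping the result in a separate column leaves the original class values available for checking the replacement.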
Since your strings already end in .java (in the example at least), you can try this too:
strs <- c('org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java','org.apache.camel.bam.rules.TemporalRule.processExchange(TemporalRule.java','org.apache.camel.bam.rules.ActivityRules.processExchange(ActivityRules.java','org.apache.camel.bam.rules.ProcessRules.processExchange(ProcessRules.java','org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java','org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java')
gsub('\\.\\w+\\(\\w+(\\.java)', '\\1', strs)
#[1] "org.apache.camel.bam.TimeExpression.java"
#[2] "org.apache.camel.bam.rules.TemporalRule.java"
#[3] "org.apache.camel.bam.rules.ActivityRules.java"
#[4] "org.apache.camel.bam.rules.ProcessRules.java"
#[5] "org.apache.camel.bam.processor.JpaBamProcessor.java"
#[6] "org.apache.camel.bam.processor.JpaBamProcessor.java"
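A quick way to compare the three patterns suggested on this page is to run them on the same input and check that they agree. The sketch below does that on two values from the question plus one hypothetical value without a bracket, which is not from the original data, to show that a string with no match is returned unchanged. The names x, a, b, c3 and all_agree are invented for this sketch.

```r
x <- c("org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java",
       "org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java",
       "no.bracket.Here.java")  # hypothetical value without "("

# Pattern 1: drop "method(Class." and keep the trailing "java"
a <- sub("\\w+\\(\\w+\\.", "", x)

# Pattern 2: replace "method(" and everything after it with "java"
b <- sub("\\w+\\(.*", "java", x)

# Pattern 3: replace ".method(Class.java" with the captured ".java"
c3 <- sub("\\.\\w+\\(\\w+(\\.java)", "\\1", x)

all_agree <- identical(a, b) && identical(b, c3)
print(all_agree)
print(a[3])
```

The patterns differ on inputs that do not end in .java: pattern 2 always appends "java" after a match, while pattern 3 only matches when ".java" is actually present.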