
Regular Expression to replace certain text in R

I am working with a data.csv file and need to process a certain pattern of data. Currently, the class column in my data.csv file looks like this:

org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java     
org.apache.camel.bam.rules.TemporalRule.processExchange(TemporalRule.java    
org.apache.camel.bam.rules.ActivityRules.processExchange(ActivityRules.java      
org.apache.camel.bam.rules.ProcessRules.processExchange(ProcessRules.java 
org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java    
org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java 

Now I need to remove the method name and everything from the bracket "(" up to ".java", so that each value is just the fully qualified class name followed by ".java". My desired output is:

org.apache.camel.bam.TimeExpression.java     
org.apache.camel.bam.rules.TemporalRule.java     
org.apache.camel.bam.rules.ActivityRules.java    
org.apache.camel.bam.rules.ProcessRules.java
org.apache.camel.bam.processor.JpaBamProcessor.java      
org.apache.camel.bam.processor.JpaBamProcessor.java 

Currently, I am trying the following code:

dscls<-gsub("\\.[^.]+($", "java", data$class)

So basically, I am trying to match the text up to "(" and replace it with ".java", but it does not produce the correct output. Can someone help me get the regular expression right?

We can use sub to match a word ( \\w+ ) followed by a literal ( , then another word ( \\w+ ) and a dot ( \\. ), and replace the match with an empty string ( "" ).

sub("\\w+\\(\\w+\\.", "", data$class)
#[1] "org.apache.camel.bam.TimeExpression.java"  
#[2] "org.apache.camel.bam.rules.TemporalRule.java"
#[3] "org.apache.camel.bam.rules.ActivityRules.java"      
#[4] "org.apache.camel.bam.rules.ProcessRules.java"        
#[5] "org.apache.camel.bam.processor.JpaBamProcessor.java" 
#[6] "org.apache.camel.bam.processor.JpaBamProcessor.java"

data

 data <- structure(list(class = 
 c("org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java", 
"org.apache.camel.bam.rules.TemporalRule.processExchange(TemporalRule.java", 
"org.apache.camel.bam.rules.ActivityRules.processExchange(ActivityRules.java", 
"org.apache.camel.bam.rules.ProcessRules.processExchange(ProcessRules.java", 
"org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java", 
"org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java"
)), .Names = "class", row.names = c(NA, -6L), class = "data.frame")
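
To see exactly what the pattern above removes, one option is to extract the matched substring with base R's regmatches() and regexpr(). This is a minimal sketch on two of the strings from the question; the names x, pattern, matched and cleaned are only illustrative.

```r
# Illustrative input: two of the strings from the question
x <- c("org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java",
       "org.apache.camel.bam.rules.TemporalRule.processExchange(TemporalRule.java")

pattern <- "\\w+\\(\\w+\\."

# regexpr() locates the first match in each string; regmatches() extracts it
matched <- regmatches(x, regexpr(pattern, x))
print(matched)

# Removing that match leaves the package, the class name and the "java" suffix
cleaned <- sub(pattern, "", x)
print(cleaned)
```

The extracted piece is the method name, the bracket, the repeated class name and the dot, which is why deleting it leaves a string that already ends in "java".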

Here df$x holds the data you shared.

gsub("\\w+\\(.*", "java", df$x)
[1] "org.apache.camel.bam.TimeExpression.java"           "org.apache.camel.bam.rules.TemporalRule.java"       
[3] "org.apache.camel.bam.rules.ActivityRules.java"       "org.apache.camel.bam.rules.ProcessRules.java"       
[5] "org.apache.camel.bam.processor.JpaBamProcessor.java" "org.apache.camel.bam.processor.JpaBamProcessor.java"
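
Since the question starts from a data.csv file, the same call can be applied to a column read from CSV. The sketch below passes literal CSV to read.csv(text = ...) instead of reading a real file, so that it is self-contained. The column name class comes from the question; csv_text, df and class_file are assumed names.

```r
# Stand-in for data.csv: read.csv() accepts literal CSV via the text argument
csv_text <- "class
org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java
org.apache.camel.bam.rules.ProcessRules.processExchange(ProcessRules.java"

df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Replace the method name and everything after it with "java",
# keeping the original column and storing the result in a new one
df$class_file <- gsub("\\w+\\(.*", "java", df$class)
print(df$class_file)
```

Writing to a new column rather than overwriting class keeps the original values available for checking the result.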

Since your strings already end in .java (in the example at least), you can also try this:

strs <- c('org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java',
          'org.apache.camel.bam.rules.TemporalRule.processExchange(TemporalRule.java',
          'org.apache.camel.bam.rules.ActivityRules.processExchange(ActivityRules.java',
          'org.apache.camel.bam.rules.ProcessRules.processExchange(ProcessRules.java',
          'org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java',
          'org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java')

gsub('\\.\\w+\\(\\w+(\\.java)', '\\1', strs)

#[1] "org.apache.camel.bam.TimeExpression.java"           
#[2] "org.apache.camel.bam.rules.TemporalRule.java"       
#[3] "org.apache.camel.bam.rules.ActivityRules.java"      
#[4] "org.apache.camel.bam.rules.ProcessRules.java"       
#[5] "org.apache.camel.bam.processor.JpaBamProcessor.java"
#[6] "org.apache.camel.bam.processor.JpaBamProcessor.java"
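
For comparison, the pattern from the question can also be made to work with a small change. In that attempt the "(" is unescaped, so it opens a group that is never closed, and the replacement "java" lacks the leading dot that the pattern consumes. The following is one possible correction, not part of any answer above: it escapes the bracket and matches through the end of the string. The name fixed_pattern is an assumption.

```r
strs <- c("org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java",
          "org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java")

# "\\."   a literal dot
# "[^.]+" the method name (one or more characters that are not a dot)
# "\\("   a literal opening bracket
# ".*$"   everything up to the end of the string
fixed_pattern <- "\\.[^.]+\\(.*$"

result <- sub(fixed_pattern, ".java", strs)
print(result)
```

Because the pattern requires a "(" directly after a dot-free segment, only the ".methodName(" part can match, and the earlier package segments are left alone.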
