Use grep to replace every nth and (n+1)th occurrence with different values in R

For quarterly data

> df  
  TIME     GEO  Value
2000Q1 Austria 3864.6   
2000Q2 Austria 3841.3   
2000Q3 Austria 3843.0   
2000Q4 Austria 3847.2   
2001Q1 Austria 3853.5   
2001Q2 Austria 3875.2   
2001Q3 Austria 3886.7  
2001Q4 Austria 3921.9   
2002Q1 Austria 3865.2   
2002Q2 Austria 3872.4  
2002Q3 Austria 3876.0  
2002Q4 Austria 3887.9   
2003Q1 Austria 3938.3   
2003Q2 Austria 3954.5  
2003Q3 Austria 3972.8  
2003Q4 Austria 3971.9  

I'm naively converting quarterly data to monthly with df.mon <- rep(df$Value, each=3). I do the same for df$TIME:

 df.mon$TIME <- rep(df$TIME, each=3)  

I want to convert these time identifiers to monthly ones so that I can easily use df.mon as a weight on monthly data.

So, I have

  >head(df.mon, n=10)
     GEO  month
  3864.6 2000Q1
  3864.6 2000Q1
  3864.6 2000Q1
  3841.3 2000Q2
  3841.3 2000Q2
  3841.3 2000Q2
  3843.0 2000Q3
  3843.0 2000Q3
  3843.0 2000Q3
  3847.2 2000Q4

I want to replace the 1st, 4th, 7th, etc. occurrence of Q1 with M01, the 2nd, 5th, 8th, etc. occurrence of Q1 with M02, and so on, to produce:

     GEO  month
  3864.6 2000M01
  3864.6 2000M02
  3864.6 2000M03
  3841.3 2000M04
  3841.3 2000M05
  3841.3 2000M06
  3843.0 2000M07

The closest explanation to this is here, and it seems like using grep with back references \\1 is the way to go (a useful list is here).

I've tried

gsub("(?:Q1)", "\\1M01\\2M02\\3M03", df.mon$month)

which only gives me

     2000M01M02M03
     2000M01M02M03
     2000M01M02M03

I've tried other specifications like gsub("(?:Q1)(?:Q1)(?:Q1)", "\\\\1M01\\\\2M02\\\\3M03", df.mon$month), for which no replacements are made.

I don't really understand what's going on with the (?: ) command (and it seems unnecessary), and I don't know Perl, so I'm at a loss for how to make this seemingly easy replacement work.

Try

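# Take the four-digit year from each identifier
# (grep(..., value=TRUE) would return the whole string, e.g. "2000Q1")
year <- substr(df.mon$month, 1, 4)
# Zero-padded labels M01..M12, recycled across years;
# this assumes the data start in Q1 and cover whole years
month <- sprintf("M%02d", 1:12)
yearmonth <- paste(year, month, sep="")
df.mon$month <- yearmonth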

No complicated regular expressions needed.
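One reason the regex route is awkward here: gsub rewrites each string on its own, so it cannot tell the first "2000Q1" row from the second or third. If the series might not start in Q1 or cover whole years, the month number can instead be computed from the quarter digit plus the row's position within its quarter. The following is only a minimal sketch under assumptions: the small df is an illustrative three-quarter subset of the question's data, the identifiers are assumed to always look like "YYYYQn", and the helper variables quarter and pos are names made up for this example.

```r
# Illustrative subset of the quarterly data, deliberately not starting at Q1
df <- data.frame(
  TIME  = c("2000Q3", "2000Q4", "2001Q1"),
  Value = c(3843.0, 3847.2, 3853.5),
  stringsAsFactors = FALSE
)

# Repeat every quarterly row three times, once per month in the quarter
df.mon <- data.frame(
  Value = rep(df$Value, each = 3),
  month = rep(df$TIME, each = 3),
  stringsAsFactors = FALSE
)

# Split each identifier into year and quarter number (assumes "YYYYQn")
year    <- substr(df.mon$month, 1, 4)
quarter <- as.integer(substr(df.mon$month, 6, 6))

# Position of each row within its quarter: 1, 2, 3, 1, 2, 3, ...
pos <- rep(1:3, times = nrow(df))

# Month number = months in the preceding quarters plus the position
df.mon$month <- sprintf("%sM%02d", year, (quarter - 1L) * 3L + pos)

print(df.mon)
```

Because the month is derived from the quarter itself rather than from a recycled 1:12 sequence, the labels stay correct for partial years.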
