
R code: How can I filter X amount of elements after a string match?

I have a character vector containing multiple elements extracted from a PDF. I only want to keep the 5 elements that follow a string match. So I have

c("Retail","Channel1","Discount","10/1/2019 20%","10/1/2020 20%","10/1/2021 20%",
  "Fee", "Channel1", "10/1/2019 $5","10/1/2020 5%","10/1/2021 5%",
  "Supply Chain", "Channel1","Discount", "10/1/2019 80%","10/1/2020 80%","10/1/2021 80%")

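I want to detect "Retail" and then include everything up to the first "10/1/2021 20%".

Then I want to detect "Fee" and include everything up to "10/1/2021 5%", then "Supply Chain" and include everything up to "10/1/2021 80%".

"Retail", "Fee" and "Supply Chain" will always be the same, but the dates and percentages keep changing.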

Using tidyverse:

v1 <- c("Retail", "Channel1", "Discount", "10/1/2019 20%", "10/1/2020 20%", 
"10/1/2021 20%", "Fee", "Channel1", "10/1/2019 $5", "10/1/2020 5%", 
"10/1/2021 5%", "Supply Chain", "Channel1", "Discount", "10/1/2019 80%", 
"10/1/2020 80%", "10/1/2021 80%")

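Here we use grepl and cumsum to create a grouping variable that increments at each string match. Then we select the top 5 rows of each group. Note that top_n(5) without a wt argument ranks by the last column (tag), which is constant within each group, so every row counts as a tie and is kept, as the output shows.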

library(tidyverse)

data.frame(v1) %>% 
  mutate(tag = cumsum(grepl("Retail|Fee|Supply Chain", v1))) %>% 
  group_by(tag) %>% 
  top_n(5)

    Selecting by tag
# A tibble: 17 x 2
# Groups:   tag [3]
   v1              tag
   <fct>         <int>
 1 Retail            1
 2 Channel1          1
 3 Discount          1
 4 10/1/2019 20%     1
 5 10/1/2020 20%     1
 6 10/1/2021 20%     1
 7 Fee               2
 8 Channel1          2
 9 10/1/2019 $5      2
10 10/1/2020 5%      2
11 10/1/2021 5%      2
12 Supply Chain      3
13 Channel1          3
14 Discount          3
15 10/1/2019 80%     3
16 10/1/2020 80%     3
17 10/1/2021 80%     3
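
For comparison, here is a minimal sketch of capping each group by position instead of by rank, assuming dplyr 1.0.0 or later for `slice_head()`. The vector `v2` and its extra "Footnote" element are hypothetical, added only so that one group is longer than six elements and the cut becomes visible. `n = 6` assumes the header plus its five followers should be kept.

```r
library(dplyr)

v1 <- c("Retail", "Channel1", "Discount", "10/1/2019 20%", "10/1/2020 20%",
        "10/1/2021 20%", "Fee", "Channel1", "10/1/2019 $5", "10/1/2020 5%",
        "10/1/2021 5%", "Supply Chain", "Channel1", "Discount", "10/1/2019 80%",
        "10/1/2020 80%", "10/1/2021 80%")

# Hypothetical input: one extra element at the end of the first block
v2 <- append(v1, "Footnote", after = 6)

res <- data.frame(v1 = v2, stringsAsFactors = FALSE) %>%
  mutate(tag = cumsum(grepl("Retail|Fee|Supply Chain", v1))) %>%
  group_by(tag) %>%
  slice_head(n = 6) %>%   # header plus at most 5 followers per group
  ungroup()

res
```

With the original `v1` no group has more than six elements, so the same call would not drop anything.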

Here is an option with base R:

lapply(tapply(v1, cumsum(v1 %in%  c("Retail", "Fee", "Supply Chain")),
        head, 6), tail, -1)
#$`1`
#[1] "Channel1"      "Discount"      "10/1/2019 20%" "10/1/2020 20%" "10/1/2021 20%"

#$`2`
#[1] "Channel1"     "10/1/2019 $5" "10/1/2020 5%" "10/1/2021 5%"

#$`3`
#[1] "Channel1"      "Discount"      "10/1/2019 80%" "10/1/2020 80%" "10/1/2021 80%"
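
The one-liner above can be unpacked into separate steps. This is only a sketch of the same idea; the intermediate names (`keys`, `is_key`, `grp`, `parts`, `after`) are illustrative and do not come from the answer, and `split()` plus `lapply()` stand in for `tapply()`.

```r
v1 <- c("Retail", "Channel1", "Discount", "10/1/2019 20%", "10/1/2020 20%",
        "10/1/2021 20%", "Fee", "Channel1", "10/1/2019 $5", "10/1/2020 5%",
        "10/1/2021 5%", "Supply Chain", "Channel1", "Discount", "10/1/2019 80%",
        "10/1/2020 80%", "10/1/2021 80%")
keys <- c("Retail", "Fee", "Supply Chain")

# 1. TRUE where an element is exactly one of the section headers
is_key <- v1 %in% keys

# 2. Running count of headers seen so far, i.e. a group id per element
grp <- cumsum(is_key)

# 3. One character vector per group
parts <- split(v1, grp)

# 4. Keep the header plus at most 5 followers, then drop the header
after <- lapply(parts, function(p) tail(head(p, 6), -1))

grp
after
```

Unlike `grepl()`, `%in%` matches whole elements only, so a value that merely contains "Fee" would not start a new group. Any elements that come before the first header would land in group 0.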

If the result also needs to include "Retail", "Fee" and "Supply Chain":

tapply(v1, cumsum(v1 %in%  c("Retail", "Fee", "Supply Chain")), head, 6)
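
Both variants can be folded into one small function. The sketch below is hypothetical: the name `take_after_match` and its arguments `n` and `keep_key` are made up for illustration. It also names each piece by its header and ignores anything that appears before the first header, which the original code does not do.

```r
# Hypothetical helper, not part of the answer above
take_after_match <- function(x, keys, n = 5, keep_key = TRUE) {
  grp <- cumsum(x %in% keys)
  keep <- grp > 0                       # drop elements before the first header
  parts <- split(x[keep], grp[keep])
  out <- lapply(parts, function(p) {
    p <- head(p, n + 1)                 # header plus at most n followers
    if (keep_key) p else p[-1]
  })
  # Label each piece with its header instead of "1", "2", ...
  names(out) <- vapply(parts, `[`, character(1), 1)
  out
}

v1 <- c("Retail", "Channel1", "Discount", "10/1/2019 20%", "10/1/2020 20%",
        "10/1/2021 20%", "Fee", "Channel1", "10/1/2019 $5", "10/1/2020 5%",
        "10/1/2021 5%", "Supply Chain", "Channel1", "Discount", "10/1/2019 80%",
        "10/1/2020 80%", "10/1/2021 80%")
keys <- c("Retail", "Fee", "Supply Chain")

with_key    <- take_after_match(v1, keys)
without_key <- take_after_match(v1, keys, keep_key = FALSE)
with_key$Fee
```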

Data

v1 <- c("Retail", "Channel1", "Discount", "10/1/2019 20%", "10/1/2020 20%", 
"10/1/2021 20%", "Fee", "Channel1", "10/1/2019 $5", "10/1/2020 5%", 
"10/1/2021 5%", "Supply Chain", "Channel1", "Discount", "10/1/2019 80%", 
"10/1/2020 80%", "10/1/2021 80%")

