How do I create a row when grep doesn't find a match?
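After lurking on Stack Overflow for ten years, I've finally become desperate enough to ask for help! Apologies for any mistakes!
I'm extracting tables from Word to create my own data frame. There are about 50 documents, all containing the same table, but the data isn't mine and, to put it kindly, it's a bit messy. The table is 2 columns (Name, Value) by 60 rows; the df$Name contents are often misspelled, or rows are missing altogether. It's not my data, so I can't edit it.
My problem is that I want to bind the data from each Word document together, so they all need to have the same columns. I will be transposing the data, so Name becomes the header and Value becomes row 1. Because the df$Name contents are messy, I'm using grep to pull out the rows I want. (I previously tried extracting by row number, but the row numbers change between Word documents.)
These are all the values that should be present in df$Name.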
Col <- c("Top Film / Web code (if applicable)", "Base Film / Web code (if applicable)", "Top Label / Sleeves code", "Base Label code", "Promotional Label code", "Trays Code", "SRP code", "SRP label code", "Packing format (overwrap, MAP, VAC)", "Vac pressure (if applicable)", "Die set", "Optimal running speed (max)", "Gas mix (if applicable)", "Pressure for Leaker checks (bar)", "Frequency of checks", "Metal Detection Limits", "No. of Units per pack","Pack weight", "Claims", "Shelf Life Of Product From Pack / Slice", "Date code format", "Health Mark", "UK & EU Address","“e” mark present", "Weight present", "Top Label Placement", "Base Label Placement", "Promo Label Placement", "Barcode (if applicable)", "No. of Packs per SRP/Basket","Weight of outercase", "Max No. of SRP/Baskets per pallet")
##use grep to get R to search for similar words present in all word docs################
toMatch <- c("Top Film","Base Film", "Top Label", "Base Label", "Promotional", "Trays", "SRP","Packing format", "Vac pressure", "Die set", "Optimal running speed", "Gas mix",
"Pressure", "Frequency", "Metal Detection Limits", "per pack",
"Pack weight",
"Claims", "Shelf Life", "Date code", "Health", "Address",
"“e”", "Weight present", "Top Label Place", "Base Label Place", "Promo Label Place", "Barcode","No. of Packs","outercase", "Max No.")
tab_select <- unique (df[grep(paste(toMatch,collapse="|"),
df$Name, ignore.case=TRUE),])
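Using grep like this works very well, but if a value is missing there is no sign of it. In this case "Trays Code" isn't present, but I need a blank "Trays Code" row (with NA in Value) to be created. Adding one afterwards doesn't help, because it ends up at the bottom of the table and I need the rows to stay in the right order.
Is there a way to have grep match, but also create a row with NA when there is no match?
I tried making a separate table with the correct column names and joining with dplyr, hoping any duplicates would disappear, but the slightly different names in df$Name and Col just produced more duplicates.
I'm not sure whether I should loop through each pattern and create a row when there is no match; I'm just worried about ending up with loops within loops within loops. At the moment, this grep call uses multiple patterns, and some patterns pull out multiple rows of data, which may complicate things.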
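The concern about patterns that match nothing, or match several rows, can be checked directly before deciding how to handle them. The following is only a minimal sketch on a hypothetical three-row table (`df_demo` and `patterns_demo` are made-up names and values, not the real data); it counts how many rows each pattern matches.

```r
# Hypothetical miniature of one document's table; the "Trays Code" row is missing.
df_demo <- data.frame(
  Name  = c("Top Label / Sleeves code", "Top Label Placement", "SRP code"),
  Value = c("A1", "centre", "S9"),
  stringsAsFactors = FALSE
)
patterns_demo <- c("Top Label", "Trays", "SRP")

# Count the rows matched by each pattern:
# 0 means the row is missing, more than 1 means the pattern is ambiguous.
match_counts <- vapply(
  patterns_demo,
  function(p) sum(grepl(p, df_demo$Name, ignore.case = TRUE)),
  integer(1)
)
match_counts
```

Here "Trays" has a count of 0 and "Top Label" a count of 2, so both situations described above show up in a single pass without nested loops.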
How about this:
df <- data.frame(Name = c("Top Film / Web code (if applicable)", "Base Film / Web code (if applicable)", "Top Label / Sleeves code", "Base Label code", "Promotional Label code", "Trays Code", "SRP code", "SRP label code", "Packing format (overwrap, MAP, VAC)", "Vac pressure (if applicable)", "Die set", "Optimal running speed (max)", "Gas mix (if applicable)", "Pressure for Leaker checks (bar)", "Frequency of checks", "Metal Detection Limits", "No. of Units per pack","Pack weight", "Claims", "Shelf Life Of Product From Pack / Slice", "Date code format", "Health Mark", "UK & EU Address","“e” mark present", "Weight present", "Top Label Placement", "Base Label Placement", "Promo Label Placement", "Barcode (if applicable)", "No. of Packs per SRP/Basket","Weight of outercase", "Max No. of SRP/Baskets per pallet"))
toMatch <- c("Top Film","Base Film", "Top Label", "Base Label", "Promotional", "Trays", "SRP","Packing format", "Vac pressure", "Die set", "Optimal running speed", "Gas mix", "Pressure", "Frequency", "Metal Detection Limits", "per pack", "Pack weight", "Claims", "Shelf Life", "Date code", "Health", "Address", "“e”", "Weight present", "Top Label Place", "Base Label Place", "Promo Label Place", "Barcode","No. of Packs","outercase", "Max No.")
df$Value <- 1:nrow(df)
df$Name[6] <- "Not Matched"
out <- lapply(toMatch, function(x){
if(any(grepl(x, df$Name))){
df[grep(x, df$Name), ]
}else{
data.frame(Name = x, Value=NA)
}
})
out <- do.call(rbind, out)
head(out, n=10)
#> Name Value
#> 1 Top Film / Web code (if applicable) 1
#> 2 Base Film / Web code (if applicable) 2
#> 3 Top Label / Sleeves code 3
#> 26 Top Label Placement 26
#> 4 Base Label code 4
#> 27 Base Label Placement 27
#> 5 Promotional Label code 5
#> 110 Trays NA
#> 7 SRP code 7
#> 8 SRP label code 8
Created on 2023-01-08 by the reprex package (v2.0.1)
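Note that I changed the sixth observation of Name in df to "Not Matched" to show what happens when there is no match. In the original data it would have matched "Trays". You can see the no-match case in the output: the result for the sixth pattern, "Trays", is a row with NA in Value.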
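Since the stated goal is to transpose each document and bind them together, every document has to yield exactly one value per pattern. The sketch below is one possible way to get there, not part of the answer above: it assumes that keeping only the first matching row per pattern is acceptable, and `extract_rows`, `doc1`, `doc2` and their values are hypothetical names and data invented for illustration.

```r
# Return exactly one row per pattern, in pattern order.
# Assumption: the first matching row is the one wanted; NA when nothing matches.
extract_rows <- function(df, patterns) {
  values <- vapply(patterns, function(p) {
    hit <- grep(p, df$Name, ignore.case = TRUE)
    if (length(hit) > 0) as.character(df$Value[hit[1]]) else NA_character_
  }, character(1))
  data.frame(Name = patterns, Value = unname(values), stringsAsFactors = FALSE)
}

# Two hypothetical documents; the second has no "Trays" row.
doc1 <- data.frame(Name  = c("Top Film / Web code", "Trays Code", "Die set"),
                   Value = c("TF1", "TR1", "D1"))
doc2 <- data.frame(Name  = c("Top film code", "Die Set"),
                   Value = c("TF2", "D2"))
patterns <- c("Top Film", "Trays", "Die set")

# Transpose each document to a single wide row, then stack the rows.
wide <- lapply(list(doc1, doc2), function(d) {
  r <- extract_rows(d, patterns)
  setNames(as.data.frame(t(r$Value), stringsAsFactors = FALSE), r$Name)
})
combined <- do.call(rbind, wide)
combined
```

Because only the first match is kept, a broad pattern such as "SRP", which matches several names in the full table, would need to be made more specific (for example by anchoring it with ^ or $) before relying on this.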