
Compare a data frame column against a vector of desired string matches and delete non-matching rows in R

I'd like to create a vector of keywords that lists the valid titles for the category column of a separate data frame. I then want a function that checks each value in the category column against the keyword vector and deletes any row whose category is not in it.

Here is an example of the desired keywords:

"Current SharePrice", "Current NAV", "Current Premium/Discount", "52WkAvg SharePrice", "52WkAvg NAV", "52WkHigh Premium/Discount", etc.

I'm trying to remove a few edge cases from a large table where the cleaning step produced category values such as:

"52WkLow NAV 52wLow"

This occurs due to missing data. Additionally, as a redundancy check, printing or storing the full rows that were removed during cleaning would be hugely helpful.

Using dplyr:

filter(df, category %in% keywords)

(and the rows that were removed:)

filter(df, !(category %in% keywords))
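
As a minimal sketch of how the two dplyr calls fit together, the example below stores both the cleaned table and the removed rows. The sample data, the `value` column, and the names `kept` and `removed` are assumptions made up for illustration; only `df`, `category`, and `keywords` come from the question.

```r
library(dplyr)

# Valid category titles (a shortened, illustrative subset of the list in the question).
keywords <- c("Current SharePrice", "Current NAV", "52WkAvg NAV")

# Hypothetical sample data; the `value` column is an assumption.
df <- data.frame(
  category = c("Current SharePrice", "52WkLow NAV 52wLow", "Current NAV"),
  value = c(10.5, NA, 11.2),
  stringsAsFactors = FALSE
)

# Rows whose category exactly matches one of the keywords.
kept <- filter(df, category %in% keywords)

# Rows that were dropped, stored so they can be inspected afterwards.
removed <- filter(df, !(category %in% keywords))

print(removed)
```

Note that `%in%` does exact matching and returns FALSE rather than NA for a missing category, so rows with an NA category end up in `removed` instead of being silently dropped.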

Base R:

df[df$category %in% keywords,]

Removed rows:

df[!(df$category %in% keywords),]
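
Since the question asks for a function, the base R subsetting above could be wrapped roughly as follows. This is only a sketch: `split_by_keywords`, its `column` argument, and the sample data are hypothetical names introduced here, not part of the original answer.

```r
# Hypothetical helper: split a data frame into rows whose `column` value
# exactly matches one of `keywords` and rows that do not.
split_by_keywords <- function(df, keywords, column = "category") {
  is_match <- df[[column]] %in% keywords
  list(
    kept = df[is_match, , drop = FALSE],      # cleaned data frame
    removed = df[!is_match, , drop = FALSE]   # rows dropped, kept for review
  )
}

# Illustrative data; the `value` column is an assumption.
keywords <- c("Current SharePrice", "Current NAV")
df <- data.frame(
  category = c("Current SharePrice", "52WkLow NAV 52wLow", "Current NAV"),
  value = c(10.5, NA, 11.2),
  stringsAsFactors = FALSE
)

result <- split_by_keywords(df, keywords)
df_clean <- result$kept
print(result$removed)
```

Returning both pieces in a list keeps the redundancy check next to the cleaning step; `drop = FALSE` ensures the result stays a data frame even if the input has a single column.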


 