
How to count observations matching the values of a vector of characters

I have a dataframe with numerous observations and different types of variables. Here's a sample of my dataframe:

mydf <- structure(list(id = 1:16, Product = c("Pizza", "Cleaning Product", 
"Chocolate", "Fruit", "Red Meat", "Cleaning Product", "Bracelet", 
"Trucker Hat", "Shirt", "Shirt", "Chicken Breast", "Chocolate", 
"Cereal", "Fruit", "Cleaning Product", "Trucker Hat"), price = c(2, 
3.5, 1, 1, 2.5, 3.5, 3, 5, 15, 20, 2.5, 1, 2, 1, 3.5, 4), place = c("Supermarket", 
"Supermarket", "Supermarket", "Little Store", "Supermarket", 
"Supermarket", "Little Store", "Gas Station", "Supermarket", 
"Supermarket", "Little Store", "Gas Station", "Gas Station", 
"Little Store", "Supermarket", "Supermarket")), row.names = c(NA, 
-16L), class = "data.frame")
# of observation   Product            Price in $   Place
1                  Pizza              2            Supermarket
2                  Cleaning Product   3.5          Supermarket
3                  Chocolate          1            Supermarket
4                  Fruit              1            Little Store
5                  Red Meat           2.5          Supermarket
6                  Cleaning Product   3.5          Supermarket
7                  Bracelet           3            Little Store
8                  Trucker Hat        5            Gas Station
9                  Shirt              15           Supermarket
10                 Shirt              20           Supermarket
11                 Chicken Breast     2.5          Little Store
12                 Chocolate          1            Gas Station
13                 Cereal             2            Gas Station
14                 Fruit              1            Little Store
15                 Cleaning Product   3.5          Supermarket
16                 Trucker Hat        4            Supermarket

I also have a vector of characters:

non.food <- c("Cleaning", "Hat", "Shirt", "Bracelet")

I have to eliminate the observations that match any of the words in the vector non.food. For this I use the following code:

library(dplyr)    # provides %>% and filter()
library(stringr)  # provides str_detect()

non.food <- paste(c("Cleaning", "Hat", "Shirt", "Bracelet"), collapse = '|')
mydf <- mydf %>%
  filter(!str_detect(Product, non.food))

It works pretty well, but I have the impression that I lose more observations than I should. For instance, looking at the sample, I should lose 8 observations. But in reality I end up losing 10 (I can't show this in the sample, since in reality I have 8916 observations; the sample is just an example of the kind of dataframe I'm working with).

So, I would like to first count the number of observations that match any of the words in the vector, to be sure that my code doesn't eliminate more observations than it should. I cannot use commands such as which(mydf$Product == non.food) or sum(mydf$Product == non.food). I could do the inverse of my code and keep only the observations that match my strings in order to verify, but that takes more time and creates more data that I don't want. Does anybody have an idea?

Also, if my code is in fact eliminating more observations than it should, does somebody have a solution?

Thank you in advance.

You could add a count variable that counts the number of deleted rows using case_when, e.g.:

library(tidyverse)
df <- tribble(
  ~"# of observation", ~Product, ~"Price in $", ~Place,
  1, "Pizza", 2, "Supermarket",
  2, "Cleaning Product", 3.5, "Supermarket",
  3, "Chocolate", 1, "Supermarket",
  4, "Fruit", 1, "Little Store",
  5, "Red Meat", 2.5, "Supermarket",
  6, "Cleaning Product", 3.5, "Supermarket",
  7, "Bracelet", 3, "Little Store",
  8, "Trucker Hat", 5, "Gas Station",
  9, "Shirt", 15, "Supermarket",
  10, "Shirt", 20, "Supermarket",
  11, "Chicken Breast", 2.5, "Little Store",
  12, "Chocolate", 1, "Gas Station",
  13, "Cereal", 2, "Gas Station",
  14, "Fruit", 1, "Little Store",
  15, "Cleaning Product", 3.5, "Supermarket",
  16, "Trucker Hat", 4, "Supermarket"
)

non.food <- paste(c("Cleaning", "Hat", "Shirt", "Bracelet"), collapse = "|")
mydf <- df %>%
  mutate(count = case_when(
    str_detect(Product, non.food) ~ 1,
    TRUE ~ 0
  )) %>%
  mutate(sum_deleted = sum(count)) %>%
  filter(!str_detect(Product, non.food))
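
If the goal is only to check the number before filtering, the count can probably be obtained without adding any column. This is a minimal sketch, not part of the answer above. It uses base R's grepl() in place of str_detect(), which should behave the same for a simple alternation pattern like this one. The names products, pattern, n_matching and n_kept are made up for the illustration.

```r
# Product column from the question's sample data
products <- c("Pizza", "Cleaning Product", "Chocolate", "Fruit", "Red Meat",
              "Cleaning Product", "Bracelet", "Trucker Hat", "Shirt", "Shirt",
              "Chicken Breast", "Chocolate", "Cereal", "Fruit",
              "Cleaning Product", "Trucker Hat")

# Same alternation pattern as in the question: "Cleaning|Hat|Shirt|Bracelet"
pattern <- paste(c("Cleaning", "Hat", "Shirt", "Bracelet"), collapse = "|")

# grepl() returns one TRUE/FALSE per element; sum() counts the TRUE values
n_matching <- sum(grepl(pattern, products))   # rows the filter would drop
n_kept     <- sum(!grepl(pattern, products))  # rows the filter would keep
```

Because nothing is assigned back to the dataframe, this check creates no extra data beyond two integers.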

To count matching or non-matching elements, you can use:

num_foods <- nrow(mydf[!str_detect(mydf$Product, non.food),])
num_non_foods <- nrow(mydf[str_detect(mydf$Product, non.food),])

You can see that num_foods == 8 and num_non_foods == 8, so your code seems to do what it should.

Data

mydf <- structure(list(id = 1:16, Product = c("Pizza", "Cleaning Product", 
"Chocolate", "Fruit", "Red Meat", "Cleaning Product", "Bracelet", 
"Trucker Hat", "Shirt", "Shirt", "Chicken Breast", "Chocolate", 
"Cereal", "Fruit", "Cleaning Product", "Trucker Hat"), price = c(2, 
3.5, 1, 1, 2.5, 3.5, 3, 5, 15, 20, 2.5, 1, 2, 1, 3.5, 4), place = c("Supermarket", 
"Supermarket", "Supermarket", "Little Store", "Supermarket", 
"Supermarket", "Little Store", "Gas Station", "Supermarket", 
"Supermarket", "Little Store", "Gas Station", "Gas Station", 
"Little Store", "Supermarket", "Supermarket")), row.names = c(NA, 
-16L), class = "data.frame")
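
If the full dataset still loses more rows than expected, one possible cause is a keyword matching inside a longer word, since the pattern matches substrings. That is an assumption, as the real data is not shown. The sketch below, in base R, counts matches per keyword, lists the distinct product names that would be removed, and builds a stricter whole-word pattern. The names keywords, per_keyword, removed_names and strict_pattern, and the product "Hatch Chili", are hypothetical and do not come from the question.

```r
products <- c("Pizza", "Cleaning Product", "Chocolate", "Fruit", "Red Meat",
              "Cleaning Product", "Bracelet", "Trucker Hat", "Shirt", "Shirt",
              "Chicken Breast", "Chocolate", "Cereal", "Fruit",
              "Cleaning Product", "Trucker Hat")
keywords <- c("Cleaning", "Hat", "Shirt", "Bracelet")

# Rows matched by each keyword on its own (literal substring match)
per_keyword <- sapply(keywords, function(k) sum(grepl(k, products, fixed = TRUE)))

# Distinct product names the filter would remove; scan these for surprises
loose_pattern <- paste(keywords, collapse = "|")
removed_names <- unique(products[grepl(loose_pattern, products)])

# Stricter variant: each keyword must appear as a whole word (\\b = word boundary)
strict_pattern <- paste0("\\b(", loose_pattern, ")\\b")
n_strict <- sum(grepl(strict_pattern, products, perl = TRUE))

# Hypothetical product name, not in the question's data, to show the difference
loose_hit  <- grepl(loose_pattern, "Hatch Chili")                 # "Hat" matches inside "Hatch"
strict_hit <- grepl(strict_pattern, "Hatch Chili", perl = TRUE)   # no whole-word match
```

On the sample both patterns remove the same 8 rows, so any difference would only show up on the full data. Comparing per_keyword and removed_names against what you expect should point to the keyword responsible.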

