R - Add values (derived by a formula) in a dataframe column based on a condition met by values in a column of another dataframe

Here is an example dataset:

data = data.frame('Cat' = c('A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'C'),
                  'Value' = c(1,1,1,2,2,3,3,3,3,3))
data

Another dataframe:

a = data.frame('Name' = c('A', 'B', 'C', 'D'))

Desired output: (image not preserved)

I want to understand how to refer to another cell within the same row of a dataframe and apply a function to the value of that cell.

This worked for "In Data?":

a[,'In Data?'] = ifelse(a$Name %in% unique(data$Cat), "Y", "N")

This failed for median:

b$Median = median(data$Cat[data$Cat == a$Name])

Error message:
Error in Ops.factor(data$Cat, a$Name) : 
  level sets of factors are different

This failed for count:

a$Count = ifelse(a$Name %in% unique(data$Cat), length(data$Cat==a$Name), 0)

Error:
Error in Ops.factor(data$Cat, a$Name) : 
  level sets of factors are different

Columns of the second dataframe:

  1. Cat: A, B, C, D
  2. count
  3. proportion
  4. median
  5. values > median
  6. f(x): {count + 10}
  7. In Data?

It's better to frame these operations as merging and summarizing. (Talking in terms of cells and rows seems very Excel-like rather than R-like.) The dplyr package helps a lot here:

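library(dplyr)

a %>% 
  # keep every Name in a, even those absent from data (their Value becomes NA)
  left_join(data, by = c("Name" = "Cat")) %>% 
  group_by(Name) %>% 
  summarize(
    Count = sum(!is.na(Value)),          # matched rows per Name
    Median = median(Value),              # NA when the Name has no rows in data
    ValuesGtMed = sum(Value > Median),   # values strictly above the group median
    f = Count + 10,
    InData = if_else(Count > 0, "Y", "N")
  ) %>% 
  # share of all matched rows that falls in each group
  mutate(Proportion = Count / sum(Count))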

The left_join makes sure we keep every value in a; then we just apply different summary functions to the groups defined by Name.
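To see what the join contributes on its own, it can help to look at the intermediate table before any grouping. The following is a minimal sketch using the question's example data; `stringsAsFactors = FALSE` and the name `joined` are assumptions added here so the keys are plain character vectors.

```r
library(dplyr)

# Example data from the question; stringsAsFactors = FALSE is an assumption
# made for this sketch so that Name and Cat are character, not factor.
data <- data.frame(Cat = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "C"),
                   Value = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
                   stringsAsFactors = FALSE)
a <- data.frame(Name = c("A", "B", "C", "D"), stringsAsFactors = FALSE)

# Every row of a is kept; names without a match in data get Value = NA.
joined <- left_join(a, data, by = c("Name" = "Cat"))

nrow(joined)                   # 10 matched rows plus one unmatched row for D
joined[joined$Name == "D", ]   # the single D row, with Value NA
```

That one NA row is why Count is computed as `sum(!is.na(Value))` rather than by counting rows: counting rows would report 1 for D instead of 0.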

Output:

  Name  Count Median ValuesGtMed     f InData Proportion
  <chr> <int>  <dbl>       <int> <dbl> <chr>       <dbl>
1 A         3      1           0    13 Y             0.3
2 B         2      2           0    12 Y             0.2
3 C         5      3           0    15 Y             0.5
4 D         0     NA          NA    10 N             0  
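For comparison, the row-by-row idea from the question can also be written in base R. The attempts in the question fail because `data$Cat == a$Name` compares two whole columns at once rather than one name at a time. A minimal sketch, assuming character columns (`stringsAsFactors = FALSE`) and a hypothetical helper `values_for`, loops over the names instead:

```r
# Example data from the question, with character columns (an assumption here).
data <- data.frame(Cat = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "C"),
                   Value = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
                   stringsAsFactors = FALSE)
a <- data.frame(Name = c("A", "B", "C", "D"), stringsAsFactors = FALSE)

# Hypothetical helper: the Values in data that belong to a single name.
values_for <- function(nm) data$Value[data$Cat == nm]

# Apply one summary per name; vapply checks the type of each result.
a$Count  <- vapply(a$Name, function(nm) length(values_for(nm)),
                   integer(1), USE.NAMES = FALSE)
a$Median <- vapply(a$Name, function(nm) median(values_for(nm)),
                   numeric(1), USE.NAMES = FALSE)
a$InData <- ifelse(a$Count > 0, "Y", "N")
a
```

This gives the same Count, Median and InData columns as the dplyr version, but every extra column needs its own loop, which is one reason the join-and-summarize framing tends to scale better.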
