
Removing matching observations where their adjacent column does not equal 100

I have ~4000 observations in my data frame, test_11, and have pasted part of the data frame below:

data frame snippet

The k_hidp column identifies matching households, the k_fihhmnnet1_dv column is their reported household income, and percentage_income_rounded gives each participant's contribution to the total household income as a percentage.

I want to filter my data to remove all observations for any k_hidp whose percentage_income_rounded values do not sum to 100.

For example, the first household, 68632420, reports a combined contribution of 83% (65 + 18) instead of the 100% that the other households report.

Is there a way to remove these households so that I am left only with households whose combined contribution is 100%?

Thank you!

Try this:

## Creating the dataframe
df=data.frame(k_hidp = c(68632420,68632420,68632420,68632420,68632420,68632420,68632422,68632422,68632422,68632422,68632428,68632428),
              percentage_income_rounded = c(65,18,86,14,49,51,25,25,25,25,50,50))

## Loading the libraries
library(dplyr)

## Aggregating and determining which household collective income is 100%
df1 = df %>%
  group_by(k_hidp) %>%
  mutate(TotalPercentage = sum(percentage_income_rounded)) %>%
  filter(TotalPercentage == 100)

Output

> df1
# A tibble: 6 x 3
# Groups:   k_hidp [2]
    k_hidp percentage_income_rounded TotalPercentage
     <dbl>                     <dbl>           <dbl>
1 68632422                        25             100
2 68632422                        25             100
3 68632422                        25             100
4 68632422                        25             100
5 68632428                        50             100
6 68632428                        50             100
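
If you would rather not depend on dplyr, the same per-household check can be sketched in base R with ave(), which repeats each group's sum on every row of that group. This is only an illustration on the sample data above; the names household_total, kept and dropped are made up for this example and do not come from the question.

```r
# Same sample data as in the answer above
df <- data.frame(
  k_hidp = c(68632420, 68632420, 68632420, 68632420, 68632420, 68632420,
             68632422, 68632422, 68632422, 68632422, 68632428, 68632428),
  percentage_income_rounded = c(65, 18, 86, 14, 49, 51, 25, 25, 25, 25, 50, 50)
)

# For every row, the sum of percentage_income_rounded over that row's household
household_total <- ave(df$percentage_income_rounded, df$k_hidp, FUN = sum)

# Keep only rows belonging to households whose total is exactly 100
kept <- df[household_total == 100, ]

# Households that were dropped, with their totals, so they can be inspected
totals <- data.frame(k_hidp = df$k_hidp, total = household_total)
dropped <- unique(totals[household_total != 100, ])
```

Because the percentages are rounded, a household of three equal contributors would sum to 99 rather than 100. If such households should be kept, a tolerance such as abs(household_total - 100) <= 1 could replace the exact comparison; whether that is appropriate depends on your data.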


