Subsetting a dataframe based on values in another dataframe
Sorry, I'm an absolute beginner, so I have some very basic questions!
I have a very large data set that lists individual transactions by household. An example is below.
#   hh_id trans_type transaction_value
# 1   hh1       food                 4
# 2   hh1      water                 5
# 3   hh1  transport                 4
# 4   hh2      water                 3
# 5   hh3  transport                 1
# 6   hh3       food                10
# 7   hh4       food                 5
# 8   hh4  transport                15
# 9   hh4      water                10
I want to create a new data frame that lists all transactions for ONLY the households that have at least one transaction in the "water" category. (E.g., I would want a df without hh3 above, because it has no expenses in "water".)
As a first step, I have a data frame with one column (hh_ids) containing only the household IDs that I want. How do I then subset my larger data frame to remove all rows of transactions that are not from a household that has expenses in the "water" category?
Data
## data from @gung
d <- read.table(text="hh_id trans_type transaction_value
hh1 food 4
hh1 water 5
hh1 transport 4
hh2 water 3
hh3 transport 1
hh3 food 10
hh4 food 5
hh4 transport 15
hh4 water 10", header=TRUE)
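# IDs of the households that have at least one "water" transaction
dw <- as.character(with(d, hh_id[trans_type == "water"]))
# keep every row belonging to one of those households
ds <- d[d$hh_id %in% dw, ]
ds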
#   hh_id trans_type transaction_value
# 1   hh1       food                 4
# 2   hh1      water                 5
# 3   hh1  transport                 4
# 4   hh2      water                 3
# 7   hh4       food                 5
# 8   hh4  transport                15
# 9   hh4      water                10
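If the wanted IDs already sit in a separate one-column data frame, as the question describes, the same `%in%` test can be run against that column directly. The following is only a sketch: the data frame name `wanted` and its contents are assumptions made up for illustration, and only the column name `hh_ids` comes from the question.

```r
# Rebuild the example data from the question.
d <- read.table(text = "hh_id trans_type transaction_value
hh1 food 4
hh1 water 5
hh1 transport 4
hh2 water 3
hh3 transport 1
hh3 food 10
hh4 food 5
hh4 transport 15
hh4 water 10", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical stand-in for the asker's existing one-column data frame.
# The name `wanted` is an assumption; the column name hh_ids is from the question.
wanted <- data.frame(hh_ids = c("hh1", "hh2", "hh4"), stringsAsFactors = FALSE)

# Keep every transaction whose household ID appears in the lookup column.
ds2 <- d[d$hh_id %in% wanted$hh_ids, ]
ds2
```

Because `%in%` returns FALSE rather than NA for values with no match, the logical vector can index the rows directly without wrapping it in `which()`.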