How to run a function over certain observations?
I am trying to calculate the number of seats each party won per district in a given election, based on the number of votes it received in that district.
There is a function in R that does this for a single region:
seats_ha(party, votes, seats, method="dhondt")
The first argument is a vector of party list names, the second is a vector of the vote totals each party won in the district, seats is the number of seats in that district, and method is the electoral formula used to translate votes into seats. So far I have only managed to do this by subsetting the data to a single region in a single election year at a time. My problem is that I have ~27 regions over 3 election years.
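The D'Hondt formula divides each party's votes by 1, 2, ..., seats and awards the seats to the largest of those quotients. To test the grouping logic without the package, here is a minimal base-R sketch; dhondt_seats() is a hypothetical helper, not seats_ha() itself, and it makes no special effort to break ties.

```r
# Hypothetical stand-in for seats_ha(..., method = "dhondt"), not the package's code.
# Allocates `seats` seats by the D'Hondt highest-averages rule.
dhondt_seats <- function(parties, votes, seats) {
  n <- length(votes)
  # n x seats matrix of quotients votes[i] / j, flattened column by column
  quotients <- as.vector(outer(votes, seq_len(seats), "/"))
  # Party index that owns each flattened quotient
  owner <- rep(seq_len(n), times = seats)
  # The `seats` largest quotients each win one seat (ties are not treated specially)
  winners <- owner[order(quotients, decreasing = TRUE)[seq_len(seats)]]
  allocation <- tabulate(winners, nbins = n)
  names(allocation) <- parties
  allocation
}

# Textbook example: 8 seats shared among four parties
dhondt_seats(c("A", "B", "C", "D"), c(100000, 80000, 30000, 20000), 8)
# A B C D
# 4 3 1 0
```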
So my data look like this:
year region dist_seat party_name party_vote reg_id cong_id
2016-2021 AMAZONAS 2 UPP 0 1 3
2016-2021 AMAZONAS 2 FP 51067 1 3
2016-2021 AMAZONAS 2 AP 11992 1 3
2016-2021 ANCASH 5 FE 4534 2 3
2016-2021 ANCASH 5 UPP 0 2 3
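To make the example reproducible, the five sample rows can be rebuilt as a data frame. Because dist_seat is repeated on every party row of a district, it is worth checking that it takes a single value per year and region before passing it as the seat count; this check is an illustrative addition, not part of the original question.

```r
# The five sample rows shown above
mydata <- data.frame(
  year       = "2016-2021",
  region     = c("AMAZONAS", "AMAZONAS", "AMAZONAS", "ANCASH", "ANCASH"),
  dist_seat  = c(2, 2, 2, 5, 5),
  party_name = c("UPP", "FP", "AP", "FE", "UPP"),
  party_vote = c(0, 51067, 11992, 4534, 0),
  reg_id     = c(1, 1, 1, 2, 2),
  cong_id    = 3
)

# Count distinct dist_seat values within each year x region district
seat_counts <- aggregate(dist_seat ~ year + region, data = mydata,
                         FUN = function(s) length(unique(s)))
stopifnot(all(seat_counts$dist_seat == 1))
```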
I would like to be able to run the function for each region in each year.
Consider by(), the object-oriented wrapper to tapply(), which slices a data frame by one or more columns and runs the needed operation on each slice. The input parameter to by's FUN is always a subsetted data frame, and the output is a by object (essentially a list with one element per group) holding whatever the function returns, here the return value of seats_ha.
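As a small illustration of what by() hands to FUN, the sketch below uses the two sample districts from the question and returns a one-row summary per year and region; the parties and total_votes columns are only for illustration.

```r
mydata <- data.frame(
  year       = "2016-2021",
  region     = c("AMAZONAS", "AMAZONAS", "AMAZONAS", "ANCASH", "ANCASH"),
  dist_seat  = c(2, 2, 2, 5, 5),
  party_name = c("UPP", "FP", "AP", "FE", "UPP"),
  party_vote = c(0, 51067, 11992, 4534, 0)
)

# FUN receives each year x region subset as an ordinary data frame
per_district <- by(mydata, mydata[, c("year", "region")], FUN = function(sub) {
  data.frame(year = sub$year[1], region = sub$region[1],
             parties = nrow(sub), total_votes = sum(sub$party_vote))
})

# Stack the one-row data frames into a single summary
summary_df <- do.call(rbind, per_district)
summary_df
```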
You can even add a new column to each subsetted data frame and then combine the pieces into a single data frame with do.call + rbind. Below, tryCatch ensures the new column is always populated: with the actual result of seats_ha, or with NA if it encounters an error.
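The fallback can be seen in isolation with a hypothetical allocator, failing_alloc(), that always errors: tryCatch() returns NA, and assigning that single NA to a data-frame column recycles it across every row, so the subset keeps its shape and still row-binds with the others. The message() call is an optional addition so that failures do not pass silently.

```r
sub <- data.frame(party_name = c("UPP", "FP", "AP"),
                  party_vote = c(0, 51067, 11992))

# Hypothetical allocator that always fails, standing in for an error raised by seats_ha()
failing_alloc <- function(parties, votes, seats) stop("allocation failed")

sub$calc_seat <- tryCatch(
  failing_alloc(sub$party_name, sub$party_vote, 2),
  error = function(e) {
    # Report what went wrong, then fall back to NA
    message("seat allocation failed: ", conditionMessage(e))
    NA
  }
)

sub$calc_seat
# [1] NA NA NA
```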
# BUILD LIST OF SUBSETTED DFs (one per year/region combination)
df_list <- by(mydata, mydata[, c("year", "region")], FUN = function(sub) {
  # ADD NEW COLUMN TO sub DF
  # dist_seat repeats on every row of a district, so pass its first value as the seat count
  sub$calc_seat <- tryCatch(with(sub, seats_ha(party_name, party_vote,
                                               dist_seat[1], method = "dhondt")),
                            error = function(e) NA)
  return(sub)
})

# ROW BIND ALL DFs
final_df <- do.call(rbind, df_list)
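For a version that runs without the package, the sketch below repeats the same idea on the sample rows, using split() and lapply() in place of by() and a hypothetical dhondt_seats() helper standing in for seats_ha(..., method = "dhondt"). split(..., drop = TRUE) skips year/region combinations that never occur, which by() would instead return as NULL entries.

```r
# Hypothetical D'Hondt allocator standing in for seats_ha(..., method = "dhondt")
dhondt_seats <- function(parties, votes, seats) {
  n <- length(votes)
  quotients <- as.vector(outer(votes, seq_len(seats), "/"))  # votes[i] / j
  owner <- rep(seq_len(n), times = seats)                    # party of each quotient
  winners <- owner[order(quotients, decreasing = TRUE)[seq_len(seats)]]
  setNames(tabulate(winners, nbins = n), parties)
}

mydata <- data.frame(
  year       = "2016-2021",
  region     = c("AMAZONAS", "AMAZONAS", "AMAZONAS", "ANCASH", "ANCASH"),
  dist_seat  = c(2, 2, 2, 5, 5),
  party_name = c("UPP", "FP", "AP", "FE", "UPP"),
  party_vote = c(0, 51067, 11992, 4534, 0)
)

# One data frame per year x region combination that actually occurs
groups <- split(mydata, mydata[, c("year", "region")], drop = TRUE)

final_df <- do.call(rbind, lapply(groups, function(sub) {
  sub$calc_seat <- tryCatch(
    # unname() keeps a plain integer column; the seat count comes from the first row
    unname(dhondt_seats(sub$party_name, sub$party_vote, sub$dist_seat[1])),
    error = function(e) NA)
  sub
}))
rownames(final_df) <- NULL

final_df[, c("region", "party_name", "calc_seat")]
```

On these rows, both AMAZONAS seats go to FP and all five ANCASH seats go to FE.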