R - get sum from one column based on categories in another column
I am new to R and trying to learn on my own. I have data in CSV format with 1,048,575 rows and 73 columns. I am looking at three columns: year, country and aid_amount. I want to get the sum of aid_amount by country for (i) all years and (ii) the years 1991-2010.
I tried the following to get the totals for all years, but the result differs from what I get when I sort and sum in Excel. What is wrong here? Also, what change should I make to get (ii), the years 1991-2010?
Thanks.
aiddata <- read.csv("aiddata_research.csv")
sum_by_country <- tapply(aiddata$aid_amount, aiddata$country, sum, na.rm=TRUE) # There are missing data on aid_amount
write.csv(sum_by_country, "sum_by_country.csv")
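For reference, here is a minimal, self-contained sketch of the same tapply() approach covering both (i) and (ii). It assumes a small data frame built from the eight sample rows shown in the question in place of aiddata_research.csv; for (ii) it keeps only the rows with 1991 <= year <= 2010 before grouping.

```r
# Hypothetical stand-in for aiddata_research.csv: the sample rows from the question
aiddata <- data.frame(
  year = c(2004, 2000, 2006, 2006, 2006, 2006, 2008, 2010),
  country = c("Bangladesh", "Bilateral, unspecified", "Bilateral, unspecified",
              "Bilateral, unspecified", "Cambodia", "Cambodia",
              "Guatemala", "Guatemala"),
  aid_amount = c(685899.2666, 15772.77174, 38926.82898, 12633.85659,
                 955412.9884, 11773.77268, 40150.7331, 151206.3096)
)

# (i) All years: sum aid_amount within each country, skipping missing values
sum_all <- tapply(aiddata$aid_amount, aiddata$country, sum, na.rm = TRUE)

# (ii) 1991-2010: pick the row indices in range first; which() drops rows
# whose year is NA instead of turning them into NA rows
rows <- which(aiddata$year >= 1991 & aiddata$year <= 2010)
sum_1991_2010 <- tapply(aiddata$aid_amount[rows], aiddata$country[rows],
                        sum, na.rm = TRUE)

print(sum_all)
print(sum_1991_2010)
```

Because every sample row falls between 2000 and 2010, both results are identical here; on the full file they would differ only where a country has rows outside 1991-2010.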
I have also tried the following instead of tapply:
sum_by_country <- aggregate(aid_amount ~ country, data = aiddata, sum)
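A sketch of the aggregate() version for both parts, again assuming a small data frame built from the question's sample rows. The formula method of aggregate() accepts a subset argument that is evaluated inside data, so (ii) needs no separate filtering step; its default na.action = na.omit already drops rows with a missing aid_amount.

```r
# Hypothetical stand-in for the real data: the sample rows from the question
aiddata <- data.frame(
  year = c(2004, 2000, 2006, 2006, 2006, 2006, 2008, 2010),
  country = c("Bangladesh", "Bilateral, unspecified", "Bilateral, unspecified",
              "Bilateral, unspecified", "Cambodia", "Cambodia",
              "Guatemala", "Guatemala"),
  aid_amount = c(685899.2666, 15772.77174, 38926.82898, 12633.85659,
                 955412.9884, 11773.77268, 40150.7331, 151206.3096)
)

# (i) All years: one row per country; rows with NA aid_amount are omitted
sum_all <- aggregate(aid_amount ~ country, data = aiddata, FUN = sum)

# (ii) 1991-2010: subset is evaluated against the columns of aiddata
sum_1991_2010 <- aggregate(aid_amount ~ country, data = aiddata, FUN = sum,
                           subset = year >= 1991 & year <= 2010)

print(sum_all)
print(sum_1991_2010)
```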
The first few rows for a few columns look like this:
aiddata_id year country aid_amount
23229017 2004 Bangladesh 685899.2666
14582630 2000 Bilateral, unspecified 15772.77174
28085216 2006 Bilateral, unspecified 38926.82898
28702455 2006 Bilateral, unspecified 12633.85659
29928104 2006 Cambodia 955412.9884
27783934 2006 Cambodia 11773.77268
37418683 2008 Guatemala 40150.7331
94726192 2010 Guatemala 151206.3096
You could use data.table for a dataset this large. To get the sum of aid_amount for each country by year:
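library(data.table)
# setDT converts aiddata to a data.table in place; na.rm = TRUE skips the missing aid_amount values
setkey(setDT(aiddata), country, year)[,
  list(aid_amount = sum(aid_amount, na.rm = TRUE)), by = list(country, year)]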
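Since the country-by-year table already has one row per country and year, part (ii) can be sketched by filtering that table to 1991-2010 and adding up the yearly totals per country. The small data.table below is built from the question's sample rows and stands in for the real file (an assumption for illustration); by_country_year and by_country_1991_2010 are hypothetical names.

```r
library(data.table)

# Hypothetical stand-in for the real data: the sample rows from the question
aiddata <- data.table(
  year = c(2004, 2000, 2006, 2006, 2006, 2006, 2008, 2010),
  country = c("Bangladesh", "Bilateral, unspecified", "Bilateral, unspecified",
              "Bilateral, unspecified", "Cambodia", "Cambodia",
              "Guatemala", "Guatemala"),
  aid_amount = c(685899.2666, 15772.77174, 38926.82898, 12633.85659,
                 955412.9884, 11773.77268, 40150.7331, 151206.3096)
)

# Per-country, per-year totals; .() is data.table's shorthand for list()
by_country_year <- aiddata[, .(aid_amount = sum(aid_amount, na.rm = TRUE)),
                           by = .(country, year)]

# Keep only 1991-2010, then add up the yearly totals within each country
by_country_1991_2010 <- by_country_year[year >= 1991 & year <= 2010,
                                        .(aid_amount = sum(aid_amount)),
                                        by = country]

print(by_country_year)
print(by_country_1991_2010)
```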
To get the sum of aid_amount for each country:
setkey(setDT(aiddata), country)[,
  list(aid_amount = sum(aid_amount, na.rm = TRUE)), by = list(country)]
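For (ii), a sketch that filters rows in the i argument before grouping, using data.table's between(), which includes both endpoints by default. As above, a small table made from the sample rows is an assumed stand-in for the full data; sum_all and sum_1991_2010 are illustrative names.

```r
library(data.table)

# Hypothetical stand-in for the real data: the sample rows from the question
aiddata <- data.table(
  year = c(2004, 2000, 2006, 2006, 2006, 2006, 2008, 2010),
  country = c("Bangladesh", "Bilateral, unspecified", "Bilateral, unspecified",
              "Bilateral, unspecified", "Cambodia", "Cambodia",
              "Guatemala", "Guatemala"),
  aid_amount = c(685899.2666, 15772.77174, 38926.82898, 12633.85659,
                 955412.9884, 11773.77268, 40150.7331, 151206.3096)
)

# (i) All years, one row per country
sum_all <- aiddata[, .(aid_amount = sum(aid_amount, na.rm = TRUE)), by = country]

# (ii) Rows with 1991 <= year <= 2010 only; rows where the i condition is NA
# are dropped rather than kept
sum_1991_2010 <- aiddata[between(year, 1991, 2010),
                         .(aid_amount = sum(aid_amount, na.rm = TRUE)),
                         by = country]

print(sum_all)
print(sum_1991_2010)
```

Filtering in i keeps everything in one call and avoids building the intermediate country-by-year table when only the period totals are needed.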