
R - get sum from one column based on categories in another column

I am new to R and trying to learn on my own. I have data in CSV format with 1,048,575 rows and 73 columns. I am looking at three columns: year, country, and aid_amount. I want to get the sum of aid_amount by country for i) all years, and ii) the years 1991-2010. I tried the following for all years, but the result differs from what I get when I sort and sum in Excel. What is wrong here? Also, what should I change for ii), the years 1991-2010? Thanks.

aiddata <- read.csv("aiddata_research.csv")
sum_by_country <- tapply(aiddata$aid_amount, aiddata$country, sum, na.rm=TRUE) # There are missing data on aid_amount
write.csv(sum_by_country, "sum_by_country.csv")

I have also tried the following instead of tapply:

sum_by_country <- aggregate(aid_amount ~ country, data = aiddata, sum)

The first few rows for a few columns look like this:

aiddata_id  year    country                  aid_amount
23229017    2004    Bangladesh               685899.2666
14582630    2000    Bilateral, unspecified   15772.77174
28085216    2006    Bilateral, unspecified   38926.82898
28702455    2006    Bilateral, unspecified   12633.85659
29928104    2006    Cambodia                 955412.9884
27783934    2006    Cambodia                 11773.77268
37418683    2008    Guatemala                40150.7331
94726192    2010    Guatemala                151206.3096
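
As a quick sanity check, the sample rows above can be rebuilt as a small data frame and summed in groups small enough to verify by hand. This is only a sketch on the eight rows shown (the name `sample_rows` is made up here). It also compares `tapply` with the formula form of `aggregate`, which should agree as long as every country has at least one non-missing amount:

```r
# The eight sample rows shown above, rebuilt by hand (sample_rows is a hypothetical name).
sample_rows <- data.frame(
  year       = c(2004, 2000, 2006, 2006, 2006, 2006, 2008, 2010),
  country    = c("Bangladesh",
                 rep("Bilateral, unspecified", 3),
                 "Cambodia", "Cambodia",
                 "Guatemala", "Guatemala"),
  aid_amount = c(685899.2666, 15772.77174, 38926.82898, 12633.85659,
                 955412.9884, 11773.77268, 40150.7331, 151206.3096)
)

# Sum per country with tapply (named array) and with aggregate (data frame).
by_tapply    <- tapply(sample_rows$aid_amount, sample_rows$country, sum, na.rm = TRUE)
by_aggregate <- aggregate(aid_amount ~ country, data = sample_rows, FUN = sum)

# Both approaches should give the same totals on these rows.
stopifnot(isTRUE(all.equal(
  as.numeric(by_tapply[as.character(by_aggregate$country)]),
  by_aggregate$aid_amount
)))

print(by_aggregate)
```

One difference to keep in mind: the formula form of `aggregate` drops rows with a missing `aid_amount` before summing, so a country whose amounts are all missing disappears from its result, whereas `tapply(..., sum, na.rm = TRUE)` reports 0 for that country.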

You could use data.table for the big dataset. If you want to get the sum of aid_amount for each country by year:

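library(data.table)
# setDT() converts aiddata to a data.table by reference; setkey() sorts it by country, year
# na.rm = TRUE skips the missing aid_amount values mentioned in the question
setkey(setDT(aiddata), country, year)[,
         list(aid_amount = sum(aid_amount, na.rm = TRUE)), by = list(country, year)]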

To get the sum of aid_amount for each country:

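# na.rm = TRUE skips missing aid_amount values; without it any country with an NA sums to NA
setkey(setDT(aiddata), country)[,
          list(aid_amount = sum(aid_amount, na.rm = TRUE)), by = list(country)]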
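
For part ii) of the question, restricting the sum to 1991-2010, one option is to filter on year in the `i` argument before grouping. The following is a minimal sketch, assuming `year` is read in as a number. The toy table reuses four sample rows from the question and adds three made-up rows (marked hypothetical in the code) to exercise the year filter and the missing-value handling:

```r
library(data.table)

# Toy data: the first four rows come from the question's sample; the last
# three are hypothetical, added only to show the year filter and NA handling.
toy <- data.table(
  year       = c(2004, 2006, 2006, 2008, 1985, 2012, 2010),
  country    = c("Bangladesh", "Cambodia", "Cambodia", "Guatemala",
                 "Cambodia", "Guatemala", "Guatemala"),
  aid_amount = c(685899.2666, 955412.9884, 11773.77268, 40150.7331,
                 1000, 2000, NA)
)

# Keep only 1991-2010 (inclusive), then sum per country, ignoring NAs.
sum_1991_2010 <- toy[year >= 1991 & year <= 2010,
                     .(aid_amount = sum(aid_amount, na.rm = TRUE)),
                     by = country]

print(sum_1991_2010)
```

On the real data, `toy` would be replaced by `setDT(aiddata)`. data.table's `between(year, 1991, 2010)` is an equivalent way to write the filter and includes both bounds by default. As for the mismatch with Excel, one possible cause worth checking is that an Excel worksheet holds at most 1,048,576 rows: a file with 1,048,575 data rows plus a header sits exactly at that limit, so the data may have been truncated at some point when it passed through Excel.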


 