I would like to draw a Lorenz curve and calculate a Gini index in order to determine how many parasites the top 20% most infected hosts support.
Here is my data set:
Number of parasites per host:
parasites = c(0,1,2,3,4,5,6,7,8,9,10)
Number of hosts associated with each number of parasites given above:
hosts = c(18,20,28,19,16,10,3,1,0,0,0)
To represent the Lorenz curve:
I manually calculated the cumulative percentage of parasites and hosts:
cumul_parasites <- cumsum(parasites)/max(cumsum(parasites))
cumul_hosts <- cumsum(hosts)/max(cumsum(hosts))
plot(cumul_hosts, cumul_parasites, type= "l")
I also tested the function Lc (package ineq):
library(ineq)
Lc.p <- Lc(parasites, n = hosts)
plot(Lc.p)
Why are the two curves (manual and function Lc) different?
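The two graphs differ because the cumulative percentage of parasites must be weighted by frequency: multiply each parasite count by the number of hosts carrying that count before taking the cumulative sum. Without that weighting, cumsum(parasites) treats every count as if it occurred in exactly one host.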
The right solution would be:
parasites = c(0,1,2,3,4,5,6,7,8,9,10)
hosts = c(18,20,28,19,16,10,3,1,0,0,0)
cumul_parasites <- cumsum(parasites*hosts)/max(cumsum(parasites*hosts))
cumul_hosts <- cumsum(hosts)/max(cumsum(hosts))
plot(cumul_hosts, cumul_parasites, type= "l")
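# Overlay the points computed by ineq::Lc (components p and L) for comparison
Lc.p <- ineq::Lc(parasites, n = hosts)
points(Lc.p$p, Lc.p$L, col = 2, lwd = 2)
legend("topleft", c("My calc", "Lc"), col = 1:2, lty = c(1, NA), pch = c(NA, 1), box.col = 1)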
and this fits the Lc calculation exactly.
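The question also asks for the Gini index and the share of parasites carried by the top 20% most infected hosts. Here is a minimal sketch in base R, assuming the Lorenz curve is linear within each group of equally infected hosts (so linear interpolation between the curve points is appropriate). The names keep, p, L, gini and top20_share are illustrative and do not come from the ineq package.

```r
parasites <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
hosts     <- c(18, 20, 28, 19, 16, 10, 3, 1, 0, 0, 0)

# Drop parasite counts that no host carries; they add no points to the curve
keep <- hosts > 0
x <- parasites[keep]
n <- hosts[keep]

# Lorenz curve points, starting at the origin
# (hosts ordered from least to most infected)
p <- c(0, cumsum(n) / sum(n))          # cumulative share of hosts
L <- c(0, cumsum(x * n) / sum(x * n))  # cumulative share of parasites

# Gini index: 1 minus twice the area under the Lorenz curve (trapezoidal rule)
gini <- 1 - sum(diff(p) * (head(L, -1) + tail(L, -1)))

# Share of parasites carried by the top 20% most infected hosts:
# 1 minus the Lorenz curve value at p = 0.8, by linear interpolation
top20_share <- 1 - approx(p, L, xout = 0.8)$y

gini         # about 0.40
top20_share  # about 0.41
```

With these data, the top 20% most infected hosts carry roughly 41% of all parasites. The Gini value here is the uncorrected one, with no small-sample adjustment.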