R Correlation and Correlation Coefficient between two datasets

I have two data sets: one shows the school enrollment for 6 countries, and the other shows the GDP of each country. I want to calculate the correlation coefficient between school enrollment and GDP for each country. I have looked at the question "How can I create a correlation matrix in R?"

But I have a problem with the shape of the two datasets (the number of rows and columns in each).

School enrollment dataset: https://drive.google.com/file/d/0B1NJGKqdrgRtTjcySzZOM2xKZU0/edit?usp=sharing

    CountryName year_2000   year_2004   year_2008   year_2012
    Comoros 201899884   362420484   4880000000  6800000000
    Jordan  8457923945  11407566660 54082389393 58768800833
    UAEmirates  104337375343    147824374543    21902892584 36044457920
    Egypt   99838540997 78845185709 840000000   1240000000
    Qatar   17759889598 31675273812 131611819294    210279947256
    Syria   19325894913 25086930693 88882967742 95981572517

GDP dataset: https://drive.google.com/file/d/0B1NJGKqdrgRtRm9SWm9ObGpwbU0/edit?usp=sharing

Indicator   com_2000    com_2004    com_2008    com_2012    Jor_2000    Jor_2004    Jor_2008    Jor_2012    ARE_2000    ARE_2004    ARE_2008    ARE_2012    Egy_2000    Egy_2004    Egy_2008    Egy_2012    Qat_2000    Qat_2004    Qat_2008    Qat_2012    Syr_2000    Syr_2004    Syr_2008    Syr_2012
preprimary (% gross)    2.39124 4.3563  23.68581    24.80515401 31.08014    32.71263    37.38376    33.81492    63.34796    81.92245    91.926025   71.14425    11.94312    15.1121 23.49822    27.3631 29.23454    32.69621    49.64917    73.42391    8.67231 10.00469    9.93459 10.6214
primary (% gross)   116.7763    121.0558    112.08  117.3767    102.3871    106.8326    102.04  98.87783    94.22761    102.304 107.5285    108.3284    101.3365    105.5968    109.9804    108.6207    104.7228    106.0118    104.0118    102.94  107.6219    121.8342    118.0423    122.2586
secondary (% gross) 31.8468 48.04706    60.04706    73.48619    85.90683    91.6662 93.89221    89.05884    45.0041 57.57103    68.905185   72.91143    85.83446    87.64275    89.48275    76.06258    86.4097 110.453 93.25074    12.14547    43.96275    66.56304    72.69195    74.42249
tertiary (% gross)  1.41838 3.00913 6.474124923 11.42145    28.28053    39.41155    44.30046    39.93893    0   0   0   0   31.62423    30.32905    31.64919    28.7532 22.565405   17.80551    11.3693 12.14547    12.00074    15.0151 24.20384    25.63541
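The shape problem seems to come from the two layouts: one table has a row per country and a column per year, while the other has a row per indicator and a column per country-year pair. The following is a minimal sketch of one way to line them up, not a definitive solution. It assumes the values are typed in for two countries only (Comoros and Jordan) instead of being read from the linked files, and it treats the `CountryName` table as the GDP side, as the `xtest` subset later in the question does. The names `by_country`, `by_indicator`, `prefix` and `cor_by_country` are hypothetical.

```r
# Per-country table: one row per country, one column per year (two countries only).
by_country <- data.frame(
  CountryName = c("Comoros", "Jordan"),
  year_2000 = c(201899884, 8457923945),
  year_2004 = c(362420484, 11407566660),
  year_2008 = c(4880000000, 54082389393),
  year_2012 = c(6800000000, 58768800833)
)

# Indicator table: one row per indicator, one column per country-year pair.
by_indicator <- data.frame(
  Indicator = c("preprimary", "primary", "secondary", "tertiary"),
  com_2000 = c(2.39124, 116.7763, 31.8468, 1.41838),
  com_2004 = c(4.3563, 121.0558, 48.04706, 3.00913),
  com_2008 = c(23.68581, 112.08, 60.04706, 6.474124923),
  com_2012 = c(24.80515401, 117.3767, 73.48619, 11.42145),
  Jor_2000 = c(31.08014, 102.3871, 85.90683, 28.28053),
  Jor_2004 = c(32.71263, 106.8326, 91.6662, 39.41155),
  Jor_2008 = c(37.38376, 102.04, 93.89221, 44.30046),
  Jor_2012 = c(33.81492, 98.87783, 89.05884, 39.93893)
)

# Assumed mapping from country name to the column prefix used in the indicator table.
prefix <- c(Comoros = "com", Jordan = "Jor")
years <- c(2000, 2004, 2008, 2012)

# For each country, correlate its 4 yearly values with each indicator's 4 yearly values.
cor_by_country <- sapply(names(prefix), function(cn) {
  x <- unlist(by_country[by_country$CountryName == cn, paste0("year_", years)])
  # Transpose so that rows are years (observations) and columns are indicators.
  y <- t(as.matrix(by_indicator[, paste0(prefix[[cn]], "_", years)]))
  colnames(y) <- by_indicator$Indicator
  cor(x, y)[1, ]
})

print(round(cor_by_country, 2))
```

The result has one row per indicator and one column per country. With only four time points per series, each coefficient rests on very little data, so it is probably best read as a rough indication.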

The x-axis has to show the years (2000, 2004, 2008, 2012) and the y-axis the enrollment type. I want a separate graph for each country (the graph link is in the comments).
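One possible way to get a panel per country with the years on the x-axis is `xyplot` from the lattice package, with the data in long format. The following is a rough sketch under assumptions: the values are typed in for Comoros and Jordan only, the enrollment types are drawn as separate lines with the enrollment percentage on the y-axis, and the helper `to_long` and the object names are hypothetical.

```r
library(lattice)

years <- c(2000, 2004, 2008, 2012)
indicators <- c("preprimary", "primary", "secondary", "tertiary")

# Values copied from the indicator table; rows = indicators, columns = years.
com <- rbind(c(2.39124, 4.3563, 23.68581, 24.80515401),
             c(116.7763, 121.0558, 112.08, 117.3767),
             c(31.8468, 48.04706, 60.04706, 73.48619),
             c(1.41838, 3.00913, 6.474124923, 11.42145))
jor <- rbind(c(31.08014, 32.71263, 37.38376, 33.81492),
             c(102.3871, 106.8326, 102.04, 98.87783),
             c(85.90683, 91.6662, 93.89221, 89.05884),
             c(28.28053, 39.41155, 44.30046, 39.93893))

# Long format: one row per (country, indicator, year).
# as.vector() walks the matrix column by column, so indicators cycle within each year.
to_long <- function(m, country) {
  data.frame(country = country,
             indicator = rep(indicators, times = length(years)),
             year = rep(years, each = length(indicators)),
             value = as.vector(m))
}
long <- rbind(to_long(com, "Comoros"), to_long(jor, "Jordan"))

# One panel per country, one line per enrollment type.
p <- xyplot(value ~ year | country, data = long, groups = indicator,
            type = "b", auto.key = list(columns = 2),
            scales = list(x = list(at = years)),
            xlab = "Year", ylab = "Gross enrollment (%)")
print(p)
```

Adding the remaining four countries would only need one more matrix and one more `to_long` call per country.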

The code is not quite right, but this is my start:

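    library(lattice)

    # Read the two CSV files chosen interactively
    xtest <- read.csv(file.choose(), header = TRUE, sep = ",")
    ytest <- read.csv(file.choose(), header = FALSE, sep = ",")
    xvalues <- as.matrix(xtest)
    yvalues <- as.matrix(ytest)

    # Correlate every column of xvalues with every column of yvalues
    corvalue <- cor(xvalues, yvalues)

    # Draw the correlation matrix as a heat map and print each value on its cell
    image(x = seq(dim(xvalues)[2]), y = seq(dim(yvalues)[2]), z = corvalue, xlab = "x column", ylab = "y column")
    text(expand.grid(x = seq(dim(xvalues)[2]), y = seq(dim(yvalues)[2])), labels = round(c(corvalue), 2))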

As a test, I take a subset of the original GDP dataset, xtest:

Comoros Comoros Comoros Comoros
201899884   201899884   201899884   201899884
362420484   362420484   362420484   362420484
4880000000  4880000000  4880000000  4880000000
6800000000  6800000000  6800000000  6800000000

And for the school enrollment, I take a subset of the data, ytest:

0   2.39124 4.3563  23.68581    24.80515401
99.78652    116.7763    121.0558    112.08  117.3767
0   31.8468 48.04706    60.04706    73.48619
0.82459 1.41838 3.00913 6.474124923 11.42145

Any suggestions for better output? The output result is in the comments.
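A note on why the test output may look odd: given two matrices, `cor(x, y)` correlates each column of `x` with each column of `y` and treats rows as observations. In the subsets above, the four `xtest` columns are identical, and the rows of `ytest` are indicators, not years, so the columns being compared are not time series of one indicator. The sketch below shows one way to orient the data. It assumes that the last four `ytest` columns correspond to 2000, 2004, 2008 and 2012 (they match the Comoros columns of the full table) and that the rows are preprimary, primary, secondary and tertiary in that order. The names `gdp` and `enroll` are hypothetical.

```r
# ytest as posted: rows are indicators; columns are one leading column plus four years.
ytest <- matrix(c(0,        2.39124,  4.3563,   23.68581,    24.80515401,
                  99.78652, 116.7763, 121.0558, 112.08,      117.3767,
                  0,        31.8468,  48.04706, 60.04706,    73.48619,
                  0.82459,  1.41838,  3.00913,  6.474124923, 11.42145),
                nrow = 4, byrow = TRUE)

# One xtest column is enough, since all four posted columns are identical.
gdp <- c(201899884, 362420484, 4880000000, 6800000000)

# Keep the four columns assumed to match the years, then transpose: rows = years.
enroll <- t(ytest[, 2:5])
colnames(enroll) <- c("preprimary", "primary", "secondary", "tertiary")
rownames(enroll) <- c(2000, 2004, 2008, 2012)

# 1 x 4 matrix: correlation of GDP with each enrollment type over the four years.
corvalue <- cor(gdp, enroll)
print(round(corvalue, 2))
```

The same `image()` and `text()` calls from the original attempt could then be applied to a matrix built by stacking one such row per country.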
