Grouping Variables within a dataset
I have the following dataset:
Country/Region 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 GDP per Capita
Albania 3.9 4.5 3.9 4.2 4.5 4.9 5.2 6.2 7.5 7.6 6.4 6.7 7.3 7.6 7.2 7.2 7.5 7.6 7.2 6.3 4.4 2.8 2.3 2.3 1.9 1.9 1.4 1.7 3.0 3.1 3.3 3.8 4.0 4.3 4.1 4.0 4.0 3.9 3.5 3.8 5,626
Austria 48.7 50.5 54.0 51.3 50.2 54.3 51.8 54.5 57.2 55.7 52.8 51.0 51.1 52.9 54.3 53.2 54.2 52.1 52.5 56.4 60.6 55.7 56.0 56.2 59.4 63.1 62.4 62.9 61.4 61.7 65.9 67.4 72.6 73.7 74.6 72.5 70.0 70.6 63.5 69.3 56,259
Belarus 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 124.5 119.4 98.8 82.9 70.2 61.4 62.7 61.8 59.3 57.6 58.7 57.8 59.2 60.7 63.0 62.1 66.2 64.0 64.5 62.3 65.3 6,575
Belgium 116.8 126.7 132.7 130.6 115.6 124.5 123.5 129.0 132.3 125.7 115.5 109.3 100.6 102.6 101.9 102.6 102.8 104.6 105.9 107.9 113.3 112.3 109.8 115.5 115.2 121.3 118.5 120.9 117.4 118.6 119.1 111.9 119.5 116.5 112.6 109.6 105.6 111.0 100.7 106.4 51,237
Bosnia and Herzegovina 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 23.7 21.2 15.6 13.1 3.0 3.2 4.1 8.3 10.5 10.2 13.5 13.3 14.0 14.3 15.0 15.6 17.2 18.2 19.9 19.4 19.9 6,140
Bulgaria 62.8 64.8 66.6 67.7 72.2 72.1 74.8 77.9 81.1 83.8 79.9 81.5 80.2 78.3 81.1 82.1 83.1 82.1 81.4 74.8 56.4 54.1 55.1 52.5 53.2 53.8 50.9 48.7 42.8 42.1 44.8 42.0 46.3 45.4 45.9 47.3 50.4 49.0 42.2 43.8 9,811
Croatia 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 21.6 15.7 15.2 15.8 15.0 15.8 15.6 17.3 18.4 18.3 17.7 18.6 19.6 21.0 20.4 20.8 20.8 22.1 21.0 19.8 19.0 15,533
Cyprus 1.8 2.2 2.3 1.8 1.7 2.0 2.1 2.3 2.5 2.6 2.5 2.6 2.7 2.8 2.8 3.1 3.6 3.6 3.8 3.8 4.4 4.7 4.9 5.3 5.2 5.5 5.7 5.8 6.0 6.3 6.2 6.3 7.0 6.9 7.0 7.1 7.3 7.6 7.5 7.2 30,521
Czech Republic 151.0 150.0 147.1 146.3 152.6 157.4 166.9 163.0 172.5 165.8 166.5 169.3 170.5 173.1 173.1 173.1 174.2 170.8 163.5 155.1 140.9 131.4 126.7 120.2 123.7 125.6 124.0 117.6 110.9 121.9 121.4 117.2 120.7 121.8 119.6 120.7 122.0 117.3 110.1 114.5 26,114
Denmark 55.0 57.1 56.0 49.8 52.5 58.1 59.7 59.2 62.7 62.5 52.5 54.6 51.3 52.9 60.5 61.1 59.3 55.5 49.8 50.4 60.5 54.8 57.1 61.0 58.0 71.2 61.6 57.7 54.6 50.6 52.2 51.9 57.1 51.6 48.3 56.0 51.4 48.4 46.7 47.0 66,196
Estonia 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 36.1 32.1 23.5 18.0 17.8 16.1 17.0 16.5 16.0 14.9 14.6 15.1 14.6 16.6 16.7 16.9 15.5 19.3 17.7 14.7 18.5 25,260
Finland 39.8 43.7 48.0 44.5 44.4 50.5 50.2 54.7 54.4 55.2 46.0 44.5 43.2 44.4 48.6 49.5 53.8 53.1 52.9 54.4 55.9 53.7 54.8 61.4 56.0 62.2 60.1 56.8 56.1 55.1 60.3 63.0 70.8 67.2 55.2 66.8 65.0 57.0 55.0 62.9 54,869
France 431.9 448.6 484.8 464.6 430.6 469.3 455.3 474.7 481.8 461.4 414.1 396.7 381.0 369.5 360.3 347.8 342.3 340.5 355.9 352.3 379.6 368.0 348.9 344.4 353.8 368.6 361.7 385.3 377.7 376.9 383.8 375.9 385.2 385.4 388.4 379.6 373.1 370.2 351.4 357.8 46,493
Germany 978.6 1003.2 1053.1 1028.5 975.5 1032.2 1017.2 1055.9 1103.6 1055.6 1022.3 982.3 983.9 1006.1 1014.6 1016.3 1007.2 1001.2 976.8 949.7 924.8 886.5 879.9 868.5 867.8 896.5 865.8 858.9 826.9 825.0 843.3 830.7 839.8 840.8 809.0 820.9 796.3 800.1 747.1 761.6 53,276
Greece 25.2 29.2 34.1 32.6 34.5 39.1 40.4 42.8 45.1 45.3 44.9 46.3 49.3 51.0
(Sorry for the horrible formatting).
There are 41 countries and the years run from 1971 to 2010. The data for the years is CO2 emissions per capita.
However, due to the nature of the csv, I had to delete the first 2 rows of the dataset. I am not allowed to modify the csv, only manipulate the output in R.
I want to group the years together under a variable called "CO2 emissions per capita" so that it can be used in graphs, but still have individual columns for the years. I have managed to create the format using this code:
knitr::kable(europe.GDP) %>%
kable_styling(bootstrap_options = c("striped", "condensed", "interactive", "bordered", "responsive"),
full_width = TRUE, font_size = 12, fixed_thead = TRUE) %>%
add_header_above(c("", "CO2 Emissions per country" = 41),
font_size = 14) %>%
column_spec(1, bold = TRUE) %>%
row_spec(row = 0, font_size = 14, bold = TRUE) %>%
scroll_box(width = "100%", height = "800px")
but I don't know how to make CO2 emissions a single variable, as opposed to every year being its own variable. I am very new to R, so I'm sorry if I'm not explaining what I'm trying to do very well.
I understand you are very new to R, so perhaps I can help you out with a few ideas.
The table you created using kable may already give you the look you need. However, when plotting data, you will find it much easier and more flexible to have the data in a long format (one row per country and year) instead of a wide format (one column per year).
Here's an example of how you can approach this. This requires the following libraries:
library(knitr)
library(kableExtra)
library(tidyverse)
library(ggplot2)
This is a simple data frame created for the example. Note you may need to do further manipulation depending on the structure of your data frame created from the csv file. If you use dput as @akrun suggested, it will help further.
df <- data.frame(
Country = c("Albania", "Austria", "Belgium", "Bulgaria"),
Emit_1971 = c(3.9, 48.7, 116.8, 62.8),
Emit_1972 = c(4.5, 50.5, 126.7, 64.8),
Emit_1973 = c(3.9, 54, 132.7, 66.6),
Emit_1974 = c(4.2, 51.3, 130.6, 67.7)
)
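If the data frame comes from the csv rather than being typed in, the two leading rows can probably be skipped at read time, so the file itself is never modified. The following is only a sketch: it assumes a comma-separated file with two unwanted rows above the header and a quoted GDP column with a thousands separator. The file contents and the name raw.df are made up for illustration and are written to a temporary file so the snippet runs on its own.

```r
# Hypothetical stand-in for the real csv: two rows to skip, then the header.
csv_lines <- c(
  'Some title row',
  'Some notes row',
  'Country/Region,1971,1972,GDP per Capita',
  'Albania,3.9,4.5,"5,626"',
  'Austria,48.7,50.5,"56,259"'
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# skip = 2 drops the first two lines on read; the file on disk is untouched.
# check.names = FALSE keeps the year headers as "1971" rather than "X1971".
raw.df <- read.csv(tmp, skip = 2, check.names = FALSE, stringsAsFactors = FALSE)

# GDP is read as text because of the thousands separator; strip it and convert.
raw.df$`GDP per Capita` <- as.numeric(gsub(",", "", raw.df$`GDP per Capita`))

names(raw.df)
```

Whether skip = 2 is the right number depends on what the first two rows of the real file actually contain, so check the result with head() before going further.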
So far, this can be used to provide a data table with kable as you currently have. Note you can define your column labels with col.names (the number of columns spanned in add_header_above is reduced here because the example has fewer years of data).
knitr::kable(df, col.names = c("Country", seq(1971, 1974, 1))) %>%
kable_styling(bootstrap_options = c("striped", "condensed", "interactive", "bordered", "responsive"),
full_width = TRUE, font_size = 12, fixed_thead = TRUE) %>%
add_header_above(c("", "CO2 Emissions per country" = 4),
font_size = 14) %>%
column_spec(1, bold = TRUE) %>%
row_spec(row = 0, font_size = 14, bold = TRUE) %>%
scroll_box(width = "100%", height = "800px")
As suggested by @Gregor, you can convert your data from wide to long before plotting. I prefer to use tidyr (part of the tidyverse) for this. This assumes your column names consist of a prefix, an underscore, and the year, such as Emit_1971 (other options are also available).
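# names_transform converts the year part of each column name to a number (tidyr >= 1.1.0, where names_ptypes only checks the type)
long.df <- pivot_longer(df, cols = -Country, names_to = c(".value", "Year"), names_sep = "_", names_transform = list(Year = as.numeric))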
# A tibble: 16 x 3
Country Year Emit
<fct> <dbl> <dbl>
1 Albania 1971 3.9
2 Albania 1972 4.5
3 Albania 1973 3.9
4 Albania 1974 4.2
5 Austria 1971 48.7
6 Austria 1972 50.5
7 Austria 1973 54
8 Austria 1974 51.3
9 Belgium 1971 117.
10 Belgium 1972 127.
11 Belgium 1973 133.
12 Belgium 1974 131.
13 Bulgaria 1971 62.8
14 Bulgaria 1972 64.8
15 Bulgaria 1973 66.6
16 Bulgaria 1974 67.7
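The question's data has bare years as column names (1971, 1972, ...) plus a GDP column, rather than the Emit_ prefix used above. A minimal sketch of the same reshape for that layout follows. The names wide.df, long.years and GDP are assumptions made for this example, and the values are the first two years for Albania and Austria from the question.

```r
library(tidyr)

# Hypothetical wide frame shaped like the question's data:
# bare year names plus a GDP column that must not be pivoted.
wide.df <- data.frame(
  Country = c("Albania", "Austria"),
  `1971` = c(3.9, 48.7),
  `1972` = c(4.5, 50.5),
  GDP = c(5626, 56259),
  check.names = FALSE
)

# Pivot only the year columns; Country and GDP repeat on every row.
long.years <- pivot_longer(
  wide.df,
  cols = -c(Country, GDP),
  names_to = "Year",
  values_to = "Emit"
)

# Column names arrive as text, so convert "1971" to 1971 for plotting.
long.years$Year <- as.numeric(long.years$Year)

long.years
```

Excluding GDP from cols keeps it as an ordinary column, so it stays available for plots that relate emissions to GDP.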
From this, you have options for further manipulation depending on plotting needs. For example, to plot each country's emissions by year, you could do the following:
ggplot(long.df, aes(x = Year, y = Emit, col = Country)) +
geom_line() +
scale_x_continuous(breaks = seq(1971, 1974, 1)) +
labs(title = "CO2 Emissions per country", x = "Year", y = "Emissions")
If you want to total the emissions by year (sum the emissions of all countries in each year), you could do the following:
long.df.years <- long.df %>%
group_by(Year) %>%
summarise(Total = sum(Emit))
ggplot(long.df.years, aes(x = Year, y = Total)) +
geom_line() +
scale_x_continuous(breaks = seq(1971, 1974, 1)) +
labs(title = "CO2 Emissions", x = "Year", y = "Emissions")
If you wanted to sum up the emissions across all years for each country, you could do the following:
long.df.europe <- long.df %>%
group_by(Country) %>%
summarise(Total = sum(Emit))
# A tibble: 4 x 2
Country Total
<fct> <dbl>
1 Albania 16.5
2 Austria 204.
3 Belgium 507.
4 Bulgaria 262.
Again, hope this is helpful. Please let me know if this is what you had in mind or what might require further clarification.