Grouping variables in a dataset

Question

I have the following dataset:

Country/Region  1971    1972    1973    1974    1975    1976    1977    1978    1979    1980    1981    1982    1983    1984    1985    1986    1987    1988    1989    1990    1991    1992    1993    1994    1995    1996    1997    1998    1999    2000    2001    2002    2003    2004    2005    2006    2007    2008    2009    2010    GDP per Capita
Albania 3.9 4.5 3.9 4.2 4.5 4.9 5.2 6.2 7.5 7.6 6.4 6.7 7.3 7.6 7.2 7.2 7.5 7.6 7.2 6.3 4.4 2.8 2.3 2.3 1.9 1.9 1.4 1.7 3.0 3.1 3.3 3.8 4.0 4.3 4.1 4.0 4.0 3.9 3.5 3.8 5,626
Austria 48.7    50.5    54.0    51.3    50.2    54.3    51.8    54.5    57.2    55.7    52.8    51.0    51.1    52.9    54.3    53.2    54.2    52.1    52.5    56.4    60.6    55.7    56.0    56.2    59.4    63.1    62.4    62.9    61.4    61.7    65.9    67.4    72.6    73.7    74.6    72.5    70.0    70.6    63.5    69.3    56,259
Belarus 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 124.5   119.4   98.8    82.9    70.2    61.4    62.7    61.8    59.3    57.6    58.7    57.8    59.2    60.7    63.0    62.1    66.2    64.0    64.5    62.3    65.3    6,575
Belgium 116.8   126.7   132.7   130.6   115.6   124.5   123.5   129.0   132.3   125.7   115.5   109.3   100.6   102.6   101.9   102.6   102.8   104.6   105.9   107.9   113.3   112.3   109.8   115.5   115.2   121.3   118.5   120.9   117.4   118.6   119.1   111.9   119.5   116.5   112.6   109.6   105.6   111.0   100.7   106.4   51,237
Bosnia and Herzegovina  0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 23.7    21.2    15.6    13.1    3.0 3.2 4.1 8.3 10.5    10.2    13.5    13.3    14.0    14.3    15.0    15.6    17.2    18.2    19.9    19.4    19.9    6,140
Bulgaria    62.8    64.8    66.6    67.7    72.2    72.1    74.8    77.9    81.1    83.8    79.9    81.5    80.2    78.3    81.1    82.1    83.1    82.1    81.4    74.8    56.4    54.1    55.1    52.5    53.2    53.8    50.9    48.7    42.8    42.1    44.8    42.0    46.3    45.4    45.9    47.3    50.4    49.0    42.2    43.8    9,811
Croatia 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 21.6    15.7    15.2    15.8    15.0    15.8    15.6    17.3    18.4    18.3    17.7    18.6    19.6    21.0    20.4    20.8    20.8    22.1    21.0    19.8    19.0    15,533
Cyprus  1.8 2.2 2.3 1.8 1.7 2.0 2.1 2.3 2.5 2.6 2.5 2.6 2.7 2.8 2.8 3.1 3.6 3.6 3.8 3.8 4.4 4.7 4.9 5.3 5.2 5.5 5.7 5.8 6.0 6.3 6.2 6.3 7.0 6.9 7.0 7.1 7.3 7.6 7.5 7.2 30,521
Czech Republic  151.0   150.0   147.1   146.3   152.6   157.4   166.9   163.0   172.5   165.8   166.5   169.3   170.5   173.1   173.1   173.1   174.2   170.8   163.5   155.1   140.9   131.4   126.7   120.2   123.7   125.6   124.0   117.6   110.9   121.9   121.4   117.2   120.7   121.8   119.6   120.7   122.0   117.3   110.1   114.5   26,114
Denmark 55.0    57.1    56.0    49.8    52.5    58.1    59.7    59.2    62.7    62.5    52.5    54.6    51.3    52.9    60.5    61.1    59.3    55.5    49.8    50.4    60.5    54.8    57.1    61.0    58.0    71.2    61.6    57.7    54.6    50.6    52.2    51.9    57.1    51.6    48.3    56.0    51.4    48.4    46.7    47.0    66,196
Estonia 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 36.1    32.1    23.5    18.0    17.8    16.1    17.0    16.5    16.0    14.9    14.6    15.1    14.6    16.6    16.7    16.9    15.5    19.3    17.7    14.7    18.5    25,260
Finland 39.8    43.7    48.0    44.5    44.4    50.5    50.2    54.7    54.4    55.2    46.0    44.5    43.2    44.4    48.6    49.5    53.8    53.1    52.9    54.4    55.9    53.7    54.8    61.4    56.0    62.2    60.1    56.8    56.1    55.1    60.3    63.0    70.8    67.2    55.2    66.8    65.0    57.0    55.0    62.9    54,869
France  431.9   448.6   484.8   464.6   430.6   469.3   455.3   474.7   481.8   461.4   414.1   396.7   381.0   369.5   360.3   347.8   342.3   340.5   355.9   352.3   379.6   368.0   348.9   344.4   353.8   368.6   361.7   385.3   377.7   376.9   383.8   375.9   385.2   385.4   388.4   379.6   373.1   370.2   351.4   357.8   46,493
Germany 978.6   1003.2  1053.1  1028.5  975.5   1032.2  1017.2  1055.9  1103.6  1055.6  1022.3  982.3   983.9   1006.1  1014.6  1016.3  1007.2  1001.2  976.8   949.7   924.8   886.5   879.9   868.5   867.8   896.5   865.8   858.9   826.9   825.0   843.3   830.7   839.8   840.8   809.0   820.9   796.3   800.1   747.1   761.6   53,276
Greece  25.2    29.2    34.1    32.6    34.5    39.1    40.4    42.8    45.1    45.3    44.9    46.3    49.3    51.0

(Sorry for the horrible formatting.)

There are 41 countries and the years run from 1971 to 2010. The yearly values are CO2 emissions per capita.
However, due to the nature of the csv, I had to delete the first 2 rows of the dataset. I am not allowed to modify the csv, only manipulate the output in R.

I want to group the years together under a variable called "CO2 emissions per capita" so that it can be used in graphs, but still have individual columns for the years. I have managed to create the format using this code:

knitr::kable(europe.GDP) %>%
  kable_styling(bootstrap_options = c("striped", "condensed", "interactive", "bordered", "responsive"), 
                full_width = TRUE, font_size = 12, fixed_thead = TRUE) %>%
  add_header_above(c("", "CO2 Emissions per country" = 41), 
                   font_size = 14) %>% 
  column_spec(1, bold = TRUE) %>% 
  row_spec(row = 0, font_size = 14, bold = TRUE) %>%
  scroll_box(width = "100%", height = "800px")

but I don't know how to make CO2 emissions a single variable, as opposed to every year being its own variable. I am very new to R, so I'm sorry if I'm not explaining what I'm trying to do very well.

Answer 1

I understand you are very new to R - perhaps I can help you out with a few ideas.

The table you created using kable may already look the way you need. However, when plotting data, you will find it much easier and more flexible to have the data in a long format instead of wide.

Here's an example of how you can approach this. It requires the following libraries:

library(knitr)
library(kableExtra)
library(tidyverse)
library(ggplot2)

This is a simple data frame created for the example. Note that you may need to do further manipulation depending on the structure of the data frame created from your csv file. If you share your data with dput, as @akrun suggested, it will help further.

df <- data.frame(
  Country = c("Albania", "Austria", "Belgium", "Bulgaria"),
  Emit_1971 = c(3.9, 48.7, 116.8, 62.8),
  Emit_1972 = c(4.5, 50.5, 126.7, 64.8),
  Emit_1973 = c(3.9, 54, 132.7, 66.6),
  Emit_1974 = c(4.2, 51.3, 130.6, 67.7)
)
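
If the real data comes from a csv whose first two rows are not part of the table, as described in the question, one option is to skip them while reading instead of editing the file. The following is only a sketch: the file contents are a made-up stand-in (the column names are copied from the table header in the question), and the Emit_ prefix is added just to match the example data frame.

```r
# Hypothetical stand-in for the real file: two rows to ignore, then the header.
csv_lines <- c(
  "Some title row",
  "Another row to skip",
  "Country/Region,1971,1972,GDP per Capita",
  "Albania,3.9,4.5,\"5,626\"",
  "Austria,48.7,50.5,\"56,259\""
)
csv_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, csv_path)

# skip = 2 drops the first two lines on read; the file itself is untouched.
# check.names = FALSE keeps "1971" instead of turning it into "X1971".
raw <- read.csv(csv_path, skip = 2, check.names = FALSE,
                stringsAsFactors = FALSE)

# Rename columns so they follow the Country / Emit_<year> pattern.
names(raw)[1] <- "Country"
year_cols <- grepl("^[0-9]{4}$", names(raw))
names(raw)[year_cols] <- paste0("Emit_", names(raw)[year_cols])

# "5,626" is read as text because of the thousands separator; strip it.
raw$GDP <- as.numeric(gsub(",", "", raw[["GDP per Capita"]]))
raw[["GDP per Capita"]] <- NULL

str(raw)
```

With a data frame like this, the extra GDP column would need to be left out of any reshaping step, for example with cols = -c(Country, GDP).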

So far, df can be used to produce a data table with kable, as you currently have. Note that you can define your column labels with col.names (the header in add_header_above spans fewer columns here because the example has fewer years of data).

knitr::kable(df, col.names = c("Country", seq(1971, 1974, 1))) %>%
  kable_styling(bootstrap_options = c("striped", "condensed", "interactive", "bordered", "responsive"), 
                full_width = TRUE, font_size = 12, fixed_thead = TRUE) %>%
  add_header_above(c("", "CO2 Emissions per country" = 4), 
                   font_size = 14) %>% 
  column_spec(1, bold = TRUE) %>% 
  row_spec(row = 0, font_size = 14, bold = TRUE) %>%
  scroll_box(width = "100%", height = "800px")

As suggested by @Gregor, you can convert your data from wide to long before plotting. I prefer to use tidyr from the tidyverse for this. This assumes your column names consist of a prefix, an underscore, and the year (other options are also available).

# names_transform converts Year to numeric (recent tidyr no longer coerces via names_ptypes)
long.df <- pivot_longer(df, cols = -Country, names_to = c(".value", "Year"),
                        names_sep = "_", names_transform = list(Year = as.numeric))

# A tibble: 16 x 3
   Country   Year  Emit
   <fct>    <dbl> <dbl>
 1 Albania   1971   3.9
 2 Albania   1972   4.5
 3 Albania   1973   3.9
 4 Albania   1974   4.2
 5 Austria   1971  48.7
 6 Austria   1972  50.5
 7 Austria   1973  54  
 8 Austria   1974  51.3
 9 Belgium   1971 117. 
10 Belgium   1972 127. 
11 Belgium   1973 133. 
12 Belgium   1974 131. 
13 Bulgaria  1971  62.8
14 Bulgaria  1972  64.8
15 Bulgaria  1973  66.6
16 Bulgaria  1974  67.7
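
As one of the "other options" mentioned above, the year columns can also be pivoted when they are named with the bare year, which is how they appear in the question's table. This is a minimal sketch on a made-up two-country data frame; the names wide, long and GDP are placeholders, not taken from the real data.

```r
library(tidyr)

# Hypothetical wide data whose year columns are named "1971", "1972", ...
wide <- data.frame(
  Country = c("Albania", "Austria"),
  `1971` = c(3.9, 48.7),
  `1972` = c(4.5, 50.5),
  GDP = c(5626, 56259),
  check.names = FALSE
)

long <- pivot_longer(
  wide,
  cols = matches("^[0-9]{4}$"),   # only the year columns; Country and GDP stay
  names_to = "Year",
  values_to = "Emit",
  names_transform = list(Year = as.integer)
)

long
```

Selecting the columns by pattern keeps non-year columns such as GDP as ordinary variables, repeated on every row for that country.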

From this, you have options for further manipulation depending on your plotting needs. For example, to plot each country's emissions by year, you could do the following:

ggplot(long.df, aes(x = Year, y = Emit, col = Country)) +
  geom_line() +
  scale_x_continuous(breaks = seq(1971, 1974, 1)) +
  labs(title = "CO2 Emissions per country", x = "Year", y = "Emissions")

If you want to group by year (sum all countries' emissions in each year), you could do the following:

long.df.years <- long.df %>%
  group_by(Year) %>%
  summarise(Total = sum(Emit))

ggplot(long.df.years, aes(x = Year, y = Total)) +
  geom_line() +
  scale_x_continuous(breaks = seq(1971, 1974, 1)) +
  labs(title = "CO2 Emissions", x = "Year", y = "Emissions")

If you want to sum the emissions across all years for each country, you could do the following:

long.df.europe <- long.df %>%
  group_by(Country) %>%
  summarise(Total = sum(Emit))

# A tibble: 4 x 2
  Country  Total
  <fct>    <dbl>
1 Albania   16.5
2 Austria  204. 
3 Belgium  507. 
4 Bulgaria 262.
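
Since the question also asks to keep individual columns for the years, the long data can be pivoted back to wide whenever the kable table is needed. A small sketch, using a made-up stand-in with the same columns as long.df:

```r
library(tidyr)

# Small long-format stand-in with the same columns as long.df above.
long_example <- data.frame(
  Country = rep(c("Albania", "Austria"), each = 2),
  Year = rep(c(1971, 1972), times = 2),
  Emit = c(3.9, 4.5, 48.7, 50.5)
)

# One row per country again, one column per year ("Emit_1971", ...).
wide_again <- pivot_wider(
  long_example,
  names_from = Year,
  values_from = Emit,
  names_prefix = "Emit_"
)

wide_again
```

This way the long format can serve as the working copy for plots and summaries, and the wide layout is only rebuilt for display.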

Again, I hope this is helpful. Please let me know if this is what you had in mind, or if anything needs further clarification.

Solution 1
1 2020-01-19 18:44:06
