Grouping Variables within a dataset

I have the following dataset:

Country/Region  1971    1972    1973    1974    1975    1976    1977    1978    1979    1980    1981    1982    1983    1984    1985    1986    1987    1988    1989    1990    1991    1992    1993    1994    1995    1996    1997    1998    1999    2000    2001    2002    2003    2004    2005    2006    2007    2008    2009    2010    GDP per Capita
Albania 3.9 4.5 3.9 4.2 4.5 4.9 5.2 6.2 7.5 7.6 6.4 6.7 7.3 7.6 7.2 7.2 7.5 7.6 7.2 6.3 4.4 2.8 2.3 2.3 1.9 1.9 1.4 1.7 3.0 3.1 3.3 3.8 4.0 4.3 4.1 4.0 4.0 3.9 3.5 3.8 5,626
Austria 48.7    50.5    54.0    51.3    50.2    54.3    51.8    54.5    57.2    55.7    52.8    51.0    51.1    52.9    54.3    53.2    54.2    52.1    52.5    56.4    60.6    55.7    56.0    56.2    59.4    63.1    62.4    62.9    61.4    61.7    65.9    67.4    72.6    73.7    74.6    72.5    70.0    70.6    63.5    69.3    56,259
Belarus 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 124.5   119.4   98.8    82.9    70.2    61.4    62.7    61.8    59.3    57.6    58.7    57.8    59.2    60.7    63.0    62.1    66.2    64.0    64.5    62.3    65.3    6,575
Belgium 116.8   126.7   132.7   130.6   115.6   124.5   123.5   129.0   132.3   125.7   115.5   109.3   100.6   102.6   101.9   102.6   102.8   104.6   105.9   107.9   113.3   112.3   109.8   115.5   115.2   121.3   118.5   120.9   117.4   118.6   119.1   111.9   119.5   116.5   112.6   109.6   105.6   111.0   100.7   106.4   51,237
Bosnia and Herzegovina  0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 23.7    21.2    15.6    13.1    3.0 3.2 4.1 8.3 10.5    10.2    13.5    13.3    14.0    14.3    15.0    15.6    17.2    18.2    19.9    19.4    19.9    6,140
Bulgaria    62.8    64.8    66.6    67.7    72.2    72.1    74.8    77.9    81.1    83.8    79.9    81.5    80.2    78.3    81.1    82.1    83.1    82.1    81.4    74.8    56.4    54.1    55.1    52.5    53.2    53.8    50.9    48.7    42.8    42.1    44.8    42.0    46.3    45.4    45.9    47.3    50.4    49.0    42.2    43.8    9,811
Croatia 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 21.6    15.7    15.2    15.8    15.0    15.8    15.6    17.3    18.4    18.3    17.7    18.6    19.6    21.0    20.4    20.8    20.8    22.1    21.0    19.8    19.0    15,533
Cyprus  1.8 2.2 2.3 1.8 1.7 2.0 2.1 2.3 2.5 2.6 2.5 2.6 2.7 2.8 2.8 3.1 3.6 3.6 3.8 3.8 4.4 4.7 4.9 5.3 5.2 5.5 5.7 5.8 6.0 6.3 6.2 6.3 7.0 6.9 7.0 7.1 7.3 7.6 7.5 7.2 30,521
Czech Republic  151.0   150.0   147.1   146.3   152.6   157.4   166.9   163.0   172.5   165.8   166.5   169.3   170.5   173.1   173.1   173.1   174.2   170.8   163.5   155.1   140.9   131.4   126.7   120.2   123.7   125.6   124.0   117.6   110.9   121.9   121.4   117.2   120.7   121.8   119.6   120.7   122.0   117.3   110.1   114.5   26,114
Denmark 55.0    57.1    56.0    49.8    52.5    58.1    59.7    59.2    62.7    62.5    52.5    54.6    51.3    52.9    60.5    61.1    59.3    55.5    49.8    50.4    60.5    54.8    57.1    61.0    58.0    71.2    61.6    57.7    54.6    50.6    52.2    51.9    57.1    51.6    48.3    56.0    51.4    48.4    46.7    47.0    66,196
Estonia 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 36.1    32.1    23.5    18.0    17.8    16.1    17.0    16.5    16.0    14.9    14.6    15.1    14.6    16.6    16.7    16.9    15.5    19.3    17.7    14.7    18.5    25,260
Finland 39.8    43.7    48.0    44.5    44.4    50.5    50.2    54.7    54.4    55.2    46.0    44.5    43.2    44.4    48.6    49.5    53.8    53.1    52.9    54.4    55.9    53.7    54.8    61.4    56.0    62.2    60.1    56.8    56.1    55.1    60.3    63.0    70.8    67.2    55.2    66.8    65.0    57.0    55.0    62.9    54,869
France  431.9   448.6   484.8   464.6   430.6   469.3   455.3   474.7   481.8   461.4   414.1   396.7   381.0   369.5   360.3   347.8   342.3   340.5   355.9   352.3   379.6   368.0   348.9   344.4   353.8   368.6   361.7   385.3   377.7   376.9   383.8   375.9   385.2   385.4   388.4   379.6   373.1   370.2   351.4   357.8   46,493
Germany 978.6   1003.2  1053.1  1028.5  975.5   1032.2  1017.2  1055.9  1103.6  1055.6  1022.3  982.3   983.9   1006.1  1014.6  1016.3  1007.2  1001.2  976.8   949.7   924.8   886.5   879.9   868.5   867.8   896.5   865.8   858.9   826.9   825.0   843.3   830.7   839.8   840.8   809.0   820.9   796.3   800.1   747.1   761.6   53,276
Greece  25.2    29.2    34.1    32.6    34.5    39.1    40.4    42.8    45.1    45.3    44.9    46.3    49.3    51.0

(Sorry for the horrible formatting.)

There are 41 countries and the years run from 1971 to 2010. The data for the years is CO2 emissions per capita.
However, due to the nature of the CSV, I had to delete the first 2 rows of the dataset. I am not allowed to modify the CSV itself, only manipulate the output in R.

I want to group the years together under a variable called "CO2 emissions per capita" so that it can be used in graphs, but still have individual columns for the years. I have managed to create the format using this code:

knitr::kable(europe.GDP) %>%
  kable_styling(bootstrap_options = c("striped", "condensed", "interactive", "bordered", "responsive"), 
                full_width = TRUE, font_size = 12, fixed_thead = TRUE) %>%
  add_header_above(c("", "CO2 Emissions per country" = 41), 
                   font_size = 14) %>% 
  column_spec(1, bold = TRUE) %>% 
  row_spec(row = 0, font_size = 14, bold = TRUE) %>%
  scroll_box(width = "100%", height = "800px")

but I don't know how to make CO2 emissions a single variable, as opposed to every year being its own variable. I am very new to R, so I'm sorry if I'm not explaining what I'm trying to do very well.

I understand you are very new to R, so perhaps I can help with a few ideas.

The table you created using kable may already look the way you need. However, when plotting data, you will find it much easier and more flexible to have the data in a long format instead of a wide one.

Here's an example of how you can approach this. This requires the following libraries:

library(knitr)
library(kableExtra)
library(tidyverse)
library(ggplot2)

This is a simple data frame created for the example. Note that you may need to do further manipulation depending on the structure of the data frame created from your CSV file. If you share your data with dput, as @akrun suggested, that will help further.

df <- data.frame(
  Country = c("Albania", "Austria", "Belgium", "Bulgaria"),
  Emit_1971 = c(3.9, 48.7, 116.8, 62.8),
  Emit_1972 = c(4.5, 50.5, 126.7, 64.8),
  Emit_1973 = c(3.9, 54, 132.7, 66.6),
  Emit_1974 = c(4.2, 51.3, 130.6, 67.7)
)
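The question mentions that the first two rows of the CSV have to be dropped without editing the file. A minimal base R sketch of one way to do that at read time is shown here. The file contents are a made-up stand-in (the real layout is not shown in the question), so the two junk lines and the column set are assumptions.

```r
# Hypothetical stand-in for the real file: two unwanted lines, then the header row.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "CO2 emissions,,,",
  "Source: (placeholder),,,",
  "Country/Region,1971,1972,GDP per Capita",
  "Albania,3.9,4.5,\"5,626\"",
  "Austria,48.7,50.5,\"56,259\""
), csv_path)

# skip = 2 drops the first two lines while reading, so the file itself is untouched.
# check.names = FALSE keeps "1971" as a column name instead of converting it to "X1971".
raw <- read.csv(csv_path, skip = 2, check.names = FALSE, stringsAsFactors = FALSE)

# Values such as "5,626" are read as text because of the thousands separator;
# remove the commas and convert to numeric.
raw[["GDP per Capita"]] <- as.numeric(gsub(",", "", raw[["GDP per Capita"]]))

str(raw)
```

If the real file has a different number of leading lines, only the skip value should need to change.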

So far, this can be used to produce a data table with kable, as you currently have. Note that you can define your column labels with col.names. (The header in add_header_above spans fewer columns here because the example has fewer years of data.)

knitr::kable(df, col.names = c("Country", seq(1971, 1974, 1))) %>%
  kable_styling(bootstrap_options = c("striped", "condensed", "interactive", "bordered", "responsive"), 
                full_width = TRUE, font_size = 12, fixed_thead = TRUE) %>%
  add_header_above(c("", "CO2 Emissions per country" = 4), 
                   font_size = 14) %>% 
  column_spec(1, bold = TRUE) %>% 
  row_spec(row = 0, font_size = 14, bold = TRUE) %>%
  scroll_box(width = "100%", height = "800px")

[Image: table of CO2 emissions per country]
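The width of the grouped header has to match the number of year columns, which is easy to get wrong when the data changes. As a rough sketch, the span can be computed from the data frame instead of typed by hand. The names n_years and header are only illustrative, and format = "html" is set explicitly on the assumption that the table is rendered as HTML.

```r
library(knitr)
library(kableExtra)

df <- data.frame(
  Country = c("Albania", "Austria", "Belgium", "Bulgaria"),
  Emit_1971 = c(3.9, 48.7, 116.8, 62.8),
  Emit_1972 = c(4.5, 50.5, 126.7, 64.8),
  Emit_1973 = c(3.9, 54, 132.7, 66.6),
  Emit_1974 = c(4.2, 51.3, 130.6, 67.7)
)

# Every column except Country is a year column.
n_years <- ncol(df) - 1

# Named numeric vector: the names are the header labels, the values are column spans.
# " " leaves the cell above the Country column blank.
header <- c(1, n_years)
names(header) <- c(" ", "CO2 Emissions per country")

# Derive the year labels from the column names rather than hard-coding them.
year_labels <- sub("^Emit_", "", names(df)[-1])

tbl <- kable(df, format = "html", col.names = c("Country", year_labels)) %>%
  kable_styling(bootstrap_options = c("striped", "condensed", "bordered"),
                full_width = TRUE, font_size = 12) %>%
  add_header_above(header, font_size = 14)

tbl
```

This only changes how the table is displayed. The year columns are still separate variables in df.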

As suggested by @Gregor, you can convert your data from wide to long before plotting. I prefer to use tidyr (part of the tidyverse) for this. This assumes your column names consist of a prefix, an underscore, and the year (other options are also available).

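# names_transform converts the Year part of each column name to a number
# (tidyr >= 1.1.0; there, names_ptypes only checks the type and no longer converts)
long.df <- pivot_longer(df, cols = -Country,
                        names_to = c(".value", "Year"), names_sep = "_",
                        names_transform = list(Year = as.numeric))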

# A tibble: 16 x 3
   Country   Year  Emit
   <fct>    <dbl> <dbl>
 1 Albania   1971   3.9
 2 Albania   1972   4.5
 3 Albania   1973   3.9
 4 Albania   1974   4.2
 5 Austria   1971  48.7
 6 Austria   1972  50.5
 7 Austria   1973  54  
 8 Austria   1974  51.3
 9 Belgium   1971 117. 
10 Belgium   1972 127. 
11 Belgium   1973 133. 
12 Belgium   1974 131. 
13 Bulgaria  1971  62.8
14 Bulgaria  1972  64.8
15 Bulgaria  1973  66.6
16 Bulgaria  1974  67.7
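The data in the question has plain year names ("1971", "1972", ...) plus a "GDP per Capita" column, rather than the Emit_ prefix used above. A possible variant for that layout is sketched here. The small wide data frame is a cut-down stand-in for the real one, and selecting the year columns by a four-digit pattern is an assumption about how they are named.

```r
library(tidyr)

# Cut-down stand-in for the real data; check.names = FALSE keeps the names as written.
wide <- data.frame(
  `Country/Region` = c("Albania", "Austria"),
  `1971` = c(3.9, 48.7),
  `1972` = c(4.5, 50.5),
  `GDP per Capita` = c(5626, 56259),
  check.names = FALSE
)

# Pivot only the columns whose names are exactly four digits (the years).
# The column names go into Year (converted to integer) and the cell values into Emit.
# Country/Region and GDP per Capita are left alone and repeated on every row.
long <- pivot_longer(
  wide,
  cols = matches("^[0-9]{4}$"),
  names_to = "Year",
  values_to = "Emit",
  names_transform = list(Year = as.integer)
)

long
```

Emit is then the single "CO2 emissions" variable asked about, with Year available as an ordinary numeric column for plotting.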

From this, you have options for further manipulation depending on your plotting needs. For example, to plot each country's emissions by year, you could do the following:

ggplot(long.df, aes(x = Year, y = Emit, col = Country)) +
  geom_line() +
  scale_x_continuous(breaks = seq(1971, 1974, 1)) +
  labs(title = "CO2 Emissions per country", x = "Year", y = "Emissions")

[Image: plot of country emissions by year]

If you want to group by year (sum the emissions of all countries in each year), you could do the following:

long.df.years <- long.df %>%
  group_by(Year) %>%
  summarise(Total = sum(Emit))

ggplot(long.df.years, aes(x = Year, y = Total)) +
  geom_line() +
  scale_x_continuous(breaks = seq(1971, 1974, 1)) +
  labs(title = "CO2 Emissions", x = "Year", y = "Emissions")

[Image: plot of total emissions per year]

If you wanted to sum the emissions across all years for each country, you could do the following:

long.df.europe <- long.df %>%
  group_by(Country) %>%
  summarise(Total = sum(Emit))

# A tibble: 4 x 2
  Country  Total
  <fct>    <dbl>
1 Albania   16.5
2 Austria  204. 
3 Belgium  507. 
4 Bulgaria 262.

Again, I hope this is helpful. Please let me know if this is what you had in mind or if anything needs further clarification.
