How can I create a new wide data frame with rows based on all combos of values in two columns?

I have the following dataframe (dput is provided at the bottom of the question):

> df_input
# A tibble: 5 x 4
  category range        samples events
  <chr>    <chr>          <dbl>  <dbl>
1 GroupA   Apr2002         4951  97796
2 GroupA   May2002         9332 195726
3 GroupB   Apr2001         4781  80767
4 GroupB   Oct2001         5677  92890
5 GroupB   OctToNov2001   10296 166037

I would like to create a new dataframe whose rows pair every GroupA row with every GroupB row, as identified by the category and range columns. For example, category = GroupA and range = Apr2002 would have 3 rows in the output dataframe, one for each of the three category = GroupB rows.

The range column in the input dataframe will always have unique values only.

I would also like to rename the combined output columns for events, samples and range to include the group names (i.e. range_GroupA, range_GroupB, samples_GroupA, events_GroupA, samples_GroupB, events_GroupB).

I'm struggling with how to create my combined rows from the category column. I'm also struggling to find the right search terms to find similar questions/answers. The closest I've managed to find so far is "Create new rows in data frame based on multiple values of column", but the combination in that question is a bit different from what I'm attempting.

The desired output dataframe is:

> df_output
# A tibble: 6 x 6
  range_GroupA range_GroupB samples_GroupA events_GroupA samples_GroupB events_GroupB
  <chr>        <chr>                 <dbl>         <dbl>          <dbl>         <dbl>
1 Apr2002      Apr2001                4951         97796           4781         80767
2 Apr2002      Oct2001                4951         97796           5677         92890
3 Apr2002      OctToNov2001           4951         97796          10296        166037
4 May2002      Apr2001                9332        195726           4781         80767
5 May2002      Oct2001                9332        195726           5677         92890
6 May2002      OctToNov2001           9332        195726          10296        166037

df_input dataframe:

df_input <- structure(list(
  category = c("GroupA", "GroupA", "GroupB", "GroupB", "GroupB"),
  range = c("Apr2002", "May2002", "Apr2001", "Oct2001", "OctToNov2001"),
  samples = c(4951, 9332, 4781, 5677, 10296),
  events = c(97796, 195726, 80767, 92890, 166037)),
  row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))

df_output dataframe:

df_output <- structure(list(
  range_GroupA = c("Apr2002", "Apr2002", "Apr2002", "May2002", "May2002", "May2002"),
  range_GroupB = c("Apr2001", "Oct2001", "OctToNov2001", "Apr2001", "Oct2001", "OctToNov2001"),
  samples_GroupA = c(4951, 4951, 4951, 9332, 9332, 9332),
  events_GroupA = c(97796, 97796, 97796, 195726, 195726, 195726),
  samples_GroupB = c(4781, 5677, 10296, 4781, 5677, 10296),
  events_GroupB = c(80767, 92890, 166037, 80767, 92890, 166037)),
  row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

I think we can get your result with a filtered cartesian join:

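library(dplyr)

# Join df_input to itself on a constant key so every row pairs with every row
left_join(
  df_input %>% mutate(dummy = 1),
  df_input %>% mutate(dummy = 1), by = "dummy") %>%
  # Keep only GroupA-with-GroupB pairs ("GroupA" sorts before "GroupB")
  filter(category.x < category.y)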

You'll recognize all the numbers you're looking for, but with different header names. We can rename them manually, but that's no fun. See below for a renamed version.

# A tibble: 6 × 9
  category.x range.x samples.x events.x dummy category.y range.y      samples.y events.y
  <chr>      <chr>       <dbl>    <dbl> <dbl> <chr>      <chr>            <dbl>    <dbl>
1 GroupA     Apr2002      4951    97796     1 GroupB     Apr2001           4781    80767
2 GroupA     Apr2002      4951    97796     1 GroupB     Oct2001           5677    92890
3 GroupA     Apr2002      4951    97796     1 GroupB     OctToNov2001     10296   166037
4 GroupA     May2002      9332   195726     1 GroupB     Apr2001           4781    80767
5 GroupA     May2002      9332   195726     1 GroupB     Oct2001           5677    92890
6 GroupA     May2002      9332   195726     1 GroupB     OctToNov2001     10296   166037
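
As a side note, newer dplyr releases (1.1.0 and later, which is an assumption about your installed version) provide cross_join(), which builds the same cartesian product without a dummy key; on those versions the dummy-key left_join() may also emit a many-to-many warning. A minimal sketch, with df_input rebuilt inline so the snippet runs on its own and `pairs` used as an illustrative name:

```r
library(dplyr)

# Same input as in the question, rebuilt here so the snippet is self-contained
df_input <- tibble(
  category = c("GroupA", "GroupA", "GroupB", "GroupB", "GroupB"),
  range    = c("Apr2002", "May2002", "Apr2001", "Oct2001", "OctToNov2001"),
  samples  = c(4951, 9332, 4781, 5677, 10296),
  events   = c(97796, 195726, 80767, 92890, 166037)
)

# cross_join() pairs every row of x with every row of y;
# clashing column names get the default ".x" / ".y" suffixes
pairs <- cross_join(df_input, df_input) %>%
  # Keep only GroupA-with-GroupB pairs
  filter(category.x < category.y)

pairs
```

The result has the same six rows as above, minus the dummy column.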

EDIT: This seems to do it with the renaming:

left_join(
  # Suffix every column with its group name before joining
  df_input %>% rename_with(~paste0(., "_GroupA")) %>% mutate(dummy = 1),
  df_input %>% rename_with(~paste0(., "_GroupB")) %>% mutate(dummy = 1),
  by = "dummy") %>%
  filter(category_GroupA < category_GroupB) %>%
  # Drop the helper columns and put the two range columns side by side
  select(-category_GroupA, -dummy, -category_GroupB) %>%
  relocate(range_GroupB, .after = 1)


# A tibble: 6 × 6
  range_GroupA range_GroupB samples_GroupA events_GroupA samples_GroupB events_GroupB
  <chr>        <chr>                 <dbl>         <dbl>          <dbl>         <dbl>
1 Apr2002      Apr2001                4951         97796           4781         80767
2 Apr2002      Oct2001                4951         97796           5677         92890
3 Apr2002      OctToNov2001           4951         97796          10296        166037
4 May2002      Apr2001                9332        195726           4781         80767
5 May2002      Oct2001                9332        195726           5677         92890
6 May2002      OctToNov2001           9332        195726          10296        166037
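
Another way to sketch the same idea, assuming dplyr 1.1.0 or later and exactly the two groups shown in the question, is to split the input by category first and let the join's suffix argument do the renaming. The names `group_a`, `group_b` and `result` are only illustrative:

```r
library(dplyr)

# Same input as in the question, rebuilt here so the snippet is self-contained
df_input <- tibble(
  category = c("GroupA", "GroupA", "GroupB", "GroupB", "GroupB"),
  range    = c("Apr2002", "May2002", "Apr2001", "Oct2001", "OctToNov2001"),
  samples  = c(4951, 9332, 4781, 5677, 10296),
  events   = c(97796, 195726, 80767, 92890, 166037)
)

# One table per group; category is no longer needed once the split is done
group_a <- df_input %>% filter(category == "GroupA") %>% select(-category)
group_b <- df_input %>% filter(category == "GroupB") %>% select(-category)

# Every column name clashes, so each one picks up the matching suffix
result <- cross_join(group_a, group_b, suffix = c("_GroupA", "_GroupB")) %>%
  relocate(range_GroupB, .after = range_GroupA)

result
```

Because the split happens before the join, no filter on category is needed afterwards, and the intermediate product contains only the six wanted rows.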
