
Assign row values of a column based on conditional values of that column in R

I have a data frame in which the category names and their labels are in the same column. The category names are in all caps; the labels have only the first letter capitalized.

In its simplest form, a sample data frame can be created with:

library(dplyr)    # tibble(), mutate(), if_else(), %>%
library(stringr)  # str_detect()
xdata <- tibble(category_and_label=c('CATEGORY1','Name1','Name2','Name3','CATEGORY2','Name1','Name2','Name4'),
                values            =c(NA, 2,3,4,NA,5,6,7))

and it looks like this:

  category_and_label values
  <chr>               <dbl>
1 CATEGORY1              NA
2 Name1                   2
3 Name2                   3
4 Name3                   4
5 CATEGORY2              NA
6 Name1                   5
7 Name2                   6
8 Name4                   7

I need to have the category name and label in separate columns. The correctly modified data frame is:

  category  label values
  <chr>     <chr>  <dbl>
1 CATEGORY1 Name1      2
2 CATEGORY1 Name2      3
3 CATEGORY1 Name3      4
4 CATEGORY2 Name1      5
5 CATEGORY2 Name2      6
6 CATEGORY2 Name4      7

I can only conceive of the first part of the solution. It makes sense to me to create a column that flags the category names.

xdata <- xdata %>% mutate(allcaps=if_else(str_detect(category_and_label,'[A-Z]{3,}'),1,0))

  category_and_label values allcaps
  <chr>               <dbl>   <dbl>
1 CATEGORY1              NA       1
2 Name1                   2       0
3 Name2                   3       0
4 Name3                   4       0
5 CATEGORY2              NA       1
6 Name1                   5       0
7 Name2                   6       0
8 Name4                   7       0

This identifies the category names. Using dplyr, how would I assign each identified category name to a new column, so that the rows take that category name only until the next category name is reached?

I've tried a few ideas but none are worth showing.

I can use rename() to rename columns once the categories and names are separated.

One way would be to cumsum() a boolean variable. In your example, the first row of each category has values = NA; if that applies in general, then the following code might be what you want:

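xdata %>%
    mutate(
        # each NA in values marks a category row, so the running count of NAs numbers the groups
        category = cumsum(is.na(values))
    ) %>%
    # drop the category header rows themselves
    filter(!is.na(values)) %>%
    rename(label = category_and_label)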

yields

# A tibble: 6 × 3
  label values category
  <chr>  <dbl>    <int>
1 Name1      2        1
2 Name2      3        1
3 Name3      4        1
4 Name1      5        2
5 Name2      6        2
6 Name4      7        2
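
The result above numbers the groups but does not carry the category names themselves. As a rough sketch of one way to get the layout shown in the question, the same cumsum() idea can be combined with group_by() and first(). This assumes that category rows consist only of uppercase letters and digits (so the pattern '^[A-Z0-9]+$' matches them and nothing else); the helper columns is_category and group_id are illustrative names, not part of the original code.

```r
library(dplyr)
library(stringr)

xdata <- tibble(category_and_label = c('CATEGORY1','Name1','Name2','Name3',
                                       'CATEGORY2','Name1','Name2','Name4'),
                values             = c(NA, 2, 3, 4, NA, 5, 6, 7))

result <- xdata %>%
    mutate(
        # assumption: a category row contains only uppercase letters and digits
        is_category = str_detect(category_and_label, '^[A-Z0-9]+$'),
        # the running count of category rows gives every row a group number
        group_id    = cumsum(is_category)
    ) %>%
    group_by(group_id) %>%
    # the first row of each group is the category row, so reuse its text
    mutate(category = first(category_and_label)) %>%
    ungroup() %>%
    # drop the category header rows and keep the three requested columns
    filter(!is_category) %>%
    select(category, label = category_and_label, values)

print(result)
```

Detecting category rows from the text rather than from is.na(values) means the sketch should still work if a label row happens to have a missing value, but it does rely on the naming convention holding for every row.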
