
Group by one column based on a contained substring and get the maximum of another column in R

Given a dataframe as follows:

df <- structure(list(city = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 1L, 
1L), .Label = c("bj", "sh"), class = "factor"), type = structure(c(3L, 
1L, 3L, 1L, 4L, 2L, 4L, 2L), .Label = c("buy_area", "buy_price", 
"sale_area", "sale_price"), class = "factor"), value = c(1200L, 
800L, 1900L, 1500L, 15L, 10L, 17L, 9L)), class = "data.frame", row.names = c(NA, 
-8L))

Out:

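  city       type value
1   bj  sale_area  1200
2   bj   buy_area   800
3   sh  sale_area  1900
4   sh   buy_area  1500
5   bj sale_price    15
6   bj  buy_price    10
7   bj sale_price    17
8   bj  buy_price     9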

How can I obtain the maximum of the value column for each of two kinds of type: those containing area and those containing price?

The expected result is two values: 1900 for area and 17 for price.

To group by the exact type and get the maximum of value for each, we can use:

library(plyr)
ddply(df, .(type), summarise, max.value = max(value))

Update: output of @det's solution:

[Screenshot of the output; image not preserved.]

Create a column that categorizes type as area or price, then group by that column:

library(dplyr)
library(stringr)

df %>%
  mutate(
    type2 = case_when(
      str_detect(type, "_area$") ~ "area",
      str_detect(type, "_price$") ~ "price",
      TRUE ~ NA_character_
    )
  ) %>%
  group_by(type2) %>%
  summarise(max_value = max(value))

Output:

  type2 max_value
  <chr>     <int>
1 area       1900
2 price        17

Update: this one is more concise (a small modification of Ronak Shah's answer):

library(dplyr)
library(tidyr)

df %>% 
    separate(type, c("sale_buy", "area_price")) %>% 
    group_by(area_price) %>% 
    summarise(max = max(value))

Output:

  area_price   max
  <chr>      <int>
1 area        1900
2 price         17
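
To see what separate() does before the grouping step, here is a minimal sketch on a hypothetical two-row data frame (toy is not part of the question's data). It assumes tidyr's default separator, which splits at runs of non-alphanumeric characters, so the underscore is treated as the delimiter.

```r
library(tidyr)

# Hypothetical two-row example, only to show the intermediate columns.
toy <- data.frame(
  type  = c("sale_area", "buy_price"),
  value = c(1200L, 10L)
)

# With no `sep` argument, separate() splits at runs of non-alphanumeric
# characters, so the underscore acts as the delimiter here.
parts <- separate(toy, type, c("sale_buy", "area_price"))
parts
```

The original type column is replaced by sale_buy and area_price, and the second of these is what the answer groups on.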

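First answer: one way could be the following. Note that it only keeps the sale_ rows, which happen to hold both maxima in this data: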

library(dplyr)
df %>% 
    group_by(type) %>% 
    summarise(max = max(value)) %>% 
    filter(grepl("sale", type))

Output:

  type         max
  <fct>      <int>
1 sale_area   1900
2 sale_price    17

Split the type column into two columns and find the maximum by group.

library(dplyr)
library(tidyr)

df %>%
  separate(type, c('type', 'col'), sep = '_') %>%
  group_by(col) %>%
  summarise(value = max(value, na.rm = TRUE))

#  col   value
#  <chr> <int>
#1 area   1900
#2 price    17

You can also extract 'area' or 'price' from type and use it as the grouping column.

df %>%
  group_by(type = stringr::str_extract(type, 'area|price')) %>%
  summarise(value = max(value, na.rm = TRUE))
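
The same idea can probably be expressed without any packages. The following is a sketch in base R, not something from the answers above: it rebuilds the question's data with character columns instead of factors (an assumption made for brevity), strips the prefix with sub(), and takes the maximum per group with tapply(). The names group and max_by_group are illustrative.

```r
# Rebuild the example data from the question (plain character vectors
# are used here instead of factors).
df <- data.frame(
  city  = c("bj", "bj", "sh", "sh", "bj", "bj", "bj", "bj"),
  type  = c("sale_area", "buy_area", "sale_area", "buy_area",
            "sale_price", "buy_price", "sale_price", "buy_price"),
  value = c(1200L, 800L, 1900L, 1500L, 15L, 10L, 17L, 9L)
)

# Drop everything up to and including the underscore: "sale_area" -> "area".
group <- sub("^.*_", "", df$type)

# Maximum of value within each group; the result is indexed by group name.
max_by_group <- tapply(df$value, group, max)
max_by_group
```

Here max_by_group["area"] is 1900 and max_by_group["price"] is 17, matching the expected result in the question.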

Try this:

df %>%
  separate(type, c("type", "area")) %>%
  group_by(area) %>%
  filter(value == max(value, na.rm = TRUE))
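
Unlike the summarise() answers, the filter() variant keeps the whole rows that attain each maximum, so the city is still available. A minimal sketch of the difference, assuming dplyr and tidyr are installed; the column names side and measure, and the result names maxima and max_rows, are illustrative and not from the answers:

```r
library(dplyr)
library(tidyr)

# Example data from the question, with character columns instead of factors.
df <- data.frame(
  city  = c("bj", "bj", "sh", "sh", "bj", "bj", "bj", "bj"),
  type  = c("sale_area", "buy_area", "sale_area", "buy_area",
            "sale_price", "buy_price", "sale_price", "buy_price"),
  value = c(1200L, 800L, 1900L, 1500L, 15L, 10L, 17L, 9L)
)

# summarise() collapses each group to a single row holding the maximum.
maxima <- df %>%
  separate(type, c("side", "measure"), sep = "_") %>%
  group_by(measure) %>%
  summarise(value = max(value))

# filter() keeps the original rows that attain the maximum, so city and
# the sale/buy part remain in the result.
max_rows <- df %>%
  separate(type, c("side", "measure"), sep = "_") %>%
  group_by(measure) %>%
  filter(value == max(value)) %>%
  ungroup()

max_rows
```

With this data, max_rows has two rows: the sh sale row with value 1900 and the bj sale row with value 17. If several rows tied for a maximum, filter() would return all of them.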
