Sort multiple columns by first letter and by number in R

Question

I have created a data frame that looks like the following:

item  mean
a_b   5
a_c   2
a_a   4
b_d   7
b_f   3
b_e   1

I would like to sort it first by whether the item begins with "a_" or "b_", and then by mean. The final data frame should look like this:

item  mean
a_c   2
a_a   4
a_b   5
b_e   1
b_f   3
b_d   7

Note that the item column is not fully alphabetical: it is sorted only by its first letter.

I have tried:

arrange(df, item, mean)

The problem with this is that it sorts not only by the "a_" and "b_" categories but by the entire item name.

I am open to splitting the original data frame into separate data frames with filter and then sorting by mean within these smaller subsets; I do not need everything to stay in the same data frame. However, I am unsure how to use filter to select only the rows whose items begin with "a_" or "b_".
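For the filter route described above, here is a minimal sketch. It assumes the data frame is called df and that item is a character column (hence stringsAsFactors = FALSE), and it uses base R's startsWith() or an anchored grepl() pattern as the row test. The names df_a, df_b, df_a2 and result are illustrative only.

```r
library(dplyr)

# Sample data matching the question; item must be character for startsWith()
df <- data.frame(
  item = c("a_b", "a_c", "a_a", "b_d", "b_f", "b_e"),
  mean = c(5L, 2L, 4L, 7L, 3L, 1L),
  stringsAsFactors = FALSE
)

# startsWith() (base R >= 3.3.0) tests whether each item begins with the prefix
df_a <- df %>% filter(startsWith(item, "a_")) %>% arrange(mean)
df_b <- df %>% filter(startsWith(item, "b_")) %>% arrange(mean)

# grepl() with a regex anchored at the start (^) is an equivalent row test
df_a2 <- df %>% filter(grepl("^a_", item)) %>% arrange(mean)

# Stack the sorted subsets back together if a single data frame is wanted
result <- bind_rows(df_a, df_b)
```

Filtering by hand means writing one line per prefix, so it is mainly useful when the subsets are needed separately anyway.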

Answer 1

Another method using dplyr:

library(dplyr)
arrange(df, sub('_.+$', '', item), mean)

An alternative is to use str_extract from stringr to extract only the first letter (together with the underscore) from item:

library(stringr)
arrange(df, str_extract(item, '^._'), mean)

Result:

  item mean
1  a_c    2
2  a_a    4
3  a_b    5
4  b_e    1
5  b_f    3
6  b_d    7

Data:

df <- structure(list(item = c("a_b", "a_c", "a_a", "b_d", "b_f", "b_e"
), mean = c(5L, 2L, 4L, 7L, 3L, 1L)), .Names = c("item", "mean"
), class = "data.frame", row.names = c(NA, -6L))

Notes:

sub('_.+$', '', item) creates a temporary variable by removing the _ and everything after it from item. The pattern _.+$ matches a literal underscore (_) followed by one or more of any character (.+) through the end of the string ($).
str_extract(item, '^._') creates a temporary variable by extracting any single character (.) followed by a literal underscore (_) at the start of the string (^).
The neat thing about dplyr::arrange is that you can create a temporary sorting variable within the function without it being included in the output.
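To see what those temporary sort keys contain, the sketch below evaluates the same two expressions outside arrange(), on the data from this answer rebuilt with data.frame(). The object names key_sub, key_extract and sorted are illustrative.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  item = c("a_b", "a_c", "a_a", "b_d", "b_f", "b_e"),
  mean = c(5L, 2L, 4L, 7L, 3L, 1L),
  stringsAsFactors = FALSE
)

# The keys arrange() computes internally, evaluated by hand
key_sub     <- sub("_.+$", "", df$item)       # "a" "a" "a" "b" "b" "b"
key_extract <- str_extract(df$item, "^._")    # "a_" "a_" "a_" "b_" "b_" "b_"

# Passing the expression to arrange() sorts by it without adding a column
sorted <- arrange(df, sub("_.+$", "", item), mean)
names(sorted)  # "item" "mean"
```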

Answer 2

The philosophy is that if you want to arrange by something (i.e. a substring here), you have to obtain it first:

df = read.table(text = "
item  mean
a_b   5
a_c   2
a_a   4
b_d   7
b_f   3
b_e   1
", header = TRUE, stringsAsFactors = FALSE)

library(tidyverse)

df %>%
  separate(item, c("item1","item2"), remove = FALSE) %>% # split items while keeping the original column
  arrange(item1, mean) %>%                               # arrange by what you really want
  select(item, mean)                                     # keep only relevant columns

#   item mean
# 1  a_c    2
# 2  a_a    4
# 3  a_b    5
# 4  b_e    1
# 5  b_f    3
# 6  b_d    7

Note that there are various ways to pick the first letter from a string; I just decided to use separate here.
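A few of those ways, sketched in base R and stringr (the vector items is a stand-in for df$item). Note that the sub() version returns everything before the first _, which equals the first letter only when that prefix is a single character.

```r
library(stringr)

items <- c("a_b", "a_c", "a_a", "b_d", "b_f", "b_e")

# base R: take characters 1 through 1
first_substr <- substr(items, 1, 1)

# base R: drop the first "_" and everything after it
first_sub <- sub("_.*$", "", items)

# base R: split on "_" and keep the first piece of each result
first_split <- vapply(strsplit(items, "_"), `[`, character(1), 1)

# stringr equivalent of substr()
first_str <- str_sub(items, 1, 1)
```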

If your items contain several _ separators, you still need to extract only the first part. You can replace the first _ with another delimiter (say, :) and then separate the column on that:

df = read.table(text = "
item  mean
a_b_m   5
a_c   2
a_a   4
b_d_x_q   7
b_f   3
b_e   1
", header = TRUE, stringsAsFactors = FALSE)

library(tidyverse)
library(stringr)

df %>%
  mutate(item2 = str_replace(item, "_", ":")) %>%     # str_replace() changes only the first "_"
  separate(item2, c("item1","item2"), sep = ":") %>%  # split the temporary copy; item itself is kept
  arrange(item1, mean) %>%
  select(item, mean)

#      item mean
# 1     a_c    2
# 2     a_a    4
# 3   a_b_m    5
# 4     b_e    1
# 5     b_f    3
# 6 b_d_x_q    7
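As an alternative to swapping in a different delimiter, tidyr's separate() accepts extra = "merge", which splits only at the first separator and keeps the remainder together. Answer 1's sub() key also handles these items unchanged, because the match starts at the first underscore. A sketch of both on the same data (object names are illustrative):

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  item = c("a_b_m", "a_c", "a_a", "b_d_x_q", "b_f", "b_e"),
  mean = c(5L, 2L, 4L, 7L, 3L, 1L),
  stringsAsFactors = FALSE
)

# extra = "merge": split at the first "_" only, keep the rest in item2
sorted <- df %>%
  separate(item, c("item1", "item2"), sep = "_", extra = "merge", remove = FALSE) %>%
  arrange(item1, mean) %>%
  select(item, mean)

# The regex key from Answer 1 removes everything from the first "_" onward
sorted2 <- arrange(df, sub("_.+$", "", item), mean)
```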

Answer 3

A base R solution would be:

inx <- order(substr(df$item, 1, 1), df$mean)
newdf <- df[inx, ]

newdf
#  item mean
#2  a_c    2
#3  a_a    4
#1  a_b    5
#6  b_e    1
#5  b_f    3
#4  b_d    7
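If the prefix before _ can be longer than one character, substr(df$item, 1, 1) would not capture all of it. A variant of the same order() approach, assuming the prefix is everything before the first _, is sketched below; resetting the row names is optional and purely cosmetic.

```r
df <- data.frame(
  item = c("a_b", "a_c", "a_a", "b_d", "b_f", "b_e"),
  mean = c(5L, 2L, 4L, 7L, 3L, 1L),
  stringsAsFactors = FALSE
)

# Everything before the first "_" (works for prefixes of any length)
prefix <- sub("_.*$", "", df$item)

# order() sorts by prefix first and breaks ties with mean
inx <- order(prefix, df$mean)
newdf <- df[inx, ]

# Optional: renumber rows 1..n instead of keeping the original row names
rownames(newdf) <- NULL
```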


3 answers

Answer 1: score 2, accepted, 2018-08-07 17:25:07

Answer 2: score 1, 2018-08-07 17:20:14

Answer 3: score 0, 2018-08-07 17:23:54
