Sorting multiple columns by first letter and by numbers in R
I have created a dataframe that looks like the following:
item mean
a_b 5
a_c 2
a_a 4
b_d 7
b_f 3
b_e 1
I would like to sort it first by whether the item begins with "a_" or "b_", and then by mean. The final dataframe should look like this:
item mean
a_c 2
a_a 4
a_b 5
b_e 1
b_f 3
b_d 7
Note that the item column is not sorted fully alphabetically. It is only sorted by the first letter.
I have tried:
arrange(df, item, mean)
The problem with this is that it sorts by the entire item name, not just by the "a_" and "b_" categories.
I am open to splitting the original dataframe into separate dataframes using filter and then sorting by mean within these smaller subsets. I do not need everything to stay in the same dataframe. However, I am unsure how to use filter to select only rows whose items begin with "a_" or "b_".
Another method using dplyr:
library(dplyr)
arrange(df, sub('_.+$', '', item), mean)
An alternative would be to use str_extract from stringr to extract only the prefix (the first character and the underscore) from item:
library(stringr)
arrange(df, str_extract(item, '^._'), mean)
Result:
item mean
1 a_c 2
2 a_a 4
3 a_b 5
4 b_e 1
5 b_f 3
6 b_d 7
Data:
df <- structure(list(item = c("a_b", "a_c", "a_a", "b_d", "b_f", "b_e"
), mean = c(5L, 2L, 4L, 7L, 3L, 1L)), .Names = c("item", "mean"
), class = "data.frame", row.names = c(NA, -6L))
Notes:
sub('_.+$', '', item) creates a temporary variable by removing _ and everything after it from item. _.+$ matches a literal underscore (_) followed by any character one or more times (.+) up to the end of the string ($).
str_extract(item, '^._') creates a temporary variable by extracting any one character (.) followed by a literal underscore (_) at the beginning of the string (^).
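To see what each pattern actually produces, here is a small sketch that applies both expressions to a hypothetical character vector `items`. The vector is not part of the original data; it includes one value with several underscores to show how the patterns behave in that case.

```r
library(stringr)

# Hypothetical sample values, including one with more than one underscore
items <- c("a_b", "b_e", "a_b_m")

# Remove the first underscore and everything after it
prefix_sub <- sub('_.+$', '', items)
print(prefix_sub)

# Extract the first character together with the underscore that follows it
prefix_extract <- str_extract(items, '^._')
print(prefix_extract)
```

Both vectors group the rows the same way. The only difference is that sub() leaves the bare letter, while str_extract() keeps the trailing underscore.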
The neat thing about dplyr::arrange is that you can create a temporary sorting variable within the function and not have it included in the output.
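As a quick check of that behaviour, the sketch below rebuilds the example data with data.frame() (an equivalent construction chosen for brevity, not the structure() call above) and confirms that the sorted result still has only the two original columns. The name `sorted` is just an illustrative variable.

```r
library(dplyr)

# Same values as the example data
df <- data.frame(
  item = c("a_b", "a_c", "a_a", "b_d", "b_f", "b_e"),
  mean = c(5L, 2L, 4L, 7L, 3L, 1L),
  stringsAsFactors = FALSE
)

# Sort by the prefix before the underscore, then by mean
sorted <- arrange(df, sub('_.+$', '', item), mean)

# The sorting expression is used for ordering only; no extra column is added
print(names(sorted))
print(sorted)
```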
The philosophy is that if you want to arrange by something (i.e. a substring here), you have to obtain it first:
df <- read.table(text = "
item mean
a_b 5
a_c 2
a_a 4
b_d 7
b_f 3
b_e 1
", header = TRUE, stringsAsFactors = FALSE)
library(tidyverse)
df %>%
  separate(item, c("item1", "item2"), remove = FALSE) %>% # split items while keeping the original column
  arrange(item1, mean) %>% # arrange by what you really want
  select(item, mean) # keep only relevant columns
# item mean
# 1 a_c 2
# 2 a_a 4
# 3 a_b 5
# 4 b_e 1
# 5 b_f 3
# 6 b_d 7
Note that there are various ways to pick the 1st letter from a string. I just decided to use separate here.
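One of those other ways, sketched here under the assumption that the grouping prefix is always exactly one character long, is to take the first character with substr() directly inside arrange(). The variable name `result` is only illustrative.

```r
library(dplyr)

# Same values as the example data
df <- data.frame(
  item = c("a_b", "a_c", "a_a", "b_d", "b_f", "b_e"),
  mean = c(5L, 2L, 4L, 7L, 3L, 1L),
  stringsAsFactors = FALSE
)

# Sort by the first character of item, then by mean
result <- df %>%
  arrange(substr(item, 1, 1), mean)

print(result)
```

This avoids creating and dropping helper columns, but it would not work for prefixes longer than one character, where splitting on the underscore is the safer choice.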
In case you have many items separated by _ you'll still need to extract the first item, so you can replace the first _ with another delimiter (let's say :) and separate your column on that:
df <- read.table(text = "
item mean
a_b_m 5
a_c 2
a_a 4
b_d_x_q 7
b_f 3
b_e 1
", header = TRUE, stringsAsFactors = FALSE)
library(tidyverse)
library(stringr)
df %>%
  mutate(item2 = str_replace(item, "_", ":")) %>% # replace only the first underscore
  separate(item2, c("item1", "item2"), remove = FALSE, sep = ":") %>% # split on the new delimiter
  arrange(item1, mean) %>%
  select(item, mean)
# item mean
# 1 a_c 2
# 2 a_a 4
# 3 a_b_m 5
# 4 b_e 1
# 5 b_f 3
# 6 b_d_x_q 7
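A possible shortcut for the same case, offered as a sketch rather than as part of the answer above, is to let separate() split only at the first underscore by passing `extra = "merge"`, so no replacement delimiter is needed. The data is rebuilt with data.frame() for brevity and `result` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Same values as the multi-underscore example
df <- data.frame(
  item = c("a_b_m", "a_c", "a_a", "b_d_x_q", "b_f", "b_e"),
  mean = c(5L, 2L, 4L, 7L, 3L, 1L),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Split at the first underscore only; the remainder stays together in item2
  separate(item, c("item1", "item2"), sep = "_", extra = "merge", remove = FALSE) %>%
  arrange(item1, mean) %>%
  select(item, mean)

print(result)
```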
A base R solution would be:
inx <- order(substr(df$item, 1, 1), df$mean)
newdf <- df[inx, ]
newdf
# item mean
#2 a_c 2
#3 a_a 4
#1 a_b 5
#6 b_e 1
#5 b_f 3
#4 b_d 7
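The question also asked how to use filter to keep only the rows whose item begins with "a_" or "b_". A minimal sketch, assuming separate dataframes are acceptable as stated in the question, uses base R's startsWith() inside dplyr::filter(). The names `a_items` and `b_items` are hypothetical.

```r
library(dplyr)

# Same values as the example data
df <- data.frame(
  item = c("a_b", "a_c", "a_a", "b_d", "b_f", "b_e"),
  mean = c(5L, 2L, 4L, 7L, 3L, 1L),
  stringsAsFactors = FALSE
)

# Keep rows whose item starts with "a_", then sort them by mean
a_items <- df %>%
  filter(startsWith(item, "a_")) %>%
  arrange(mean)

# Same for items starting with "b_"
b_items <- df %>%
  filter(startsWith(item, "b_")) %>%
  arrange(mean)

print(a_items)
print(b_items)
```

grepl("^a_", item) would work in place of startsWith(item, "a_") if a regular expression is preferred.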