Get a particular row in R to a new table
I am new to R, so please guide me with this.
Shown below is a simple table called Order.
Col1 Col2 Col3
hey hi july 12,2013
hey hi june 12,2013
hey hi April 12,2013
hey hi April 14,2012
I want to write a query that produces the result below as a new table. I need to use a regular expression to match part of the string in Col3 and then count the matches.
july june April
1 1 2
Please help me if anyone knows how to do it.
You can use sub to extract the month names and table to count the frequencies:
dat <- read.table(text = "Col1 Col2 Col3
hey hi 'july 12,2013'
hey hi 'june 12,2013'
hey hi 'April 12,2013'
hey hi 'April 14,2012'", header = TRUE)
table(sub("^(\\w+) .*", "\\1", dat$Col3))
# April july june
# 2 1 1
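Since the question asks for the counts in a new table, one way to keep the result is to store it and convert it to a data frame. This is only a minimal sketch: the sample data is rebuilt so the snippet runs on its own, and the names month_counts and counts_df are made up for illustration.

```r
# Same sample data as above, rebuilt so this snippet is self-contained
dat <- read.table(text = "Col1 Col2 Col3
hey hi 'july 12,2013'
hey hi 'june 12,2013'
hey hi 'April 12,2013'
hey hi 'April 14,2012'", header = TRUE, stringsAsFactors = FALSE)

# Extract the first word of Col3 and count how often each one occurs
month_counts <- table(sub("^(\\w+) .*", "\\1", dat$Col3))

# Convert the frequency table to a data frame with one row per month
# (illustrative column names; as.data.frame() would use Var1 and Freq)
counts_df <- as.data.frame(month_counts, stringsAsFactors = FALSE)
names(counts_df) <- c("month", "count")
counts_df
```

The month names keep whatever capitalisation they had in Col3, so "july" and "July" would be counted separately unless the values are normalised first.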
How does sub("^(\\w+) .*", "\\1", dat$Col3) work?
The function sub performs replacements in strings. The first quoted string is a regular expression and the second is the replacement. In the pattern, ^ is the beginning of the string, \\w is a word character (the backslash is doubled because it sits inside an R string), and + means one or more. The space after the closing parenthesis is a literal space. The sequence .* means any number of any character. The parentheses create a group. The first (and only) group, (\\w+), matches the word characters at the beginning of the string. The second argument of sub, "\\1", replaces the whole match with the substring captured by the first group. In short: the whole string is replaced by its first word.
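To see the replacement step by step, the pattern can be tried on a plain character vector. This is a small sketch: the vector x is made-up sample input, and the regmatches() line is an alternative added here for comparison, not part of the approach above.

```r
# Made-up sample input mirroring the values in Col3
x <- c("july 12,2013", "April 14,2012")

# The whole string matches the pattern, so it is replaced by group 1
first_words <- sub("^(\\w+) .*", "\\1", x)
first_words
# [1] "july"  "April"

# A string without a space does not match, so sub() returns it unchanged
unchanged <- sub("^(\\w+) .*", "\\1", "july12")
unchanged
# [1] "july12"

# Alternative: extract the leading word directly instead of replacing
extracted <- regmatches(x, regexpr("^\\w+", x))
extracted
# [1] "july"  "April"
```

The sub version relies on every value containing a space after the first word, while the regmatches version only needs the string to start with word characters.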
Data:
data <- read.table(text = "Col1 Col2 Col3
hey hi 'july 12,2013'
hey hi 'june 12,2013'
hey hi 'April 12,2013'
hey hi 'April 14,2012'", header = TRUE)
An answer using dates:
# month() is not part of base R; this assumes the lubridate package
library(lubridate)
# transform data into POSIXlt
data$Col3 <- as.POSIXlt(data$Col3, format="%B %d, %Y")
## group using table with POSIXlt numbers (0 is January)
table(data$Col3$mon)
# 3 5 6
# 2 1 1
## group using table with normal month numbers
table(month(data$Col3))
# 4 6 7
# 2 1 1
## group using aggregate with POSIXlt numbers (0 is January)
aggregate(data$Col1, by=list(data[,"Col3"]$mon), length)
# result:
#   Group.1 x
# 1       3 2
# 2       5 1
# 3       6 1
## group using aggregate with normal month numbers
aggregate(data$Col1, by=list(month(data$Col3)), length)
# result:
#   Group.1 x
# 1       4 2
# 2       6 1
# 3       7 1
PS: when you read data$Col3$mon from a POSIXlt object, January is 0, so April is 3 and not 4 as you might expect. To get "normal" month numbers you should use month(data$Col3), which is not a base R function (lubridate provides one). I only realised that after reading Ananda's comment.
If you want a prettier version (by Ananda Mahto):
Col3 <- as.POSIXlt(data$Col3, format="%B %d, %Y"); table(month.name[month(Col3)])
April July June
2 1 1
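If loading an extra package just for month() is not desirable, the same counts can probably be had with base R alone, because the 0-based mon field can index the built-in month.name vector once 1 is added. This is a sketch under the assumption of an English locale, so that %B recognises the month names; d, parsed and month_counts are illustrative names.

```r
# Sample dates as character strings (same values as Col3 above)
d <- c("july 12,2013", "june 12,2013", "April 12,2013", "April 14,2012")

# Parse to POSIXlt; %B matches the full month name in the current locale
parsed <- as.POSIXlt(d, format = "%B %d, %Y")

# mon is 0-based (January = 0), so add 1 to index the built-in month.name
month_number <- parsed$mon + 1

# Count the rows per month, labelled with capitalised English month names
month_counts <- table(month.name[month_number])
month_counts
```

Because the labels come from month.name rather than from the input text, "july" and "April" end up with consistent capitalisation.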