[英]Manipulating data in R from columns to rows
I have data that is currently organized as follows: 我目前整理的数据如下:
X.1 State MN X.2 WI X.3
NA Price Pounds Price Pounds
Year NA
1980 NA 56 23 56 96
1999 NA 41 63 56 65
I would like to convert it to something more like this: 我想将其转换为以下形式:
Year State Price Pounds
1980 MN 56 23
1999 MN 41 63
1980 WI 56 96
1999 WI 56 65
Any suggestions for some R-code to manipulate this data correctly? 对一些R代码正确处理此数据有何建议? Thanks! 谢谢!
This requires some manipulation to get it into a format that you can reshape. 这需要进行一些操作才能使其变为可以重塑的格式。
df <- read.table(h=T, t=" X.1 State MN X.2 WI X.3
NA NA Price Pounds Price Pounds
Year NA NA NA NA NA
1980 NA 56 23 56 96
1999 NA 41 63 56 65")
df <- df[-2]
# Auto-process names; you should look at intermediate step results to see
# what's going on. This would probably be better addressed with something
# like `na.locf` from `zoo` but this is all in base. Note you can do something
# a fair bit simpler if you know you have the same number of items for each
# state, but this should be robust to different numbers.
df.names <- names(df)
df.names <- ifelse(grepl("X.[0-9]+", df.names), NA, df.names)
df.names[[1]] <- "Year"
df.names.valid <- Filter(Negate(is.na), df.names)
df.names[is.na(df.names)] <- df.names.valid[cumsum(!is.na(df.names))[is.na(df.names)]]
names(df) <- df.names
# rename again by adding Price/Pounds
names(df)[-1] <- paste(
vapply(2:5, function(x) as.character(df[1, x]), ""), # need to do this because we're pulling across different factor columns
names(df)[-1],
sep="."
)
df <- df[-(1:2),] # Don't need rows 1:2 anymore
df
Produces: 生产:
Year Price.MN Pounds.MN Price.WI Pounds.WI
3 1980 56 23 56 96
4 1999 41 63 56 65
Then: 然后:
reshape
: 使用基础reshape
: reshape(df, direction="long", varying=2:5)
Which gets you basically where you want to be: 基本上可以让您到达想要的位置:
Year time Price Pounds id
1.MN 1980 MN 56 23 1
2.MN 1999 MN 41 63 2
1.WI 1980 WI 56 96 1
2.WI 1999 WI 56 65 2
Clearly you'll want to rename some columns, etc., but that's straightforward. 显然,您需要重命名某些列等,但这很简单。 The key point with reshape
is that the column names matter so we constructed them in a way that reshape
can use. reshape
的关键是列名很重要,因此我们以可reshape
的方式构造它们。
reshape2::melt/cast
: 使用reshape2::melt/cast
: library(reshape2)
df.mlt <- melt(df, id.vars="Year")
df.mlt <- transform(df.mlt,
metric=sub("\\..*", "", variable),
state=sub(".*\\.", "", variable)
)
dcast(df.mlt[-2], Year + state ~ metric)
produces: 生产:
Year state Pounds Price
1 1980 MN 23 56
2 1980 WI 96 56
3 1999 MN 63 41
4 1999 WI 65 56
BE VERY CAREFUL, it is likely that Price
and Pounds
are factors because the column used to have both character and numeric values. 务必小心, Price
和Pounds
很可能是因素,因为该列以前同时具有字符和数字值。 You will need to convert to numeric with as.numeric(as.character(df$Price))
. 您将需要使用as.numeric(as.character(df$Price))
转换为数字。
Well that was a nice challenge. 好吧,这是一个不错的挑战。 It's a lot of strsplit
s and grep
s, and it may not generalize to your entire data set. 它有很多strsplit
和grep
,并且可能不能推广到整个数据集。 Or maybe it will, you never know. 也许会,你永远不会知道。
> txt <- "X.1 State MN X.2 WI X.3
NA Price Pounds Price Pounds
Year NA
1980 NA 56 23 56 96
1999 NA 41 63 56 65"
>
> x <- textConnection(txt)
> y <- gsub("((X[.][0-9]{1})|NA)|\\s+", " ", readLines(x))
> z <- unlist(strsplit(y, "^\\s+"))
> a <- z[nzchar(z)]
> b <- unlist(strsplit(a, "\\s+"))
> nums <- as.numeric(grep("[0-9]", b[nchar(b) == 2], value = TRUE))
> Price = rev(nums[c(TRUE, FALSE)])
> pounds <- nums[-which(nums %in% Price)]
> data.frame(Year = rep(b[grepl("[0-9]{4}", b)], 2),
State = unlist(lapply(b[grepl("[A-Z]{2}", b)], rep, 2)),
Price = Price,
Pounds = c(pounds[1], rev(pounds[2:3]), pounds[4]))
## Year State Price Pounds
## 1 1980 MN 56 23
## 2 1999 MN 41 63
## 3 1980 WI 56 96
## 4 1999 WI 56 65
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.