Manipulating data in R from columns to rows

I have data that is currently organized as follows:

 X.1 State MN    X.2    WI    X.3
     NA    Price Pounds Price Pounds
Year NA    
1980 NA    56    23     56    96
1999 NA    41    63     56    65

I would like to convert it to something more like this:

Year State Price Pounds
1980 MN    56    23
1999 MN    41    63
1980 WI    56    96
1999 WI    56    65

Any suggestions for R code to reshape this data correctly? Thanks!

This requires some manipulation to get it into a format that you can reshape.

df <- read.table(header=TRUE, text=" X.1 State MN    X.2    WI    X.3
NA     NA    Price Pounds Price Pounds
Year NA    NA    NA     NA    NA
1980 NA    56    23     56    96
1999 NA    41    63     56    65")

df <- df[-2]

# Auto-process names; you should look at intermediate step results to see
# what's going on.  This would probably be better addressed with something
# like `na.locf` from `zoo` but this is all in base.  Note you can do something
# a fair bit simpler if you know you have the same number of items for each
# state, but this should be robust to different numbers.

df.names <- names(df)
df.names <- ifelse(grepl("X.[0-9]+", df.names), NA, df.names)
df.names[[1]] <- "Year"
df.names.valid <- Filter(Negate(is.na), df.names)
df.names[is.na(df.names)] <- df.names.valid[cumsum(!is.na(df.names))[is.na(df.names)]]
names(df) <- df.names

# rename again by adding Price/Pounds

names(df)[-1] <- paste(                                
  vapply(2:5, function(x) as.character(df[1, x]), ""), # need to do this because we're pulling across different factor columns
  names(df)[-1], 
  sep="."
)
df <- df[-(1:2),]   # Don't need rows 1:2 anymore
df

Produces:

  Year Price.MN Pounds.MN Price.WI Pounds.WI
3 1980       56        23       56        96
4 1999       41        63       56        65
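The code comments above mention that the renaming can be simpler when every state has the same number of items. A minimal sketch of that shortcut, assuming the states and metrics are known in advance and appear in a fixed order (the `states` and `metrics` vectors are illustrative names, not part of the original answer):

```r
# Assumption: every state carries the same metrics, in the same order.
states  <- c("MN", "WI")
metrics <- c("Price", "Pounds")

df <- read.table(header = TRUE, text = " X.1 State MN    X.2    WI    X.3
NA     NA    Price Pounds Price Pounds
Year NA    NA    NA     NA    NA
1980 NA    56    23     56    96
1999 NA    41    63     56    65")

df <- df[-2]          # drop the empty State column

# Build "metric.state" names directly instead of filling forward.
names(df) <- c("Year", paste(rep(metrics, times = length(states)),
                             rep(states, each = length(metrics)),
                             sep = "."))

df <- df[-(1:2), ]    # drop the two leftover header rows
names(df)
```

This gives the same column names as the more general approach, but it will silently mislabel columns if a state is missing a metric, so it is only worth using when the layout is known to be regular.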

Then:

Using base reshape:

reshape(df, direction="long", varying=2:5)

Which gets you basically where you want to be:

     Year time Price Pounds id
1.MN 1980   MN    56     23  1
2.MN 1999   MN    41     63  2
1.WI 1980   WI    56     96  1
2.WI 1999   WI    56     65  2

Clearly you'll want to rename some columns, etc., but that's straightforward. The key point with reshape is that the column names matter, so we constructed them in a way that reshape can use.
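The renaming and cleanup might look something like the sketch below. It rebuilds the wide data frame by hand so that it runs on its own (the `wide` and `long` names are illustrative), and it assumes the values arrive as character strings, as they would after reading the mixed header rows:

```r
# Wide data in the shape produced by the renaming steps above.
wide <- data.frame(
  Year      = c("1980", "1999"),
  Price.MN  = c("56", "41"),
  Pounds.MN = c("23", "63"),
  Price.WI  = c("56", "56"),
  Pounds.WI = c("96", "65"),
  stringsAsFactors = FALSE
)

long <- reshape(wide, direction = "long", varying = 2:5)

names(long)[names(long) == "time"] <- "State"  # reshape calls it "time"
long$id <- NULL                                # drop the generated id column
rownames(long) <- NULL                         # replace "1.MN"-style row names

# Convert the value columns from character to numeric.
num.cols <- c("Year", "Price", "Pounds")
long[num.cols] <- lapply(long[num.cols], as.numeric)

long <- long[c("Year", "State", "Price", "Pounds")]
long
```

The column order in the last step simply mirrors the layout requested in the question.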

Using reshape2::melt/cast:

library(reshape2)
df.mlt <- melt(df, id.vars="Year")
df.mlt <- transform(df.mlt, 
  metric=sub("\\..*", "", variable), 
  state=sub(".*\\.", "", variable)
)
dcast(df.mlt[-2], Year + state ~ metric)

Produces:

  Year state Pounds Price
1 1980    MN     23    56
2 1980    WI     96    56
3 1999    MN     63    41
4 1999    WI     65    56
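To see what the two sub() calls in the transform step are doing, here is a small standalone sketch applied to the column names directly (the `variable` vector here is typed in by hand for illustration):

```r
variable <- c("Price.MN", "Pounds.MN", "Price.WI", "Pounds.WI")

# Remove everything from the first dot onward: keeps the metric.
metric <- sub("\\..*", "", variable)

# Remove everything up to and including the last dot: keeps the state.
state <- sub(".*\\.", "", variable)

data.frame(variable, metric, state)
```

Both patterns rely on there being exactly one dot in each name, which is why the columns were named in "metric.state" form earlier.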

Be very careful: Price and Pounds are likely to be factors, because those columns originally held both character and numeric values. (In R 4.0.0 and later, where stringsAsFactors defaults to FALSE, they will be character vectors instead.) Either way, you will need to convert them to numeric with as.numeric(as.character(df$Price)).
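A short sketch of why the as.character() step matters when the column is a factor (the `price` vector is made up to match the values in the example):

```r
price <- factor(c("56", "41", "56", "56"))

# Calling as.numeric() directly returns the internal level codes.
codes <- as.numeric(price)

# Going through as.character() first recovers the real values.
vals <- as.numeric(as.character(price))

codes
vals
```

Here the levels are "41" and "56", so `codes` holds 2, 1, 2, 2 while `vals` holds 56, 41, 56, 56. If the column is already a character vector, the as.character() call is harmless.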

Well, that was a nice challenge. It's a lot of strsplits and greps, and it may not generalize to your entire data set. Or maybe it will, you never know.

> txt <- "X.1 State MN    X.2    WI    X.3
  NA    Price Pounds Price Pounds
  Year NA
  1980 NA    56    23     56    96
  1999 NA    41    63     56    65"
> 
> x <- textConnection(txt)
> y <- gsub("((X[.][0-9]{1})|NA)|\\s+", " ", readLines(x))
> z <- unlist(strsplit(y, "^\\s+"))
> a <- z[nzchar(z)]
> b <- unlist(strsplit(a, "\\s+"))
> nums <- as.numeric(grep("[0-9]", b[nchar(b) == 2], value = TRUE))
> Price = rev(nums[c(TRUE, FALSE)])
> pounds <- nums[-which(nums %in% Price)]
> data.frame(Year = rep(b[grepl("[0-9]{4}", b)], 2),
             State = unlist(lapply(b[grepl("[A-Z]{2}", b)], rep, 2)),
             Price = Price,
             Pounds = c(pounds[1], rev(pounds[2:3]), pounds[4]))
##   Year State Price Pounds
## 1 1980    MN    56     23
## 2 1999    MN    41     63
## 3 1980    WI    56     96
## 4 1999    WI    56     65
