I have a data frame with columns Price and Material, and a true/false matrix with N columns (each column is a particular type of material), and the T/F value denotes if the 'material' string appears in the data matrix
Data
Price Material
2.33 Metal nickel linen cotton
3.45 silver emerald steel
7.45 cotton silk wood
Matrix
Metal Nickel Linen Cotton Silver Emerald Steel Cotton Silk Wood
T T T T 0 0 0 0 0 0
0 0 0 0 T T T 0 0 0
...etc.
How do I create a subset the prices based on the material? So I can calculate the mean, range mode etc of prices which have the material 'Metal'.
My initial solution was to multiply the
newMat<- data$price * materialmatrix.
and then perform column operations on newMat (mean, quantile etc.)
But this seems like a brutal way of doing things, as I would like to combine the subsets (eg average price for something with Metal && Cotton).
I also tried
split(data, wsearch, drop=TRUE)
but got the warning.
Warning message:
In split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...)
data length is not a multiple of split variable
Was trying to use lapply
, split
, ddply
, and subset
, but my understanding of R is not strong enough to execute.
I know this is probably very simple, but I am stuck on how to use the matrix to create multiple subsets, rather than create one subset at a time.
Any help would be great.
I have looked at the following
Subsetting a data.frame with an integer matrix
subsetting matrix with id from another matrix
Select observations from a subset to create a new subset based on a large dataframe in R
R Selecting column in a data frame by column in another data frame
Is this what you want?
library(reshape2)
library(splitstackshape)
# sample data
df <- data.frame(price = c(17, 35, 12, 26, 1.35, 10),
material = c("linen",
"wax string metal cube",
"Metal nickel linen cotton",
"brass",
"linen",
"cotton silk wood"))
# split the concatenated material variable
df2 <- concat.split(data = df, split.col = "material", sep = " ", drop = TRUE)
# replace blanks with NA
df2[df2 == ""] <- NA
# melt data to long format
df3 <- melt(df2, id.vars = "price", na.rm = TRUE)
# calculate summary stats by material (= 'value' variable)
df4 <- aggregate(price ~ value, data = df3, summary)
# value price.Min. price.1st Qu. price.Median price.Mean price.3rd Qu. price.Max.
# 1 brass 26.000 26.000 26.000 26.000 26.000 26.000
# 2 cotton 10.000 10.500 11.000 11.000 11.500 12.000
# 3 cube 35.000 35.000 35.000 35.000 35.000 35.000
# 4 linen 1.350 6.675 12.000 10.120 14.500 17.000
# 5 metal 35.000 35.000 35.000 35.000 35.000 35.000
# 6 Metal 12.000 12.000 12.000 12.000 12.000 12.000
# 7 nickel 12.000 12.000 12.000 12.000 12.000 12.000
# 8 silk 10.000 10.000 10.000 10.000 10.000 10.000
# 9 string 35.000 35.000 35.000 35.000 35.000 35.000
# 10 wax 35.000 35.000 35.000 35.000 35.000 35.000
# 11 wood 10.000 10.000 10.000 10.000 10.000 10.000
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.