How to use two columns to access a specific element in a data frame?
I'm trying to use two columns to look up values in a table and then write the result to a third column. This is the function I wrote to access it:
getami <- function(bedroom, year){
ami <- hhsplus[bedroom + 1, year - 1997]
return(ami)
}
This is how I am calling the function:
df$ami <- getami(df$beds, df$year)
beds and year are both just lists of integers.
Here's an excerpt of what hhsplus looks like:
1998 1999 2000 2001 2002
-------------------------------------------
1 54050 57800 60900 61100 67200
2 61750 66100 69600 69850 76800
3 69500 74350 78300 78550 86400
4 77200 82600 87000 87300 96000
5 83400 89200 93950 94300 103700
6 89550 95800 100900 101250 111350
7 95750 102400 107900 108250 119050
8 101900 109050 114850 115250 126700
When I store the result in df$ami, the values appear in descending order. I am wondering how I can get ami stored according to the two columns.
Edit: This is what df$beds and df$year (actually df$dc) look like.
Edit 2: Here's an excerpt of df in CSV format:
"","Date Listed","Price Listed","Date Closed","Price Closed","Days on Market","Age","Price/SF","SF","Beds","Baths","dc","ami"
"1",2013-05-30,1538000,2013-08-08,1480000,18,0,332,4460,7,6,2013,NA
"2",2014-05-15,2799000,2014-10-08,2300000,124,3,265,8691,7,8,2014,NA
"3",2014-03-14,1199888,2014-09-19,1200000,145,9,215,5586,7,6,2014,NA
"4",2016-03-28,3195000,2016-10-07,2800000,112,14,427,6562,7,6,2016,NA
"5",2010-05-25,2350000,2011-04-01,1925000,245,33,241,8000,6,12,2011,NA
"6",2013-11-15,2295000,2014-12-19,2183000,285,8,299,7300,6,8,2014,NA
"7",2015-05-05,1550000,2015-08-04,1550000,57,11,310,4993,6,6,2015,NA
"8",2014-02-21,2595000,2014-04-23,2520000,37,11,329,7651,6,7,2014,NA
"9",2013-08-12,3750000,2015-07-15,2640000,548,12,376,7030,6,5,2015,NA
"10",2009-09-16,2750000,2009-12-10,2525000,527,9,334,7550,6,6,2009,NA
"11",2013-05-27,1299000,2014-02-07,1350000,201,21,320,4217,6,5,2014,NA
"12",2015-02-07,2299000,2015-06-23,2240000,10,28,288,7783,6,8,2015,NA
"13",2014-05-16,1760000,2015-06-02,1700000,311,28,256,6650,6,5,2015,NA
"14",2012-02-24,749950,2012-04-27,740000,29,32,183,4045,6,3,2012,NA
"15",2013-01-25,1650000,2013-03-25,1600000,11,28,511,3133,6,6,2013,NA
"16",2014-02-16,1198000,2014-04-16,1150000,11,36,388,2964,6,5,2014,NA
"17",2014-04-04,1349950,2014-08-11,1340000,59,36,273,4904,6,4,2014,NA
"18",2017-06-04,1425000,2017-06-05,1425000,1,40,450,3166,6,4,2017,NA
"19",2009-05-08,1850000,2009-12-01,1500000,188,32,250,6000,6,4,2009,NA
"20",2014-03-14,1650000,2015-03-17,1480000,335,37,318,4660,6,4,2015,NA
"21",2013-06-12,2348000,2013-10-24,2025000,300,11,397,5100,6,5,2013,NA
"22",2016-01-25,1249000,2016-02-29,1125000,14,44,403,2792,6,4,2016,NA
"23",2011-08-22,580000,2011-11-08,575000,241,40,158,3636,6,5,2011,NA
"24",2011-07-25,599000,2011-09-14,570000,4,52,221,2576,6,4,2011,NA
"25",2010-06-26,1349000,2010-09-30,1300000,56,72,260,5000,6,4,2010,NA
"26",2016-09-09,1399000,2016-11-16,1410000,4,12,357,3948,6,5,2016,NA
Edit 3: dput(head(df,10))
structure(list(`Date Listed` = structure(c(1369872000, 1400112000,
1394755200, 1459123200, 1274745600, 1384473600, 1430784000, 1392940800,
1376265600, 1253059200), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
`Price Listed` = c(1538000, 2799000, 1199888, 3195000, 2350000,
2295000, 1550000, 2595000, 3750000, 2750000), `Date Closed` = structure(c(1375920000,
1412726400, 1411084800, 1475798400, 1301616000, 1418947200,
1438646400, 1398211200, 1436918400, 1260403200), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), `Price Closed` = c(1480000, 2300000,
1200000, 2800000, 1925000, 2183000, 1550000, 2520000, 2640000,
2525000), `Days on Market` = c(18, 124, 145, 112, 245, 285,
57, 37, 548, 527), Age = c(0, 3, 9, 14, 33, 8, 11, 11, 12,
9), `Price/SF` = c(332, 265, 215, 427, 241, 299, 310, 329,
376, 334), SF = c(4460, 8691, 5586, 6562, 8000, 7300, 4993,
7651, 7030, 7550), Beds = c(7, 7, 7, 7, 6, 6, 6, 6, 6, 6),
Baths = c(6, 8, 6, 6, 12, 8, 6, 7, 5, 6), dc = c(2013, 2014,
2014, 2016, 2011, 2014, 2015, 2014, 2015, 2009), ami = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("Date Listed",
"Price Listed", "Date Closed", "Price Closed", "Days on Market",
"Age", "Price/SF", "SF", "Beds", "Baths", "dc", "ami"), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
If you're just trying to turn your data into a flat file, you can use gather from the tidyr package:
library(tidyr)
df = read.table(text=" bedroom 1998 1999 2000 2001 2002
1 54050 57800 60900 61100 67200
2 61750 66100 69600 69850 76800
3 69500 74350 78300 78550 86400
4 77200 82600 87000 87300 96000
5 83400 89200 93950 94300 103700
6 89550 95800 100900 101250 111350
7 95750 102400 107900 108250 119050
8 101900 109050 114850 115250 126700", header = TRUE)
answer = gather(data = df, key = "year", value = "hhsplus", X1998:X2002)
Note that, because of the way I've created the dataset from your sample data, all the year columns now have "X" at the front. Here's how you fix it:
answer$year = as.numeric(gsub("X", "", answer$year))
Result:
bedroom year hhsplus
1 1998 54050
2 1998 61750
3 1998 69500
4 1998 77200
5 1998 83400
6 1998 89550
7 1998 95750
8 1998 101900
1 1999 57800
...
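In current tidyr releases gather is marked as superseded by pivot_longer, so the same reshape could also be sketched as below. This is only an illustration, not the original answer's code: it uses the first three rows of the sample table, and the names hhsplus_long and ami are chosen here for the example.

```r
library(tidyr)

# First rows of the sample table; read.table() prefixes the year columns with "X"
hhsplus <- read.table(text = "bedroom 1998 1999 2000 2001 2002
1 54050 57800 60900 61100 67200
2 61750 66100 69600 69850 76800
3 69500 74350 78300 78550 86400", header = TRUE)

# Reshape every column except bedroom into (year, ami) pairs,
# stripping the "X" prefix from the column names along the way
hhsplus_long <- pivot_longer(
  hhsplus,
  cols = -bedroom,
  names_to = "year",
  names_prefix = "X",
  values_to = "ami"
)

# The year names are still character strings; convert them to integers
hhsplus_long$year <- as.integer(hhsplus_long$year)
```

Unlike gather, pivot_longer returns the rows grouped by bedroom rather than by year, which does not matter for a later lookup or merge.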
I would solve this problem by merging the two data frames. You can do this by converting hhsplus to a long format. See code below.
But I'm not quite clear on how exactly you want to merge the two data frames. In your function, you have hhsplus[bedroom + 1, year - 1997]: why do you add 1 to bedroom and subtract 1997 from year?
require("tidyr")
# From lebelinoz's answer, read in hhsplus:
hhsplus = read.table(text=" bedroom 1998 1999 2000 2001 2002
1 54050 57800 60900 61100 67200
2 61750 66100 69600 69850 76800
3 69500 74350 78300 78550 86400
4 77200 82600 87000 87300 96000
5 83400 89200 93950 94300 103700
6 89550 95800 100900 101250 111350
7 95750 102400 107900 108250 119050
8 101900 109050 114850 115250 126700", header = TRUE)
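# convert hhsplus to long format (every column except the first becomes a year/value pair):
hhsplus_long = gather(data = hhsplus, year, hhsplus_ami, -1)
# strip the "X" prefix added by read.table and make year numeric, like df$dc
hhsplus_long$year = as.numeric(gsub("X", "", hhsplus_long$year))
# mirror the bedroom + 1 offset used in the question's function
hhsplus_long$bedroom = hhsplus_long$bedroom - 1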
# merge two data frames, keeping all records from df (all.x=TRUE)
merge(df, hhsplus_long, by.x = c("Beds", "dc"), by.y=c("bedroom", "year"), all.x=TRUE)
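If the offsets in the original function are in fact what is wanted, a base-R alternative to the merge is to index with a two-column matrix of (row, column) pairs, which picks one element per pair. Indexing with two separate vectors, as in hhsplus[rows, cols], returns every combination of those rows and columns instead. The sketch below is an assumption-laden illustration: it treats hhsplus as a numeric matrix (a data frame could be converted with as.matrix first), uses only part of the excerpt, and the listings data frame is made up so that its values fall inside that excerpt.

```r
# Part of the lookup table as a matrix; columns correspond to 1998, 1999, 2000
hhsplus <- matrix(
  c(54050, 57800, 60900,
    61750, 66100, 69600,
    69500, 74350, 78300),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("1998", "1999", "2000"))
)

# Hypothetical listings (not the question's data)
listings <- data.frame(beds = c(0, 2, 1), year = c(2000, 1998, 1999))

getami <- function(bedroom, year) {
  # Each row of idx is one (row, column) pair, so the result has
  # exactly one value per listing, in the same order as the inputs
  idx <- cbind(bedroom + 1, year - 1997)
  hhsplus[idx]
}

listings$ami <- getami(listings$beds, listings$year)
```

This keeps the original order of the rows, whereas merge may reorder them by the key columns.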