How to use two columns to access a specific element in a data frame?

I'm trying to use two columns to look up a value in a table and then store the result in a third column. This is the function I wrote for the lookup:

getami <- function(bedroom, year){
  ami <- hhsplus[bedroom + 1, year - 1997]
}

This is how I am calling the function:

df$ami <- getami(df$beds, df$year)

beds and year are just two columns of integers.

Here's an excerpt of what hhsplus looks like:

    1998    1999    2000    2001    2002
1   54050   57800   60900   61100   67200
2   61750   66100   69600   69850   76800
3   69500   74350   78300   78550   86400
4   77200   82600   87000   87300   96000
5   83400   89200   93950   94300   103700
6   89550   95800   100900  101250  111350
7   95750   102400  107900  108250  119050
8   101900  109050  114850  115250  126700

When I store the result in df$ami, the values appear in descending order. How can I fill ami with the value that matches each row's two columns?

Edit: This is what df$beds and df$year (actually df$dc) look like

Edit 2: Here's an excerpt of df in CSV format:

"","Date Listed","Price Listed","Date Closed","Price Closed","Days on Market","Age","Price/SF","SF","Beds","Baths","dc","ami"

Edit 3: dput(head(df,10))

structure(list(`Date Listed` = structure(c(1369872000, 1400112000, 
1394755200, 1459123200, 1274745600, 1384473600, 1430784000, 1392940800, 
1376265600, 1253059200), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    `Price Listed` = c(1538000, 2799000, 1199888, 3195000, 2350000, 
    2295000, 1550000, 2595000, 3750000, 2750000), `Date Closed` = structure(c(1375920000, 
    1412726400, 1411084800, 1475798400, 1301616000, 1418947200, 
    1438646400, 1398211200, 1436918400, 1260403200), class = c("POSIXct", 
    "POSIXt"), tzone = "UTC"), `Price Closed` = c(1480000, 2300000, 
    1200000, 2800000, 1925000, 2183000, 1550000, 2520000, 2640000, 
    2525000), `Days on Market` = c(18, 124, 145, 112, 245, 285, 
    57, 37, 548, 527), Age = c(0, 3, 9, 14, 33, 8, 11, 11, 12, 
    9), `Price/SF` = c(332, 265, 215, 427, 241, 299, 310, 329, 
    376, 334), SF = c(4460, 8691, 5586, 6562, 8000, 7300, 4993, 
    7651, 7030, 7550), Beds = c(7, 7, 7, 7, 6, 6, 6, 6, 6, 6), 
    Baths = c(6, 8, 6, 6, 12, 8, 6, 7, 5, 6), dc = c(2013, 2014, 
    2014, 2016, 2011, 2014, 2015, 2014, 2015, 2009), ami = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("Date Listed", 
"Price Listed", "Date Closed", "Price Closed", "Days on Market", 
"Age", "Price/SF", "SF", "Beds", "Baths", "dc", "ami"), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

If you're just trying to turn your data into a flat file, you can use gather from the tidyr package:

library(tidyr)  # provides gather()

df = read.table(text=" bedroom   1998    1999    2000    2001    2002
                1   54050   57800   60900   61100   67200
                2   61750   66100   69600   69850   76800
                3   69500   74350   78300   78550   86400
                4   77200   82600   87000   87300   96000
                5   83400   89200   93950   94300   103700
                6   89550   95800   100900  101250  111350
                7   95750   102400  107900  108250  119050
                8   101900  109050  114850  115250  126700", header = TRUE)
answer = gather(data = df, key = "year", value = "hhsplus", X1998:X2002) 

Note that, because of the way I've created the dataset from your sample data, all the year columns now have an "X" at the front. Here's how you fix it:

answer$year = as.numeric(gsub("X", "", answer$year))

Result:

    bedroom year hhsplus
    1       1998   54050
    2       1998   61750
    3       1998   69500
    4       1998   77200
    5       1998   83400
    6       1998   89550
    7       1998   95750
    8       1998  101900
    1       1999   57800
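
For readers on a recent tidyr: gather() is superseded by pivot_longer(), which can strip the "X" prefix and convert the year to an integer in the same call. The following is only a sketch, assuming tidyr 1.1.0 or later (for names_transform) and a shortened, hypothetical copy of the table called wide. Note that pivot_longer() orders its rows by bedroom first, unlike the gather() output above.

```r
library(tidyr)

# Shortened copy of the lookup table; read.table() prepends "X" to the year names
wide <- read.table(text = "bedroom 1998 1999 2000
1 54050 57800 60900
2 61750 66100 69600", header = TRUE)

long <- pivot_longer(
  wide,
  cols = -bedroom,                            # reshape every column except bedroom
  names_to = "year",                          # old column names go into `year`
  names_prefix = "X",                         # drop the leading "X"
  names_transform = list(year = as.integer),  # store the year as an integer
  values_to = "hhsplus"                       # cell values go into `hhsplus`
)
```

The result has one row per bedroom/year pair, with columns bedroom, year and hhsplus.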

I would solve this problem by merging the two data frames. You can do this by converting hhsplus to a long format. See code below.

However, I'm not quite clear on exactly how you want to merge the two data frames. Your function uses hhsplus[bedroom + 1, year - 1997]. Why do you add 1 to bedroom and subtract 1997 from year?


# From lebelinoz's answer, read in hhsplus:
hhsplus = read.table(text=" bedroom   1998    1999    2000    2001    2002
                     1   54050   57800   60900   61100   67200
                     2   61750   66100   69600   69850   76800
                     3   69500   74350   78300   78550   86400
                     4   77200   82600   87000   87300   96000
                     5   83400   89200   93950   94300   103700
                     6   89550   95800   100900  101250  111350
                     7   95750   102400  107900  108250  119050
                     8   101900  109050  114850  115250  126700", header = TRUE)

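# convert hhsplus to long format (gather() comes from tidyr):
library(tidyr)
hhsplus_long = gather(data = hhsplus, year, hhsplus_ami, -1)
# strip the "X" that read.table() prepends and store the year as a number
hhsplus_long$year = as.numeric(gsub("X", "", hhsplus_long$year))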
hhsplus_long$bedroom = hhsplus_long$bedroom - 1

# merge two data frames, keeping all records from df (all.x=TRUE)
merge(df, hhsplus_long, by.x = c("Beds", "dc"), by.y=c("bedroom", "year"), all.x=TRUE)
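
If you would rather keep the positional lookup from the question, one likely reason the original function misbehaves is that hhsplus[rows, cols] with two separate vectors selects a whole block (every row crossed with every column) rather than one cell per pair. Base R picks one cell per pair when the index is a two-column matrix built with cbind(). The sketch below assumes hhsplus is held as a numeric matrix, keeps the + 1 and - 1997 offsets from the question as-is, and uses hypothetical beds and year vectors that stay inside the excerpt's range (an out-of-range year raises a "subscript out of bounds" error).

```r
# Lookup table from the question's excerpt, stored as a matrix
hhsplus <- matrix(
  c( 54050,  57800,  60900,  61100,  67200,
     61750,  66100,  69600,  69850,  76800,
     69500,  74350,  78300,  78550,  86400,
     77200,  82600,  87000,  87300,  96000,
     83400,  89200,  93950,  94300, 103700,
     89550,  95800, 100900, 101250, 111350,
     95750, 102400, 107900, 108250, 119050,
    101900, 109050, 114850, 115250, 126700),
  nrow = 8, byrow = TRUE,
  dimnames = list(as.character(1:8), as.character(1998:2002))
)

# Hypothetical inputs standing in for df$Beds and df$dc
beds <- c(1, 3, 7)
year <- c(1999, 2002, 1998)

# Two separate index vectors return a 3 x 3 block, not three values
block <- hhsplus[beds + 1, year - 1997]

# A two-column (row, column) matrix returns exactly one value per pair
ami <- hhsplus[cbind(beds + 1, year - 1997)]
```

With a data frame instead of a matrix, converting it first with as.matrix() on the numeric year columns gives the same behaviour.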
