
Can I apply the XIRR function from the tvm library to each row of my table, where the cash flows are already on that row?

This is my first question, so I apologize in advance if it's not perfectly asked. I have already searched all over Stack Overflow (and Google) but was unable to find what I am looking for. Also, I'm new to R and am teaching myself as I go.

My issue is this: I am trying to compute the internal rate of return for each row in my table using the XIRR function from tvm. I was able to get XIRR to work for a single cash-flow stream. Here is an example of what I got working.

# This is a sample that works
install.packages("tvm")  # only needed once
library(tvm)

x_CF <- c(-7500, 3000, 5000, 1200, 4000)
x_d <- as.Date(c("2016-01-01", "2016-02-01", "2016-04-15", "2016-08-01", "2017-03-26"))
x_irr <- xirr(x_CF, x_d)  # store the result under its own name instead of overwriting xirr
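For reference, XIRR is the annual rate r at which the dated cash flows have zero net present value: sum(cf_i / (1 + r)^((d_i - d_1) / 365)) = 0. The sketch below shows that idea in base R. `xirr_base()` is a hypothetical stand-in, not tvm's source code. It assumes an Actual/365 day count and uses `uniroot()` over an assumed search interval of (-0.99, 10). tvm's own convention and interval may differ, so small discrepancies are possible.

```r
# Hypothetical base-R stand-in for tvm::xirr() (not the tvm implementation).
# Assumes an Actual/365 day count: t = days since the first date / 365.
xirr_base <- function(cf, d, lower = -0.99, upper = 10) {
  t <- as.numeric(d - d[1]) / 365
  npv <- function(r) sum(cf / (1 + r)^t)  # net present value at rate r
  uniroot(npv, lower = lower, upper = upper, tol = 1e-10)$root
}

x_CF <- c(-7500, 3000, 5000, 1200, 4000)
x_d <- as.Date(c("2016-01-01", "2016-02-01", "2016-04-15", "2016-08-01", "2017-03-26"))
x_irr <- xirr_base(x_CF, x_d)

# Sanity check: discounting at the solved rate should give an NPV of about zero
npv_check <- sum(x_CF / (1 + x_irr)^(as.numeric(x_d - x_d[1]) / 365))
```

`uniroot()` needs the NPV to change sign between `lower` and `upper`. Cash-flow streams that don't bracket a root make it stop with an error. That is why the later code in this thread wraps the call in `tryCatch()`.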

In my specific scenario, I have a table with the periodic cash flows and dates populated on each row for each ID. The cash flows are in columns cf1, cf2, cf3, ..., cf(n) and the dates are in columns date1, date2, date3, ..., date(n). The number of cash flows and dates is currently 14 (n = 14), but could be different (e.g. 36, 60, etc.). The code below builds 2 rows of my much larger table.

# This is just 2 rows of my data table, with the values written in by hand
# (the real table is much larger and is created dynamically with code)
library(dplyr)

sample_data <-
    matrix(
        c(
            "A",
            "2016-01-31", "2016-02-29", "2016-03-31","2016-04-30","2016-05-31",
            1000, 10, 20, -50, -1025,
            "B",
            "2016-01-31", "2016-02-29", "2016-03-31","2016-04-30", "2016-05-31",
            1000, -50, 20, 10, -1025),
        ncol = 11, byrow = TRUE)

colnames(sample_data) <-
    c("SecId",
      "date1", "date2", "date3", "date4", "date5",
      "cf1", "cf2", "cf3", "cf4", "cf5")

sample_data <- tbl_df(sample_data)

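# funs() is deprecated in dplyr, so the function is passed directly;
# as.numeric rather than as.integer, so fractional cash flows are not truncated
sample_data <-
    sample_data %>% mutate_at(vars(starts_with("cf")),
                              as.numeric)
sample_data <-
    sample_data %>% mutate_at(vars(starts_with("date")),
                              as.Date)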

I would like to use the XIRR function to read cf1:n and date1:n. The result should be a new column (XIRR) containing the computed values A = 0.1412532 and B = 0.1458380.

Is this possible, or should I be looking into some other function? Thanks!
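One way to get a per-row XIRR column without `rowwise()` is to pull the cf* and date* columns out as two numeric matrices. Sort them by their numeric suffix so any n works, then solve one row at a time with `vapply()`. This is a base-R sketch under a few assumptions. `xirr_base()` is a hypothetical stand-in for `tvm::xirr()` with an Actual/365 day count. `wide` is a plain data.frame version of the two sample rows.

```r
# Hypothetical stand-in for tvm::xirr() (Actual/365 day count assumed)
xirr_base <- function(cf, d, lower = -0.99, upper = 10) {
  t <- as.numeric(d - d[1]) / 365
  uniroot(function(r) sum(cf / (1 + r)^t), lower = lower, upper = upper, tol = 1e-10)$root
}

# The two sample rows in wide format (SecId, date1..date5, cf1..cf5)
d <- as.Date(c("2016-01-31", "2016-02-29", "2016-03-31", "2016-04-30", "2016-05-31"))
wide <- data.frame(
  SecId = c("A", "B"),
  date1 = d[1], date2 = d[2], date3 = d[3], date4 = d[4], date5 = d[5],
  cf1 = c(1000, 1000), cf2 = c(10, -50), cf3 = c(20, 20),
  cf4 = c(-50, 10), cf5 = c(-1025, -1025)
)

# Find the cf* and date* columns and order them by numeric suffix,
# so the same code works for n = 14, 36, 60, ... (cf10 sorts after cf9)
by_suffix <- function(cols, prefix) cols[order(as.integer(sub(prefix, "", cols)))]
cf_cols   <- by_suffix(grep("^cf[0-9]+$", names(wide), value = TRUE), "cf")
date_cols <- by_suffix(grep("^date[0-9]+$", names(wide), value = TRUE), "date")

# One numeric matrix per block: rows = securities, columns = periods.
# Dates become day numbers, which is all the day-count arithmetic needs.
cf_mat   <- do.call(cbind, lapply(wide[cf_cols], as.numeric))
date_mat <- do.call(cbind, lapply(wide[date_cols], as.numeric))

# Solve each row; no per-row data frame is built
wide$XIRR <- vapply(seq_len(nrow(wide)),
                    function(i) xirr_base(cf_mat[i, ], date_mat[i, ]),
                    numeric(1))
```

With this day count the results should come out within about 0.001 of the values quoted above. Remaining differences can come from the day-count convention and the solver tolerance. To use tvm instead, call `xirr(cf_mat[i, ], as.Date(date_mat[i, ], origin = "1970-01-01"))` inside the function, so tvm receives `Date` objects.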

EDIT - Additional details, and why the answer below didn't work for me

My actual data has the cash flows and dates in a long table format with over 5.5 million rows. The reason I converted them to the "deprecated" table is that what I'm ultimately trying to do is create a rolling monthly IRR calculation. I figured that if I built the date and cash-flow streams on each row, I could avoid a loop and apply XIRR directly to each row. Creating a long table that includes every iteration of ID/date would not, I think, be realistic for this amount of data.

With the proposed code, the cash flows and dates are merged for the same IDs, so it doesn't account for rolling periods. I know this wasn't explained in my original question.
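The question does not define the rolling window precisely. Assume it means the IRR of the trailing k cash flows ending at each date. That is an assumption: a rolling performance IRR often also needs opening and closing valuations, which are not in this data. Under that assumption, the value can be computed per ID straight from the long table, without building a wide row for every window. `xirr_base()`, `safe_xirr()` and `rolling_xirr()` are hypothetical helpers. `xirr_base()` stands in for `tvm::xirr()` with an Actual/365 day count.

```r
# Hypothetical stand-in for tvm::xirr() (Actual/365 day count assumed)
xirr_base <- function(cf, d, lower = -0.99, upper = 10) {
  t <- as.numeric(d - d[1]) / 365
  uniroot(function(r) sum(cf / (1 + r)^t), lower = lower, upper = upper, tol = 1e-10)$root
}

# NA instead of an error when a window has no root in the search interval
safe_xirr <- function(cf, d) tryCatch(xirr_base(cf, d), error = function(e) NA_real_)

# IRR of the trailing window of k cash flows ending at each row (assumed definition)
rolling_xirr <- function(cf, d, k) {
  out <- rep(NA_real_, length(cf))
  if (length(cf) >= k) {
    for (j in k:length(cf)) {
      idx <- (j - k + 1):j
      out[j] <- safe_xirr(cf[idx], d[idx])
    }
  }
  out
}

# Long format: one row per ID and date, like the real table
d <- as.Date(c("2016-01-31", "2016-02-29", "2016-03-31", "2016-04-30", "2016-05-31"))
long <- data.frame(
  ID = rep(c("A", "B"), each = 5),
  date = rep(d, times = 2),
  cashflow = c(1000, 10, 20, -50, -1025,
               1000, -50, 20, 10, -1025)
)
long <- long[order(long$ID, long$date), ]

# Apply per ID; unsplit() puts the results back in the row order of `long`
long$roll_xirr <- unsplit(
  lapply(split(long, long$ID), function(g) rolling_xirr(g$cashflow, g$date, k = 5)),
  long$ID
)
```

With only five dates per ID and k = 5, each ID gets one value, on its last row. On longer histories, every row from the k-th onward gets its own window. This is still one root-finding call per window.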

In addition, I have periods with missing cash flows, which show up as NA (since they're converted with as.numeric). I need XIRR to handle this by not performing a calculation when there are any NAs. I think this can be handled with is.na = TRUE in the summarise command.
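dplyr's `summarise()` has no `is.na` option. A named argument such as `is.na = TRUE` would simply create an output column called `is.na`. A small wrapper can skip the calculation instead. In this sketch, `xirr_base()` is again a hypothetical base-R stand-in for `tvm::xirr()` (Actual/365 assumed), and `safe_xirr()` is a hypothetical name.

```r
# Hypothetical stand-in for tvm::xirr() (Actual/365 day count assumed)
xirr_base <- function(cf, d, lower = -0.99, upper = 10) {
  t <- as.numeric(d - d[1]) / 365
  uniroot(function(r) sum(cf / (1 + r)^t), lower = lower, upper = upper, tol = 1e-10)$root
}

# Return NA without computing when any cash flow or date is missing,
# and also when the solver fails (e.g. the cash flows never change sign)
safe_xirr <- function(cf, d) {
  if (anyNA(cf) || anyNA(d)) return(NA_real_)
  tryCatch(xirr_base(cf, d), error = function(e) NA_real_)
}

d <- as.Date(c("2016-01-31", "2016-02-29", "2016-03-31", "2016-04-30", "2016-05-31"))
complete <- safe_xirr(c(1000, 10, 20, -50, -1025), d)
with_gap <- safe_xirr(c(1000, NA, 20, -50, -1025), d)
```

In the long-table approach, this would be used as `summarise(xirr = safe_xirr(cashflow, date))`. `tvm::xirr()` can be substituted inside the wrapper if preferred.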

EDIT #2: Found a partial solution

After playing around with this, I was able to get the XIRR function to work for the sample data above. Here is the code that works, but it takes a very long time with my actual data.

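# rowwise() + do() evaluates the expression once per row, with . as a one-row
# data frame: columns 7:11 are cf1..cf5 and columns 2:6 are date1..date5.
# In R < 4.0, data.frame() turns SecId into a one-level factor in every row, which is
# what triggers the "Unequal factor levels" warning when the rows are bound together.
calc_xirr <- sample_data %>% rowwise() %>%
  do(data.frame(., xirr = tryCatch(xirr(unlist(.[7:11]), unlist(.[2:6]), lower = 0, upper = 1),
                                   error = function(e) {NA}))) %>%  # NA when xirr() fails
  select(SecId, xirr)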

I get the warning "In bind_rows_(x, .id) : Unequal factor levels: coercing to character", but the calculation is accurate.

The issue I still have is how slow this is on my actual data set. It runs for a very long time (6+ hours), but does produce correct results. Is there any way to rewrite this using parallel processing, or without rowwise(), which I assume is a loop operation and therefore slow?
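`rowwise()` + `do()` builds a one-row data frame for every row. Much of the cost is probably that overhead rather than the root finding itself. Parallelism can be layered on top once the per-row work is a plain function call. The sketch below uses the base `parallel` package. It assumes `cf_mat` and `date_mat` hold the cash flows and the dates (as day numbers), with one row per ID. `xirr_base()` and `safe_xirr()` are hypothetical helpers (Actual/365 assumed). No timings on the real 5.5M-row table are implied.

```r
library(parallel)

# Hypothetical stand-in for tvm::xirr() (Actual/365 day count assumed)
xirr_base <- function(cf, d, lower = -0.99, upper = 10) {
  t <- as.numeric(d - d[1]) / 365
  uniroot(function(r) sum(cf / (1 + r)^t), lower = lower, upper = upper, tol = 1e-10)$root
}
safe_xirr <- function(cf, d) {
  if (anyNA(cf) || anyNA(d)) return(NA_real_)
  tryCatch(xirr_base(cf, d), error = function(e) NA_real_)
}

# Inputs as plain matrices (rows = IDs, columns = periods); in practice these
# would be taken from the cf* and date* columns of the real table
d_num <- as.numeric(as.Date(c("2016-01-31", "2016-02-29", "2016-03-31", "2016-04-30", "2016-05-31")))
cf_mat <- rbind(A = c(1000, 10, 20, -50, -1025),
                B = c(1000, -50, 20, 10, -1025))
date_mat <- rbind(A = d_num, B = d_num)

# mclapply() forks worker processes, which is unavailable on Windows
n_cores <- max(1L, detectCores() - 1L, na.rm = TRUE)
if (.Platform$OS.type == "windows") n_cores <- 1L

# With the default mc.preschedule = TRUE, the rows are split into one chunk per core
res <- mclapply(seq_len(nrow(cf_mat)),
                function(i) safe_xirr(cf_mat[i, ], date_mat[i, ]),
                mc.cores = n_cores)
xirr_vals <- setNames(unlist(res), rownames(cf_mat))
```

On Windows, the usual alternative is a cluster from `makeCluster()` with `parLapply()`. The helper functions must first be sent to the workers with `clusterExport()`.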

First of all, tbl_df seems to be deprecated; use as_tibble instead (the older as.tibble alias has since been deprecated as well).

I also changed your sample data, since I got an error when applying xirr to the data for ID "A". I defined the sample data as follows.

sample_data <-
  matrix(
    c(
      "A",
      "2016-01-01", "2016-02-01", "2016-04-15", "2016-08-01", "2017-03-26",
      -7500, 3000, 5000, 1200, 4000,
      "B",
      "2016-01-01", "2016-02-01", "2016-04-15", "2016-08-01", "2017-03-26",
      -7500, 3000, 5000, 1200, 4000
    ),
    ncol = 11,
    byrow = TRUE
  )

colnames(sample_data) <-
  c("ID",
    "date1", "date2", "date3", "date4", "date5",
    "cf1", "cf2", "cf3", "cf4", "cf5")

I split my code into two parts. The first part tidies the data; the second computes the desired value.

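library(dplyr)
library(tidyr)

sample_data <- as_tibble(sample_data)

# Convert cash flows and dates in separate across() calls; a single mutate_at()
# call cannot apply different functions to two column selections
sample_data <-
  sample_data %>% mutate(across(starts_with("cf"), as.numeric),
                         across(starts_with("date"), as.Date))

# Reshape each block to long form; "index" is the period number from the column name
# (pivot_longer() is the current replacement for the superseded gather())
sample_data_dates <-
  sample_data %>% select(starts_with("date"), ID) %>%
  pivot_longer(-ID, names_to = "key", values_to = "date") %>%
  mutate(index = gsub("date", "", key))
sample_data_cashflows <-
  sample_data %>% select(starts_with("cf"), ID) %>%
  pivot_longer(-ID, names_to = "key", values_to = "cashflow") %>%
  mutate(index = gsub("cf", "", key))

# Pair each date with its cash flow by ID and period number
sample_data <-
  inner_join(
    sample_data_dates %>% select(-key),
    sample_data_cashflows %>% select(-key),
    by = c("ID", "index")
  ) %>% select(-index)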

After this, you have a table with the columns ID, date and cashflow. You can then calculate the value with the xirr function using the following code:

sample_data %>% group_by(ID) %>% summarise(xirr = xirr(cashflow, as.Date(date)))
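The reshaping step above (two wide-to-long conversions plus `inner_join()`) can also be done in base R. Unlisting the date and cash-flow column blocks in the same order makes the join unnecessary. The sketch below assumes the wide table is a plain data.frame with columns ID, date1..date5 and cf1..cf5. The sample values are the question's rows A and B, and n is hard-coded to 5 for illustration.

```r
# Hypothetical wide table: ID, date1..date5 (character, as read in), cf1..cf5
wide <- data.frame(
  ID = c("A", "B"),
  date1 = "2016-01-31", date2 = "2016-02-29", date3 = "2016-03-31",
  date4 = "2016-04-30", date5 = "2016-05-31",
  cf1 = c(1000, 1000), cf2 = c(10, -50), cf3 = c(20, 20),
  cf4 = c(-50, 10), cf5 = c(-1025, -1025)
)
n <- 5
date_cols <- paste0("date", seq_len(n))
cf_cols   <- paste0("cf", seq_len(n))

# unlist() walks the columns in order (all date1 values, then all date2, ...),
# so ID and period index follow the same repeating pattern for both blocks
long <- data.frame(
  ID       = rep(wide$ID, times = n),
  index    = rep(seq_len(n), each = nrow(wide)),
  date     = as.Date(unlist(lapply(wide[date_cols], as.character), use.names = FALSE)),
  cashflow = as.numeric(unlist(wide[cf_cols], use.names = FALSE))
)

# One block of rows per ID, in period order; drop the helper index
long <- long[order(long$ID, long$index), c("ID", "date", "cashflow")]
```

The result has the same ID/date/cashflow layout as the joined table, so the same `group_by(ID) %>% summarise(...)` step applies to it.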

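Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.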

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM