uniroot() in a function's source code does not work after modification; cannot figure out the error

Question

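I am trying to find the coordinates of the intersection point of two curves in R. The input data are the coordinates of empirical points on the two curves. My solution is to use the function curve_intersect(). I need to do this for 2000 replications (i.e., 2000 pairs of curves), so I put the data in two lists. Each list contains 2000 data frames, and each data frame holds the x and y coordinates of one curve.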

Here is my data: (data link)

Below is the code I used.

threshold_or1 <- map2_df(recall_or1_4, precision_or1_4,
                         ~curve_intersect(.x, .y, empirical = TRUE, domain = NULL))

# recall_or1_4 is a list of 2000 data frames. Each data frame
# contains the coordinates of curve #1.

# precision_or1_4 is a list of 2000 data frames. Each data frame
# contains the coordinates of curve #2.

I got the error message below.

Error in uniroot(function(x) curve1_f(x) - curve2_f(x), c(min(curve1$x),  : f() values at end points not of opposite sign

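Since curve_intersect() can be applied successfully to some individual data frames in the two lists, I ran the following code to see exactly which pair of data frames caused the process to fail.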

# Loop over all pairs; the last index printed is the last pair that succeeded
for (i in 1:2000) {
  curve_intersect(recall_or1_4[[i]], precision_or1_4[[i]], empirical = TRUE, domain = NULL)
  print(paste("i=", i))
}

I then got the following output, which means the process ran successfully until it reached data pair #460. So I checked that individual data pair.

[1] "i= 457"
[1] "i= 458"
[1] "i= 459"
Error in uniroot(function(x) curve1_f(x) - curve2_f(x), c(min(curve1$x),  : f() values at end points not of opposite sign

I plotted data pair #460.

test1 <- precision_or1_4[[460]] %>% mutate(statistics = 'precision')
test2 <- recall_or1_4[[460]] %>% mutate(statistics = 'recall')
test3 <- rbind(test1, test2)
test3 <- test3 %>% mutate(statistics = as.factor(statistics))
curve_test3 <- ggplot(test3, aes(x = x, y = y))+
        geom_line(aes(colour = statistics))
curve_test3

[Figure: precision and recall curves for data pair #460]

Then I modified the source code of curve_intersect(). The original source code is

    curve_intersect <- function(curve1, curve2, empirical=TRUE, domain=NULL) {
        if (!empirical & missing(domain)) {
                stop("'domain' must be provided with non-empirical curves")
        }
        
        if (!empirical & (length(domain) != 2 | !is.numeric(domain))) {
                stop("'domain' must be a two-value numeric vector, like c(0, 10)")
        }
        
        if (empirical) {
                # Approximate the functional form of both curves
                curve1_f <- approxfun(curve1$x, curve1$y, rule = 2)
                curve2_f <- approxfun(curve2$x, curve2$y, rule = 2)
                
                # Calculate the intersection of curve 1 and curve 2 along the x-axis
                point_x <- uniroot(function(x) curve1_f(x) - curve2_f(x),
                                   c(min(curve1$x), max(curve1$x)))$root
                
                # Find where point_x is in curve 2
                point_y <- curve2_f(point_x)
        } else {
                # Calculate the intersection of curve 1 and curve 2 along the x-axis
                # within the given domain
                point_x <- uniroot(function(x) curve1(x) - curve2(x), domain)$root
                
                # Find where point_x is in curve 2
                point_y <- curve2(point_x)
        }
        
        return(list(x = point_x, y = point_y))
}

I modified the uniroot() call in the third if statement. Instead of passing c(min(curve1$x), max(curve1$x)) as the interval, I used lower = -100000000, upper = 100000000. The modified function is

curve_intersect_tq <- function(curve1, curve2, empirical=TRUE, domain=NULL) {
        if (!empirical & missing(domain)) {
                stop("'domain' must be provided with non-empirical curves")
        }
        
        if (!empirical & (length(domain) != 2 | !is.numeric(domain))) {
                stop("'domain' must be a two-value numeric vector, like c(0, 10)")
        }
        
        if (empirical) {
                # Approximate the functional form of both curves
                curve1_f <- approxfun(curve1$x, curve1$y, rule = 2)
                curve2_f <- approxfun(curve2$x, curve2$y, rule = 2)
                
                # Calculate the intersection of curve 1 and curve 2 along the x-axis
                point_x <- uniroot(function(x) curve1_f(x) - curve2_f(x),
                                   lower = -100000000, upper = 100000000)$root
                
                # Find where point_x is in curve 2
                point_y <- curve2_f(point_x)
        } else {
                # Calculate the intersection of curve 1 and curve 2 along the x-axis
                # within the given domain
                point_x <- uniroot(function(x) curve1(x) - curve2(x), domain)$root
                
                # Find where point_x is in curve 2
                point_y <- curve2(point_x)
        }
        
        return(list(x = point_x, y = point_y))
}

I tried changing the values of the lower and upper arguments. It did not help. I got the same error message, shown below.

curve_intersect_tq(recall_or1_4[[460]], precision_or1_4[[460]], empirical = TRUE, domain = NULL)

Error in uniroot(function(x) curve1_f(x) - curve2_f(x), c(min(curve1$x),  : 
  f() values at end points not of opposite sign

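I also tried possibly(fun, NA) from the tidyverse (purrr), hoping the process would keep running even when an error occurs. It did not work when I used

(1) possibly(curve_intersect(), NA) or (2) possibly(uniroot(), NA)

The same error message appeared.

Why am I getting this error message? What are possible solutions? Thanks in advance.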

Answer 1

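I may be a bit late to the party, but here is why your code still fails and what you can do about it, depending on what you want to get out of your analysis:

First, the reason your code still fails even after the adaptation is that you merely told uniroot to search a wider window in x. The underlying curves, however, never intersect: there is simply no x for which curve1_f(x) - curve2_f(x) == 0.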

From the documentation of uniroot:

"The function values at the end points must be of opposite signs (or zero), for extendInt="no", the default."
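
A minimal sketch of that requirement, using two made-up functions (f_cross and f_no_cross are illustrative names, not part of the question's data): uniroot() finds a root when the end-point values have opposite signs, and stops with an error when they do not.

```r
# Illustrative functions (assumptions for this sketch, not the question's data)
f_cross    <- function(x) x - 0.5   # negative at 0, positive at 1
f_no_cross <- function(x) x + 0.5   # positive at both 0 and 1

# Opposite signs at the end points: a root is found
root <- uniroot(f_cross, interval = c(0, 1))$root

# Same sign at both end points: uniroot() stops with an error,
# which is captured here as a message instead of aborting the script
msg <- tryCatch(
  {
    uniroot(f_no_cross, interval = c(0, 1))
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

print(root)
print(msg)
```

The second call fails for the same reason as data pair #460: the difference of the two curves has the same sign at both ends of the search interval.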

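In the original curve_intersect implementation, uniroot searches the x interval defined by the data (i.e. c(min(curve1$x), max(curve1$x))). With your change, you told it to search the x interval [-100000000, 100000000]. You could also set extendInt = "yes", but it would not change anything.
The problem is not the search interval, it is approxfun!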

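approxfun only helps you by interpolating the empirical data between the points. Outside the data you passed in, the returned function does not know what to do.
approxfun lets you either specify explicit y values to be returned outside the empirically defined window (via its arguments yleft / yright), or set a rule for each side.
In the code you posted above, rule = 2 means "the value at the closest data extreme is used". So approxfun does not extrapolate the data you pass in; it merely extends the known values.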
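
To make this concrete, here is a small sketch with made-up data (the curve1 and curve2 values below are illustrative assumptions, not the data from the question). With rule = 2 the interpolated functions stay flat outside the observed x range, so the difference keeps its sign no matter how wide the search interval is.

```r
# Two made-up curves that never cross on the observed range [0, 1]
curve1 <- data.frame(x = c(0, 0.5, 1), y = c(0.9, 0.8, 0.7))
curve2 <- data.frame(x = c(0, 0.5, 1), y = c(0.2, 0.4, 0.6))

curve1_f <- approxfun(curve1$x, curve1$y, rule = 2)
curve2_f <- approxfun(curve2$x, curve2$y, rule = 2)

# Outside [0, 1] the functions return the nearest observed y value
curve1_f(c(-1e8, 0, 1, 1e8))  # 0.9 0.9 0.7 0.7
curve2_f(c(-1e8, 0, 1, 1e8))  # 0.2 0.2 0.6 0.6

# The difference therefore has the same sign at both ends of any wider interval
diff_f <- function(x) curve1_f(x) - curve2_f(x)
sign(diff_f(c(-1e8, 1e8)))    # 1 1
```

Because the sign never changes, uniroot() has nothing to bracket, whether the interval is [0, 1] or [-1e8, 1e8].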

We can plot how curve1_f and curve2_f would be extended beyond the empirically defined x interval:

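# Assumption: curve1/curve2 are the failing pair (#460), named as inside curve_intersect()
curve1 <- recall_or1_4[[460]]
curve2 <- precision_or1_4[[460]]
curve1_f <- approxfun(curve1$x, curve1$y, rule = 2)
curve2_f <- approxfun(curve2$x, curve2$y, rule = 2)

tibble(
    x = seq(0, 1, by = 0.001),
    curve1_approxed = curve1_f(x),
    curve2_approxed = curve2_f(x)
  ) %>%
  pivot_longer(starts_with("curve"), names_to = "curve", values_to = "y") %>%
  ggplot(aes(x = x, y = y, color = curve)) +
  geom_line() +
  geom_vline(xintercept = c(min(curve1$x), max(curve1$x)), color = "grey75")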

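So, here is what you can do to keep your code from crashing:
(Spoiler: it depends heavily on what you want to achieve with your project.)

Accept that there is no intersection within the limits of the observed data.
If you do not want to make any assumptions, I suggest wrapping the mapped function in a tryCatch statement and letting it fail gracefully whenever the out-of-the-box solution gives you no result. Let's run it on the part of the list that previously crashed the whole thing: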

threshold_or1.fix1 <- map2_df(
  recall_or1_4, precision_or1_4,
  ~tryCatch({
    curve_intersect(.x, .y, empirical = TRUE, domain = NULL)
  }, error = function(e){
    return(tibble(.rows = 1))
  }),
  .id = "i"
)

Now there is just an NA row wherever curve_intersect could not give you a result.

threshold_or1.fix1[459:461,]
# A tibble: 3 x 3
  i          x      y
  <chr>  <dbl>  <dbl>
1 459    0.116  0.809
2 460   NA     NA    
3 461    0.264  0.773
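
The same NA-on-failure idea can be sketched with purrr::possibly(), which the question attempted. Note that possibly() expects the function itself (curve_intersect), not a call to it (curve_intersect()). The example below assumes purrr is installed; toy_intersect is a hypothetical stand-in for curve_intersect() so that the sketch runs on its own, and the three data frames are made up.

```r
library(purrr)

# Hypothetical stand-in for curve_intersect(): fails when the curves do not cross
toy_intersect <- function(curve1, curve2) {
  f1 <- approxfun(curve1$x, curve1$y, rule = 2)
  f2 <- approxfun(curve2$x, curve2$y, rule = 2)
  x0 <- uniroot(function(x) f1(x) - f2(x), range(curve1$x))$root
  list(x = x0, y = f2(x0))
}

# Pass the function itself (no parentheses) together with a fallback value
safe_intersect <- possibly(
  toy_intersect,
  otherwise = list(x = NA_real_, y = NA_real_)
)

# Made-up curves for illustration
rising  <- data.frame(x = c(0, 1), y = c(0, 1))
falling <- data.frame(x = c(0, 1), y = c(1, 0))
above   <- data.frame(x = c(0, 1), y = c(2, 3))

res_ok   <- safe_intersect(rising, falling)  # curves cross at x = 0.5
res_fail <- safe_intersect(rising, above)    # no crossing: fallback is returned
```

The wrapped function can then be used inside map2_df() in place of the tryCatch block.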

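Try extrapolating your data with a linear model.
In this case we use a custom curve_intersect function. We wrap the problematic uniroot call in tryCatch, and if no root is found, we fit an lm to each curve and let uniroot find an intersection of the fitted lines.
Depending on your experiment this may or may not make sense, so I will leave that judgment to you. Obviously, if your data are more complex than a simple lm can capture, you can use other models instead.
Just to visualize this approach against the default one: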

tibble(
    x = seq(-1, 2, by = 0.001),
    curve1_approxed = curve1_f(x),
    curve2_approxed = curve2_f(x),
    curve1_lm = predict(lm(y ~ x, data = curve1), newdata = tibble(x = x)),
    curve2_lm = predict(lm(y ~ x, data = curve2), newdata = tibble(x = x))
  ) %>%
  pivot_longer(starts_with("curve"), names_to = "curve", values_to = "y") %>%
  ggplot(aes(x = x, y = y, color = curve)) +
  geom_line() +
  geom_vline(xintercept = c(min(curve1$x), max(curve1$x)), color = "grey75")

You can see that where approxfun "fails", with lm we assume we can extrapolate linearly, and we find an intersection outside the observed range at around x = 1.27.
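
Because two non-parallel straight lines cross exactly once, the linear extrapolation can also be checked in closed form: the lines y = a1 + b1 * x and y = a2 + b2 * x meet at x = (a2 - a1) / (b1 - b2). A small sketch with made-up data (the values are illustrative assumptions, not the data from the question):

```r
# Made-up, exactly linear curves observed on [0, 1] (illustrative assumption)
curve1 <- data.frame(x = c(0, 0.5, 1), y = c(0.9, 0.8, 0.7))  # y = 0.9 - 0.2 * x
curve2 <- data.frame(x = c(0, 0.5, 1), y = c(0.2, 0.4, 0.6))  # y = 0.2 + 0.4 * x

# Intercept and slope of each fitted line
coef1 <- coef(lm(y ~ x, data = curve1))
coef2 <- coef(lm(y ~ x, data = curve2))

# Solve a1 + b1 * x = a2 + b2 * x for x, then evaluate either line at that x
x_cross <- unname((coef2[1] - coef1[1]) / (coef1[2] - coef2[2]))
y_cross <- unname(coef1[1] + coef1[2] * x_cross)

# The intersection lies outside the observed range [0, 1]
c(x = x_cross, y = y_cross)
```

This only applies to the plain lm(y ~ x) case; for other models a numerical search such as uniroot is still needed.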

To take the second approach and use lm for extrapolation in the search, you could put together the following:
(Again, only the third if block was edited.)

curve_intersect_custom <- function(curve1, curve2, empirical=TRUE, domain=NULL) {
  if (!empirical & missing(domain)) {
    stop("'domain' must be provided with non-empirical curves")
  }
  
  if (!empirical & (length(domain) != 2 | !is.numeric(domain))) {
    stop("'domain' must be a two-value numeric vector, like c(0, 10)")
  }
  
  if (empirical) {
    
    return(
      tryCatch({
        # Approximate the functional form of both curves
        curve1_f <- approxfun(curve1$x, curve1$y, rule = 2)
        curve2_f <- approxfun(curve2$x, curve2$y, rule = 2)
        
        # Calculate the intersection of curve 1 and curve 2 along the x-axis
        point_x <- uniroot(
          f = function(x) curve1_f(x) - curve2_f(x),
          interval = c(min(curve1$x), max(curve1$x))
        )$root
        
        # Find where point_x is in curve 2
        point_y <- curve2_f(point_x)
        
        return(list(x = point_x, y = point_y, method = "approxfun"))
        
      }, error = function(e) {
        tryCatch({
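          # Fit each linear model once and reuse it for every prediction
          curve1_lm <- lm(y ~ x, data = curve1)
          curve2_lm <- lm(y ~ x, data = curve2)
          curve1_lm_f <- function(x) predict(curve1_lm, newdata = data.frame(x = x))
          curve2_lm_f <- function(x) predict(curve2_lm, newdata = data.frame(x = x))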
          
          point_x <- uniroot(
            f = function(x) curve1_lm_f(x) - curve2_lm_f(x),
            interval = c(min(curve1$x), max(curve1$x)),
            extendInt = "yes"
          )$root
          
          point_y <- curve2_lm_f(point_x)
          
          return(list(x = point_x, y = point_y, method = "lm"))
          
        }, error = function(e) {
          return(list(x = NA_real_, y = NA_real_, method = NA_character_))
        })
      })
    )
    
    
  } else {
    # Calculate the intersection of curve 1 and curve 2 along the x-axis
    # within the given domain
    point_x <- uniroot(function(x) curve1(x) - curve2(x), domain)$root
    
    # Find where point_x is in curve 2
    point_y <- curve2(point_x)
  }
  
  return(list(x = point_x, y = point_y))
}

For the problematic list element, extrapolation with a naively fitted lm model is now attempted:

threshold_or1.fix2 <- map2_df(
    recall_or1_4, precision_or1_4,
    ~curve_intersect_custom(.x, .y, empirical = TRUE, domain = NULL),
    .id = "i"
)

threshold_or1.fix2[459:461,]
# A tibble: 3 x 4
  i         x     y method   
  <chr> <dbl> <dbl> <chr>    
1 459   0.116 0.809 approxfun
2 460   1.27  0.813 lm       
3 461   0.264 0.773 approxfun
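
Since an intersection found by the lm fallback can lie outside the observed x range (as for element 460, at x = 1.27), it may be worth flagging such cases before using them. A minimal sketch: flag_extrapolated() is a hypothetical helper, and the observed x values below are an assumption made for illustration.

```r
# Hypothetical helper: TRUE when an intersection lies outside the observed x range;
# NA intersections are reported as FALSE
flag_extrapolated <- function(point_x, observed_x) {
  !is.na(point_x) & (point_x < min(observed_x) | point_x > max(observed_x))
}

observed_x <- c(0, 0.25, 0.5, 0.75, 1)  # assumed observed x values
flags <- flag_extrapolated(c(0.116, 1.27, 0.264, NA), observed_x)
flags  # FALSE TRUE FALSE FALSE
```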

Hope this helps you understand and solve your problem :)

Solution 1: 0 accepted, 2020-09-19 20:40:40
