Extracting/exporting the data of the empirical cumulative distribution function in R (ecdf)

Question

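I use R to compute the ecdf of some data, and I want to use the results in other software. I use R only to do the "work", not to produce the final figures for my thesis.

Example code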

# Plot the built-in sample data
plot(cars$speed)
# Assign the data to a new variable name
myData = cars$speed
# Calculate the ecdf
myResult = ecdf(myData)
myResult
# Plot the ecdf
plot(myResult)

Output

> # Plot the built-in sample data
> plot(cars$speed)
> # Assign the data to a new variable name
> myData = cars$speed
> # Calculate the ecdf
> myResult = ecdf(myData)
> myResult
Empirical CDF 
Call: ecdf(myData)
 x[1:19] =      4,      7,      8,  ...,     24,     25
> # Plotting the ecdf
> plot(myResult)
> plot(cars$speed)

[Image not preserved]

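Questions

Question 1

How do I get the raw information so that I can plot the ecdf in other software (e.g. Excel, Matlab, LaTeX)? For the histogram function I can write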

res = hist(...)

and then I find all the information in components such as

res$breaks
res$counts
res$density
res$mids
res$xname

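Question 2

How do I compute the inverse ecdf? Say I want to know how many cars have a speed below 10 mph (the example data are car speeds).

Update

Thanks to user777's answer I now have more information. If I use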

> myResult(0:25)
 [1] 0.00 0.00 0.00 0.00 0.04 0.04 0.04 0.08 0.10 0.12 0.18 0.22 0.30 0.38
[15] 0.46 0.52 0.56 0.62 0.70 0.76 0.86 0.86 0.88 0.90 0.98 1.00

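I get the ecdf values for 0 to 25 mph. But I do not know where to draw the data points, and I want to reproduce the R plot exactly.

Here I have a data point for every 1 mph.

[Image not preserved]

Here I do not have a data point for every 1 mph. I only have a data point where data are available.

[Image not preserved]

Solution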

# Plot the built-in sample data
plot(cars$speed)
# Assign the data to a new variable name
myData = cars$speed
# Calculate the ecdf
myResult = ecdf(myData)
myResult
# Plot the ecdf
plot(myResult)
# Look at the cumulative probability for 0 to 25 mph
myResult(0:25)
# Look at the cumulative probability only where there are data points
myResult(unique(myData))
# Save the values and their ecdf to a file
write.csv(cbind(unique(myData), myResult(unique(myData))), file="D:/myResult.txt")

The file myResult.txt looks like

"","V1","V2"
"1",4,0.04
"2",7,0.08
"3",8,0.1
"4",9,0.12
"5",10,0.18
"6",11,0.22
"7",12,0.3
"8",13,0.38
"9",14,0.46
"10",15,0.52
"11",16,0.56
"12",17,0.62
"13",18,0.7
"14",19,0.76
"15",20,0.86
"16",22,0.88
"17",23,0.9
"18",24,0.98
"19",25,1

Result

[Image not preserved]

Note: I have a German version of Excel, so the decimal separator is a comma rather than a point.

Answer 1

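The output of ecdf is a function, among other object types. You can verify this with class(myResult), which displays the (S3) class vector of the object myResult: "ecdf", "stepfun" and "function".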
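The class check described above can be sketched as follows. This is only an illustration on the built-in cars data; the variable names follow the question.

```r
# Build the ecdf object from the built-in cars data
myData <- cars$speed
myResult <- ecdf(myData)

# The class vector lists every class the object carries
cls <- class(myResult)
print(cls)

# Because the object is a function, it can be called directly
is_fun <- is.function(myResult)
value_at_10 <- myResult(10)
```

Since the object inherits from "stepfun", the helpers for step functions in the stats package, such as knots(), also apply to it.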

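If you enter myResult(unique(myData)), R evaluates the ecdf object myResult at all the distinct values that appear in myData and prints the result to the console. To save the output, you can enter write.csv(cbind(unique(myData), myResult(unique(myData))), file="C:/Documents/My ecdf.csv"), which writes it to that file path.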
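A variant of the export with named columns might look like the sketch below. It uses knots(), which returns the sorted jump locations of a step function, instead of unique(). The column names speed and ecdf and the temporary file location are assumptions made for illustration, not part of the original answer.

```r
# Build the ecdf object from the built-in cars data
myData <- cars$speed
myResult <- ecdf(myData)

# knots() gives the sorted distinct data values, i.e. where the ecdf jumps
xs <- knots(myResult)

# One row per jump: the x value and the ecdf height reached at that x
out <- data.frame(speed = xs, ecdf = myResult(xs))

# Write without row names so the file has exactly two columns
# (hypothetical path in the session's temporary directory)
csv_path <- file.path(tempdir(), "ecdf_export.csv")
write.csv(out, file = csv_path, row.names = FALSE)
```

For a spreadsheet that expects a decimal comma, write.csv2() writes the same table with a comma as the decimal mark and a semicolon as the field separator.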

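The ecdf does not tell you how many cars are above or below a particular threshold. Rather, it gives the probability that a car selected at random from the data set is at or below the threshold, which is the proportion of the data at or below it. If you are interested in the number of cars that satisfy some condition, just count them: myData[myData<=10] returns the matching data elements, and length(myData[myData<=10]) tells you how many there are.

Assuming you mean that you want the sample probability that a car selected at random from the data has a speed of at most 10 mph, that is the value given by myResult(10).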
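The counting and the probability lookup can be put side by side as in the sketch below. For the inverse direction it uses quantile() with type = 1, which R documents as the inverse of the empirical distribution function. That choice is an addition here, not something the answer itself proposes.

```r
# Build the ecdf object from the built-in cars data
myData <- cars$speed
myResult <- ecdf(myData)

# Counts: cars at or below 10 mph, and cars strictly below 10 mph
n_at_or_below <- sum(myData <= 10)
n_below <- sum(myData < 10)

# Proportion at or below 10 mph, as given by the ecdf
p_at_or_below <- myResult(10)

# Inverse ecdf: smallest observed speed whose ecdf value is at least 0.5
q_half <- unname(quantile(myData, probs = 0.5, type = 1))
```

Multiplying p_at_or_below by length(myData) recovers the count, so the two views agree up to floating-point rounding.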

Answer 2

As I see it, your main requirement is to reproduce the jumps at each x value. Try this:

> x <- c(cars$speed, cars$speed, 1, 28)
> y <- c((0:49)/50, (1:50)/50, 0, 1)
> ord <- order(x)
> plot(y[ord] ~ x[ord], type="l")

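[Image: resulting plot]

The first 50 (x, y) pairs are the start points of the jumps, the next 50 are the end points, and the last two pairs are the start and end values $(x_1 - 3, 0)$ and $(x_{50} + 3, 1)$. You then need to sort the values in ascending order of $x$.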
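A more compact way to obtain the corner points of the staircase, with one pair of points per distinct speed, is sketched below. The names step_x and step_y are made up for this example, and the approach is an alternative to the construction above rather than the answerer's own method.

```r
# Build the ecdf object from the built-in cars data
myData <- cars$speed
myResult <- ecdf(myData)

# Jump locations and the ecdf height reached at each jump
xs <- knots(myResult)
ys <- myResult(xs)

# Each jump contributes two points at the same x:
# the height before the jump and the height after it
step_x <- rep(xs, each = 2)
step_y <- c(rbind(c(0, head(ys, -1)), ys))

# Two-column table that other software can draw as a connected line
steps <- data.frame(x = step_x, y = step_y)
```

Drawing steps as a line, for example with plot(steps$x, steps$y, type = "l"), gives the staircase between the smallest and largest speed. The flat tails before 4 mph and after 25 mph can be added by prepending and appending one point each, as in the answer.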

2 answers

Answer 1
3, accepted, 2014-07-27 18:48:44

Answer 2
3, 2014-07-27 19:48:18