
Extracting/Exporting the Data of the Empirical Cumulative Distribution Function in R (ecdf)

I use R to calculate the ecdf of some data. I want to use the results in another software. I use R just to do the 'work' but not to produce the final diagram for my thesis.

Example Code

# Plotting a built-in sample data set
plot(cars$speed)
# Assigning the data to a new variable name
myData = cars$speed
# Calculating the ecdf
myResult = ecdf(myData)
myResult
# Plotting the ecdf
plot(myResult)

Output

> # Plotting a built-in sample data set
> plot(cars$speed)
> # Assigning the data to a new variable name
> myData = cars$speed
> # Calculating the ecdf
> myResult = ecdf(myData)
> myResult
Empirical CDF 
Call: ecdf(myData)
 x[1:19] =      4,      7,      8,  ...,     24,     25
> # Plotting the ecdf
> plot(myResult)
> plot(cars$speed)


Questions

Question 1

How do I get the raw data needed to plot the ecdf diagram in other software (e.g. Excel, Matlab, LaTeX)? For the histogram function I can just write

res = hist(...)

and I find all the information like

res$breaks
res$counts
res$density
res$mids
res$xname

Question 2

How do I calculate the inverse ecdf? Say I want to know how many cars have a speed below 10 mph (the example data is car speed).

Update

Thanks to the answer of user777 I have more information now. If I use

> myResult(0:25)
 [1] 0.00 0.00 0.00 0.00 0.04 0.04 0.04 0.08 0.10 0.12 0.18 0.22 0.30 0.38
[15] 0.46 0.52 0.56 0.62 0.70 0.76 0.86 0.86 0.88 0.90 0.98 1.00

I get the data for 0 to 25 mph. But I do not know where to draw a data point. I do want to reproduce the R plot exactly.

Here I have a data point every 1 mph.

[image]

Here I do not have a data point every 1 mph. I only have a data point where there is data available.

[image]

Solution

# Plotting a built-in sample data set
plot(cars$speed)
# Assigning the data to a new variable name
myData = cars$speed
# Calculating the ecdf
myResult = ecdf(myData)
myResult
# Plotting the ecdf
plot(myResult)
# Have a look at the probability for 0 to 25 mph
myResult(0:25)
# Have a look at the probability, but only where there are data points
myResult(unique(myData))
# Saving the result to a file
write.csv(cbind(unique(myData), myResult(unique(myData))), file="D:/myResult.txt")

The file myResult.txt looks like

"","V1","V2"
"1",4,0.04
"2",7,0.08
"3",8,0.1
"4",9,0.12
"5",10,0.18
"6",11,0.22
"7",12,0.3
"8",13,0.38
"9",14,0.46
"10",15,0.52
"11",16,0.56
"12",17,0.62
"13",18,0.7
"14",19,0.76
"15",20,0.86
"16",22,0.88
"17",23,0.9
"18",24,0.98
"19",25,1

Meaning

[image]

Note: I use a German version of Excel, so the decimal separator is a comma instead of a dot.

The output of ecdf is a function, among other object types. You can verify this with class(myResult), which displays the S3 classes of the object myResult.
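A quick way to check this is sketched below. The name Fn is only an illustrative stand-in for myResult from the question; knots() is the accessor in the stats package that returns the x-values at which a step function jumps.

```r
# Illustrative sketch; Fn plays the role of myResult from the question
Fn <- ecdf(cars$speed)

class(Fn)        # "ecdf" "stepfun" "function"
is.function(Fn)  # TRUE, so Fn(x) can be evaluated at any x

# x-positions of the jumps: the sorted distinct values of the data
jumps <- knots(Fn)
length(jumps)    # 19 distinct speeds in cars$speed
```

Because the object is an ordinary function, evaluating it at the jump positions gives exactly the points where the plotted steps change height.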

If you enter myResult(unique(myData)), R evaluates the ecdf object myResult at all distinct values appearing in myData and prints the result to the console. To save the output, you can enter write.csv(cbind(unique(myData), myResult(unique(myData))), file="C:/Documents/My ecdf.csv"), which writes it to that file path.
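A slightly tidier variant of that export might look like the sketch below. The column names speed and prob and the temporary output paths are assumptions made for illustration, not part of the original answer. write.csv2 is the base R counterpart that uses a semicolon as field separator and a comma as decimal mark, which may suit spreadsheet programs set to a German locale.

```r
# Sketch: export the jump points of the ecdf with named columns
Fn <- ecdf(cars$speed)
out <- data.frame(speed = knots(Fn), prob = Fn(knots(Fn)))

# Hypothetical output location; replace with a real path
outFile <- file.path(tempdir(), "ecdf.csv")
write.csv(out, file = outFile, row.names = FALSE)

# Variant for locales with a decimal comma:
# ';' as field separator and ',' as decimal mark
outFile2 <- file.path(tempdir(), "ecdf_decimal_comma.csv")
write.csv2(out, file = outFile2, row.names = FALSE)
```

Setting row.names = FALSE drops the unnamed first column of row numbers, so the file contains only the two columns needed for plotting.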

The ecdf doesn't tell you how many cars are above/below a specific threshold; rather, it gives the proportion of cars in your data set at or below the threshold, i.e. the probability that a randomly selected car from your data is at or below it. If you're interested in the number of cars satisfying some criterion, just count them. myData[myData<=10] returns the data elements, and length(myData[myData<=10]) tells you how many of them there are.
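As a minimal sketch of that counting, the lines below use sum() on a logical vector, which is equivalent to the length() form above. The variable names nAtOrBelow and nBelow are illustrative only.

```r
myData <- cars$speed
Fn <- ecdf(myData)

nAtOrBelow <- sum(myData <= 10)  # 9 cars with speed <= 10 mph
nBelow     <- sum(myData < 10)   # 6 cars with speed strictly below 10 mph

# The ecdf value is the count divided by the sample size
isTRUE(all.equal(Fn(10), nAtOrBelow / length(myData)))  # TRUE
```

Whether "below 10 mph" should include cars at exactly 10 mph decides between the two counts.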

Assuming you mean that you want to know the sample probability that a randomly selected car from your data is at or below 10 mph, that's the value given by myResult(10).
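If "inverse ecdf" is instead meant literally, i.e. the speed at which the ecdf first reaches a given probability, one possible sketch uses quantile() with type = 1, which R documents as the inverse of the empirical distribution function. The probability p = 0.5 and the names q1 and q2 are chosen here only for illustration.

```r
myData <- cars$speed
Fn <- ecdf(myData)

p <- 0.5
# Inverse of the ecdf via quantile(), type 1
q1 <- unname(quantile(myData, probs = p, type = 1))

# The same by hand: smallest jump point whose ecdf value reaches p
xs <- knots(Fn)
q2 <- min(xs[Fn(xs) >= p])

c(q1, q2)  # both 15: Fn(14) = 0.46 < 0.5 <= Fn(15) = 0.52
```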

As I see it, your main requirement is to reproduce the jumps at each x value. Try this:

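# x: each observation twice (start and end of its jump), plus two end points
x <- c(cars$speed, cars$speed, 1, 28)
# y: heights before the jumps, heights after the jumps, then 0 and 1
y <- c((0:49)/50, (1:50)/50, 0, 1)
# sort the points by x and connect them with straight lines
ord <- order(x)
plot(y[ord] ~ x[ord], type="l")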

[image: resulting plot]

The first 50 (x, y) pairs are the beginnings of the jumps, the next 50 are the ends, and the last two give you the starting and ending points at $(x_1-3,0)$ and $(x_{50}+3,1)$. Then you need to sort the values in increasing order of $x$.
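Assuming the goal is a table that another program can draw with straight line segments, a similar staircase path can also be built from the ecdf object itself. This is a sketch, not the answerer's method: it keeps one pair of points per distinct speed and omits the two outer end points. The names before, after and steps are illustrative.

```r
Fn <- ecdf(cars$speed)
xs <- knots(Fn)                   # jump locations (distinct speeds)
after  <- Fn(xs)                  # height just after each jump
before <- c(0, head(after, -1))   # height just before each jump

# Interleave: (x1, before1), (x1, after1), (x2, before2), ...
steps <- data.frame(
  x = rep(xs, each = 2),
  y = c(rbind(before, after))
)

nrow(steps)  # 38 rows: two per distinct speed
```

Calling plot(steps, type = "l") should draw the staircase, and the same data frame can be written out with write.csv for use elsewhere.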
