
Create a piecewise smooth function which preserves local integrals from data

I'm a PhD student in sociology working on my dissertation. In the course of some data analysis, I have bumped up against the following problem.

I have a table of measured values of some variable over a series of years. Each value counts how many events of a certain type occurred in a given year. Here is a sample of what it looks like:

year    var
1983    22
1984    55
1985    34
1986    29
1987    15
1988    20
1989    41

So, e.g., in 1984, 55 such events occurred over the whole year.

One way to represent this data over the domain of real numbers in [1983, 1990) is with a piecewise function f :

f(x) = var if floor(x) == year, for all x in [1983, 1990).
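For concreteness, here is a minimal Python sketch of the step function f defined above, using the sample table. The names `data` and `f` are just illustrative.

```python
import math

# Sample yearly counts from the table above.
data = {1983: 22, 1984: 55, 1985: 34, 1986: 29, 1987: 15, 1988: 20, 1989: 41}

def f(x):
    """Step function: return the count of the year that contains x."""
    year = math.floor(x)
    if year not in data:
        raise ValueError(f"x={x} is outside the domain [1983, 1990)")
    return data[year]
```

For example, `f(1984.5)` returns the 1984 count, and the value stays constant across the whole of that year.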

This function plots a series of horizontal line segments of width 1, tracing the tops of a bar chart of the variable. The area under each segment is equal to the variable's value in that year. However, for this variable, I know that the rate is not constant over the whole year. In other words, the events don't suddenly jump from one yearly rate to another overnight on Dec 31, as the (discontinuous) function f seems to suggest. I don't know exactly how the rate changes, but I'd like to assume a smooth transition from year to year.

So, what I want is a function g which is both continuous and smooth (continuously differentiable) over the domain [1983, 1990), which also preserves the yearly totals. That is, the definite integral of g from 1984 to 1985 must still equal 55, and same for all other years. (So, for example, an n-degree polynomial which hits all the midpoints of the bars will NOT work.) Also, I'd like g to be a piecewise function, with all the pieces relatively simple -- quadratics would be best, or a sinusoid.

In sum: I want g to be a series of parabolas defined over each year, which smoothly transition from one to the other (left and right limits of g'(x) should be equal at the year boundaries), and where the area under each parabola is equal to the totals given by my data above.

I've drawn a crude version of what I want here. The cartoon uses the same data as above, with the black curve representing my hoped-for function, g. Toward the right end the drawing gets particularly rough, especially for 1988 and 1989, but it's just meant to show a picture of what I would like to end up with.

Thanks for your help, or for pointing me towards other resources you think might be helpful!

PS I have looked at this paper which is linked inside this question. I agree with the authors (see section 4) that if I could replace my data with pseudodata d' using matrix A, from which I could very simply generate some sort of smooth function, that would be great, but they do not say how A could be obtained. Just some food for thought. Thanks again!

PPS What I need is a reliable method of generating g, given ANY data table as above. I actually have hundreds of these kinds of yearly count data, so I need a general solution.

You need the integral of your curve to go through a specific set of points, defined by the cumulative totals, so...

Interpolate between the cumulative totals to get an integral curve, and then take the derivative of that to get the function you're looking for.

Since you want your function to be "continuous and smooth", i.e., C1-continuous, the integral curve you interpolate needs to be C2-continuous, i.e., it has to have continuous first and second derivatives. You can use polynomial interpolation, sinc interpolation, splines of sufficient degree, etc.

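Using "natural" cubic splines to interpolate the integral will give you a piecewise quadratic derivative that seems to satisfy all your requirements. Because the spline passes through every cumulative total, the integral of its derivative over each year equals that year's count.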

There's a pretty good description of the natural cubic splines here: http://mathworld.wolfram.com/CubicSpline.html
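The approach described above can be sketched with SciPy's `CubicSpline`. This is one possible implementation under stated assumptions, not the only way to do it: the knots are placed at the year boundaries (1983 through 1990), the cumulative total starts at 0, and the names `knots`, `cumulative`, `G` and `g` are my own.

```python
import numpy as np
from scipy.interpolate import CubicSpline

years = np.array([1983, 1984, 1985, 1986, 1987, 1988, 1989])
counts = np.array([22, 55, 34, 29, 15, 20, 41], dtype=float)

# Assumption: knots sit at the year boundaries 1983, 1984, ..., 1990.
knots = np.arange(years[0], years[-1] + 2)

# Cumulative total at each boundary, starting from 0 at 1983.
cumulative = np.concatenate(([0.0], np.cumsum(counts)))

# G interpolates the cumulative totals and is C2 by construction.
G = CubicSpline(knots, cumulative, bc_type="natural")

# g = G' is piecewise quadratic and C1; this is the rate function.
g = G.derivative()

# Each yearly integral of g reproduces the original count.
yearly_totals = np.array([g.integrate(y, y + 1) for y in years])
```

Two caveats: the "natural" boundary condition forces the slope of g to zero at both ends of the domain, and nothing in this construction prevents g from dipping below zero when neighbouring counts differ sharply, so it is worth plotting the result for each table.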

If your goal is to transform discrete data into a continuous representation, I would recommend looking up Kernel Density Estimation (KDE). KDE essentially models each data point as a (usually) Gaussian distribution and sums these distributions, resulting in a smooth continuous distribution. This blog does a very thorough treatment of KDE using the SciPy module.
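As a rough illustration of the KDE suggestion, here is a sketch using `scipy.stats.gaussian_kde`. It rests on assumptions of mine that the answer does not state: each year's events are placed at the year's midpoint, the counts are passed as weights, and the default bandwidth is used. Note that a KDE only preserves the overall total, not the individual yearly totals the question asks for.

```python
import numpy as np
from scipy.stats import gaussian_kde

counts = np.array([22, 55, 34, 29, 15, 20, 41], dtype=float)

# Assumption: treat each year's events as concentrated at mid-year.
midpoints = np.arange(1983, 1990) + 0.5

# Weighted Gaussian KDE; the density integrates to 1 over the real line.
kde = gaussian_kde(midpoints, weights=counts)

def rate(x):
    """Smooth event rate: the density scaled by the total event count."""
    return counts.sum() * kde(np.atleast_1d(x))
```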

One of the downsides of KDE is that it does not provide an analytic solution. If that is your goal, I would recommend looking up polynomial regression.
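A polynomial regression on the same data might look like the sketch below, using NumPy's `Polynomial.fit`. The choice of degree 3 and of year midpoints as the x-values are assumptions made only for illustration. Like KDE, a least-squares fit to point values does not by itself guarantee that the yearly integrals match the counts.

```python
import numpy as np
from numpy.polynomial import Polynomial

counts = np.array([22, 55, 34, 29, 15, 20, 41], dtype=float)

# Assumption: each yearly count is treated as a point value at mid-year.
midpoints = np.arange(1983, 1990) + 0.5

# Least-squares cubic fit; Polynomial.fit rescales x internally,
# which avoids conditioning problems with large values like 1983.5.
degree = 3  # arbitrary choice for this sketch
p = Polynomial.fit(midpoints, counts, deg=degree)

# Evaluate the fitted polynomial at the original x-values.
fitted = p(midpoints)
```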


 