
Plotly Python regression in ternary space

I'm trying to draw a regression line in Plotly (Python) in ternary space, but there doesn't seem to be an option like trendline='loess' for scatter ternaries. Is there another way to achieve the same result for ternaries? Below is code from a previous post; it draws a spline through the points, but not a regression.

import numpy as np
import plotly.graph_objects as go

a = np.array([0.15, 0.15, 0.17, 0.2 , 0.21, 0.24, 0.26, 0.27, 0.27, 0.29, 0.32, 0.35, 0.39, 0.4 , 0.4 , 0.41, 0.47, 0.48, 0.51, 0.52, 0.54, 0.56, 0.59, 0.62, 0.63, 0.65, 0.69, 0.73, 0.74])
b = np.array([0.14, 0.15, 0.1 , 0.17, 0.17, 0.18, 0.05, 0.16, 0.17, 0.04, 0.03, 0.14, 0.13, 0.13, 0.14, 0.14, 0.13, 0.13, 0.14, 0.14, 0.15, 0.16, 0.18, 0.2 , 0.21, 0.22, 0.24, 0.25, 0.25])
c = np.array([0.71, 0.7 , 0.73, 0.63, 0.62, 0.58, 0.69, 0.57, 0.56, 0.67, 0.65, 0.51, 0.48, 0.47, 0.46, 0.45, 0.4 , 0.39, 0.35, 0.34, 0.31, 0.28, 0.23, 0.18, 0.16, 0.13, 0.07, 0.02, 0.01])

fig = go.Figure()

curve_portion = np.where((b < 0.15) & (c > 0.6))
curve_other_portion = np.where(~((b < 0.15) & (c > 0.6)))

def add_plot_spline_portions(fig, indices_groupings):
    for indices in indices_groupings:
        fig.add_trace(go.Scatterternary({
            'mode': 'lines',
            'connectgaps': True,
            'a': a[indices],
            'b': b[indices],
            'c': c[indices],
            'line': {'color': 'black', 'shape': 'spline', 'smoothing': 1},
            'marker': {'size': 2, 'line': {'width': 0.1}}
        }))

add_plot_spline_portions(fig, [curve_portion, curve_other_portion])
fig.show(renderer='png')

The resulting plot: a jagged spline, not a regression.

I can outline what I think is a general solution - it doesn't have as much mathematical rigor as I would like, and I arrived at my conclusions through a lot of guess-and-check work - but hopefully it's still helpful.

The first consideration is that a regression on a ternary plot has only two degrees of freedom, because A+B+C=1 (you might find this explanation helpful). This means it only makes sense to model the relationship between two of the variables at a time. What we really want to do is create a regression between two of the variables, then determine the value of the third variable using the equation A+B+C=1.
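As a quick sanity check (my own addition, using the arrays from the question), the data should satisfy this constraint exactly, and any one component can be recovered from the other two:

```python
import numpy as np

# compositions from the question
a = np.array([0.15, 0.15, 0.17, 0.2 , 0.21, 0.24, 0.26, 0.27, 0.27, 0.29, 0.32, 0.35, 0.39, 0.4 , 0.4 , 0.41, 0.47, 0.48, 0.51, 0.52, 0.54, 0.56, 0.59, 0.62, 0.63, 0.65, 0.69, 0.73, 0.74])
b = np.array([0.14, 0.15, 0.1 , 0.17, 0.17, 0.18, 0.05, 0.16, 0.17, 0.04, 0.03, 0.14, 0.13, 0.13, 0.14, 0.14, 0.13, 0.13, 0.14, 0.14, 0.15, 0.16, 0.18, 0.2 , 0.21, 0.22, 0.24, 0.25, 0.25])
c = np.array([0.71, 0.7 , 0.73, 0.63, 0.62, 0.58, 0.69, 0.57, 0.56, 0.67, 0.65, 0.51, 0.48, 0.47, 0.46, 0.45, 0.4 , 0.39, 0.35, 0.34, 0.31, 0.28, 0.23, 0.18, 0.16, 0.13, 0.07, 0.02, 0.01])

# every row of a ternary composition sums to 1
assert np.allclose(a + b + c, 1.0)

# so the third component is fully determined by the other two
c_recovered = 1 - (a + b)
assert np.allclose(c_recovered, c)
```

This is why fitting two of the variables is enough: the third is never an independent quantity.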

The second consideration is a bit harder to define, but since you are after a regression that captures the "reversing" nature of the variable A, we want a regression where A can take on repeated values. I think the most straightforward way to achieve this is for A to be the variable being predicted.

For simplicity's sake, let's use a degree-2 polynomial regression that predicts A from either B or C. We can make a scatter plot of each pairing and choose whichever polynomial fits better for our purposes.

Here is some quick EDA (exploratory data analysis):

import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots

a = np.array([0.15, 0.15, 0.17, 0.2 , 0.21, 0.24, 0.26, 0.27, 0.27, 0.29, 0.32, 0.35, 0.39, 0.4 , 0.4 , 0.41, 0.47, 0.48, 0.51, 0.52, 0.54, 0.56, 0.59, 0.62, 0.63, 0.65, 0.69, 0.73, 0.74])
b = np.array([0.14, 0.15, 0.1 , 0.17, 0.17, 0.18, 0.05, 0.16, 0.17, 0.04, 0.03, 0.14, 0.13, 0.13, 0.14, 0.14, 0.13, 0.13, 0.14, 0.14, 0.15, 0.16, 0.18, 0.2 , 0.21, 0.22, 0.24, 0.25, 0.25])
c = np.array([0.71, 0.7 , 0.73, 0.63, 0.62, 0.58, 0.69, 0.57, 0.56, 0.67, 0.65, 0.51, 0.48, 0.47, 0.46, 0.45, 0.4 , 0.39, 0.35, 0.34, 0.31, 0.28, 0.23, 0.18, 0.16, 0.13, 0.07, 0.02, 0.01])

## eda to determine polynomial of best fit to predict A 
fig_eda = make_subplots(rows=1, cols=2)

fig_eda.add_trace(go.Scatter(x=b, y=a, mode='markers'),row=1, col=1)
coefficients = np.polyfit(b,a,2)
p = np.poly1d(coefficients)
b_vals = np.linspace(min(b),max(b))
a_pred = np.array([p(x) for x in b_vals])
fig_eda.add_trace(go.Scatter(x=b_vals, y=a_pred, mode='lines'),row=1, col=1)

fig_eda.add_trace(go.Scatter(x=c, y=a, mode='markers'),row=1, col=2)
coefficients = np.polyfit(c,a,2)
p = np.poly1d(coefficients)
c_vals = np.linspace(min(c),max(c))
a_pred = np.array([p(x) for x in c_vals])
fig_eda.add_trace(go.Scatter(x=c_vals, y=a_pred, mode='lines'),row=1, col=2)

[Plot: A vs. B (left) and A vs. C (right), each with its degree-2 polynomial fit]

Notice how predicting A from B appears to capture the reversing nature of A better than predicting A from C. If we fit a degree-2 polynomial predicting A from C, A will not take repeated values within C's domain of [0, 1], because that polynomial slopes so gently that it never turns around inside the domain.
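One way to make this criterion concrete (my own check, not from the original answer): a quadratic αx² + βx + γ produces repeated predicted values inside a domain exactly when its turning point x* = -β/(2α) falls inside that domain. For this data, the B-based fit turns around inside [0, 1]:

```python
import numpy as np

a = np.array([0.15, 0.15, 0.17, 0.2 , 0.21, 0.24, 0.26, 0.27, 0.27, 0.29, 0.32, 0.35, 0.39, 0.4 , 0.4 , 0.41, 0.47, 0.48, 0.51, 0.52, 0.54, 0.56, 0.59, 0.62, 0.63, 0.65, 0.69, 0.73, 0.74])
b = np.array([0.14, 0.15, 0.1 , 0.17, 0.17, 0.18, 0.05, 0.16, 0.17, 0.04, 0.03, 0.14, 0.13, 0.13, 0.14, 0.14, 0.13, 0.13, 0.14, 0.14, 0.15, 0.16, 0.18, 0.2 , 0.21, 0.22, 0.24, 0.25, 0.25])
c = np.array([0.71, 0.7 , 0.73, 0.63, 0.62, 0.58, 0.69, 0.57, 0.56, 0.67, 0.65, 0.51, 0.48, 0.47, 0.46, 0.45, 0.4 , 0.39, 0.35, 0.34, 0.31, 0.28, 0.23, 0.18, 0.16, 0.13, 0.07, 0.02, 0.01])

def vertex(x, y):
    """Fit y = alpha*x**2 + beta*x + gamma and return the x-location
    of the parabola's turning point, -beta / (2*alpha)."""
    alpha, beta, _ = np.polyfit(x, y, 2)
    return -beta / (2 * alpha)

print('turning point of the A~B fit:', vertex(b, a))
print('turning point of the A~C fit:', vertex(c, a))
```

Only a fit whose turning point lands inside the domain you plot over will let A repeat along the curve.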

So let's proceed with this regression, with B as the predictor variable and A as the predicted variable (C is then also a predicted variable, via C = 1 - (A + B)).

fig = go.Figure()

fig.add_trace(go.Scatterternary({
    'mode': 'markers',
    'connectgaps': True,
    'a': a,
    'b': b,
    'c': c
}))   

## since A+B+C = 1, we only need to fit a polynomial between two of the variables
## fit an n-degree polynomial to 2 of your variables
## source https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html

coefficients = np.polyfit(b,a,2)
p = np.poly1d(coefficients)

## we use the entire domain of the input variable B
b_vals = np.linspace(0,1)

a_pred = np.array([p(x) for x in b_vals])
c_pred = 1 - (b_vals + a_pred)

fig.add_trace(go.Scatterternary({
    'mode': 'lines',
    'connectgaps': True,
    'a': a_pred,
    'b': b_vals,
    'c': c_pred,
    'marker': {'size': 2, 'color':'red', 'line': {'width': 0.1}}
}))   

fig.show()

[Plot: ternary scatter of the data with the fitted regression curve overlaid]

This is the lowest-degree polynomial regression that allows repeated values of A (a linear regression predicting A wouldn't allow A to take on repeated values). However, you can definitely experiment with increasing the degree of the polynomial, and with predicting A from either B or C.
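One caveat I'd add (an observation of mine, not part of the original answer): b_vals spans the full [0, 1] range, but toward b = 1 the quadratic predicts A values well above 1, so C = 1 - (A + B) goes negative and the curve leaves the ternary triangle. A minimal numpy sketch that masks the fitted curve down to valid compositions:

```python
import numpy as np

a = np.array([0.15, 0.15, 0.17, 0.2 , 0.21, 0.24, 0.26, 0.27, 0.27, 0.29, 0.32, 0.35, 0.39, 0.4 , 0.4 , 0.41, 0.47, 0.48, 0.51, 0.52, 0.54, 0.56, 0.59, 0.62, 0.63, 0.65, 0.69, 0.73, 0.74])
b = np.array([0.14, 0.15, 0.1 , 0.17, 0.17, 0.18, 0.05, 0.16, 0.17, 0.04, 0.03, 0.14, 0.13, 0.13, 0.14, 0.14, 0.13, 0.13, 0.14, 0.14, 0.15, 0.16, 0.18, 0.2 , 0.21, 0.22, 0.24, 0.25, 0.25])
c = np.array([0.71, 0.7 , 0.73, 0.63, 0.62, 0.58, 0.69, 0.57, 0.56, 0.67, 0.65, 0.51, 0.48, 0.47, 0.46, 0.45, 0.4 , 0.39, 0.35, 0.34, 0.31, 0.28, 0.23, 0.18, 0.16, 0.13, 0.07, 0.02, 0.01])

# same degree-2 fit as in the answer: predict A from B
p = np.poly1d(np.polyfit(b, a, 2))

b_vals = np.linspace(0, 1, 200)
a_pred = p(b_vals)
c_pred = 1 - (b_vals + a_pred)

# keep only points that form a valid composition (every part in [0, 1]);
# outside this mask the fitted curve leaves the ternary triangle entirely
valid = (a_pred >= 0) & (a_pred <= 1) & (c_pred >= 0) & (c_pred <= 1)
a_line, b_line, c_line = a_pred[valid], b_vals[valid], c_pred[valid]
```

The masked arrays a_line, b_line, c_line can then be passed to go.Scatterternary exactly as in the final code block above.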
