简体   繁体   English

情节注释彼此太接近(不可读)

[英]Plotly annotations too close to each other (not readable)

I have the following code that creates a plot for the loadings after PCA:我有以下代码可以为 PCA 后的载荷创建一个图:

# Creating pipeline objects 
## PCA
pca = PCA(n_components=2)
## Create columntransformer to only scale a selected set of featues
categorical_ix = X.select_dtypes(exclude=np.number).columns

features = X.columns

ct = ColumnTransformer([
        ('encoder', OneHotEncoder(), categorical_ix),
        ('scaler', StandardScaler(), ['tenure', 'MonthlyCharges', 'TotalCharges'])
    ], remainder='passthrough')

# Create pipeline
pca_pipe = make_pipeline(ct,
                         pca)

# Fit data to pipeline
pca_result = pca_pipe.fit_transform(X)

loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

fig = px.scatter(pca_result, x=0, y=1, color=customer_data_raw['Churn'])

for i, feature in enumerate(features):
    fig.add_shape(
        type='line',
        x0=0, y0=0,
        x1=loadings[i, 0],
        y1=loadings[i, 1]
    )
    fig.add_annotation(
        x=loadings[i, 0],
        y=loadings[i, 1],
        ax=0, ay=0,
        xanchor="center",
        yanchor="bottom",
        text=feature,
    )
fig.show()

Which produces the following output:产生以下输出:

在此处输入图片说明

How can I make the labels for the loadings readable?如何使装载的标签可读?

Edit: There are 19 features in X.编辑:X 中有 19 个功能。

    gender  SeniorCitizen   Partner Dependents  tenure  PhoneService    MultipleLines   InternetService OnlineSecurity  OnlineBackup    DeviceProtection    TechSupport StreamingTV StreamingMovies Contract    PaperlessBilling    PaymentMethod   MonthlyCharges  TotalCharges
customerID                                                                          
7590-VHVEG  Female  0   Yes No  1   No  No phone service    DSL No  Yes No  No  No  No  Month-to-month  Yes Electronic check    29.85   29.85
5575-GNVDE  Male    0   No  No  34  Yes No  DSL Yes No  Yes No  No  No  One year    No  Mailed check    56.95   1889.50
3668-QPYBK  Male    0   No  No  2   Yes No  DSL Yes Yes No  No  No  No  Month-to-month  Yes Mailed check    53.85   108.15
7795-CFOCW  Male    0   No  No  45  No  No phone service    DSL Yes No  Yes Yes No  No  One year    No  Bank transfer (automatic)   42.30   1840.75
9237-HQITU  Female  0   No  No  2   Yes No  Fiber optic No  No  No  No  No  No  Month-to-month  Yes Electronic check    70.70   151.65

Based on your DataFrame, you have 19 features and you are adding them all at the location as your lines because ax and ay are both set to 0.根据您的 DataFrame,您有 19 个特征,并且您将它们全部添加到该位置作为您的线,因为 ax 和 y 都设置为 0。

We can change ax and ay as you loop through your features to rotate, which will hopefully make your annotations more distinguishable.我们可以在您循环遍历特征以进行旋转时更改axay ,这有望使您的注释更易于区分。 This is based on converting from polar to cartesian coordaintes using x = r*cos(theta) and y = r*sin(theta) where theta goes through the values 0*360/19, 1*360/19, ... , 18*360/19 .这是基于使用x = r*cos(theta)y = r*sin(theta)从极坐标转换为笛卡尔坐标,其中 theta 通过值0*360/19, 1*360/19, ... , 18*360/19 We will want to set the x and y-reference to be the x- and y-coordinates instead of paper coordinates and then set r=2 or some value comparable to your plot (this will make the annotation lines length 2 at longest)我们希望将 x 和 y 参考设置为 x 和 y 坐标而不是纸坐标,然后设置 r=2 或与您的绘图相当的某个值(这将使注释线长度最长为 2)

from math import sin, cos, pi
r = 2 # this can be modified as needed, and is in units of the axis
theta = 2*pi/len(features)

for i, feature in enumerate(features):
    fig.add_shape(
        type='line',
        x0=0, y0=0,
        x1=loadings[i, 0],
        y1=loadings[i, 1]
    )
    fig.add_annotation(
        x=loadings[i, 0],
        y=loadings[i, 1],
        ax=r*sin(i*theta), 
        ay=r*cos(i*theta),
        axref="x",
        ayref="y",
        xanchor="center",
        yanchor="bottom",
        text=feature,
    )

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM