I am a bit confused about how scatter_matrix in the pandas.plotting module works. For example, see the plot here: https://www.geeksforgeeks.org/pair-plots-using-scatter-matrix-in-pandas/
The three plots along the main diagonal look like distributions. But the x- and y-axis labels indicate that each one plots a variable against itself, so shouldn't it be a straight line? Where did the distribution come from?
By default, pandas.plotting.scatter_matrix plots histograms on the diagonal. Each histogram shows the counts for just that column of data. Otherwise, as you mentioned, we'd only have (useless) straight lines on the diagonal.
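To see this for yourself, here is a minimal sketch that inspects what ends up in each cell. The DataFrame, its column names, and the random data are made up for illustration, and the Agg backend is only an assumption so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Made-up data: 200 rows, three numeric columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])

# diagonal="hist" is the default, so it is not passed explicitly here.
axes = scatter_matrix(df, figsize=(6, 6))

# Diagonal cell (row 0, col 0): histogram bars built from column "a" only.
diag_bar_heights = [bar.get_height() for bar in axes[0, 0].patches]

# Off-diagonal cell (row 0, col 1): a scatter of one column against another.
n_scatter_artists = len(axes[0, 1].collections)

print("bars on diagonal:", len(diag_bar_heights))
print("rows counted by those bars:", sum(diag_bar_heights))
print("scatter artists off the diagonal:", n_scatter_artists)
```

The bar heights on a diagonal cell add up to the number of rows in the frame, which is what you'd expect if the cell is a histogram of that single column rather than a plot of the column against itself.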
There is a diagonal parameter to choose between a histogram and a kernel density estimate:
pandas.plotting.scatter_matrix(frame, alpha=0.5, figsize=None, ax=None, grid=False, diagonal='hist', marker='.', density_kwds=None, hist_kwds=None, range_padding=0.05, **kwargs)
...
diagonal : {'hist', 'kde'}
    Pick between 'kde' and 'hist' for either Kernel Density Estimation or Histogram plot in the diagonal.
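As a rough illustration of switching the diagonal, the sketch below draws the same made-up data both ways. The DataFrame and column names are hypothetical, hist_kwds={"bins": 20} is just an example setting, and the 'kde' call is guarded because, as far as I know, it needs SciPy to be installed.

```python
import importlib.util

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Made-up data: 150 rows, two numeric columns.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(150, 2)), columns=["x", "y"])

# Histogram on the diagonal; hist_kwds is forwarded to the histogram call.
hist_axes = scatter_matrix(df, diagonal="hist", hist_kwds={"bins": 20})
hist_bars = len(hist_axes[0, 0].patches)

# Kernel density estimate on the diagonal (only attempted if SciPy is available).
have_scipy = importlib.util.find_spec("scipy") is not None
kde_lines = None
if have_scipy:
    kde_axes = scatter_matrix(df, diagonal="kde")
    kde_lines = len(kde_axes[0, 0].lines)  # the density is drawn as a line

print("histogram bars per diagonal cell:", hist_bars)
print("density lines per diagonal cell:", kde_lines)
```

With 'hist' the diagonal cells hold bars, and with 'kde' they hold a smooth curve instead. The off-diagonal scatter plots are the same in both cases.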