简体   繁体   中英

How to bin a 2D data along the x-axis with Python

I have two arrays of corresponding data (x and y) that I plot as above on a log-log plot. The data is currently too granular and I would like to bin them to get a smoother relationship. Could I get some guidance on how I can bin along the x-axis, in exponential bin sizes, so that it appears linear on the log-log scale?

For example, if the first bin is of range x = 10^0 to 10^1, I want to collect all y-values with corresponding x in that range and average them into one value for that bin. I don't think np.hist or plt.hist quite does the trick, since they do binning by counting occurrences.

Edit: For context, if it helps, the above plot is an assortativity plot that plots the in vs out degree of a certain network.

You can achieve this with pandas. The idea is to assign each X value to an interval using np.digitize . Since you are using a log scale, it makes sense to use np.logspace to choose intervals of exponentially changing lengths. Finally, you can group X values in each interval and compute mean Y values.


import pandas as pd
import numpy as np

x_max = 10

xs = np.exp(x_max * np.random.rand(1000))
ys = np.exp(np.random.rand(1000))

df = pd.DataFrame({
    'X': xs,
    'Y': ys,
})

df['Xbins'] = np.digitize(df.X, np.logspace(0, x_max, 30, base=np.exp(1)))
df['Ymean'] = df.groupby('Xbins').Y.transform('mean')
df.plot(kind='scatter', x='X', y='Ymean')

You may use scipy.stats.binned_statistic to get the mean of the data in each bin. The bins would best be created via numpy.logspace . You may then plot those means eg as horiziontal lines spanning the bin width or as scatter at the mean position.

import numpy as np; np.random.seed(42)
from scipy.stats import binned_statistic
import matplotlib.pyplot as plt

x = np.logspace(0,5,300)
y = np.logspace(0,5,300)+np.random.rand(300)*1.e3


fig, ax = plt.subplots()
ax.scatter(x,y, s=9)

s, edges, _ = binned_statistic(x,y, statistic='mean', bins=np.logspace(0,5,6))

ys = np.repeat(s,2)
xs = np.repeat(edges,2)[1:-1]
ax.hlines(s,edges[:-1],edges[1:], color="crimson", )

for e in edges:
    ax.axvline(e, color="grey", linestyle="--")

ax.scatter(edges[:-1]+np.diff(edges)/2, s, c="limegreen", zorder=3)

ax.set_xscale("log")
ax.set_yscale("log")
plt.show()

在此处输入图片说明

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM