I want to apply my custom ecdf function to each column in a dataframe, then plot each ECDF from the returned x, y values.
The custom function:
    def ecdf(df):
        n = len(df)
        x = np.sort(df)
        y = np.arange(1, n+1)/n
        return x, y
My attempt at a for loop:
    for col in sj_interpol_data.columns:
        x_col, y_col = ecdf(col)
        ax = plt.figure()
        ax = plt.plot(x_col, y_col, marker='.', linestyle='none')
        ax = plt.margins=(0.02)
        plt.show()
Edited to include error:
AxisError Traceback (most recent call last)
<ipython-input-75-d03c4fa0a973> in <module>()
2 #design a for-loop which applies ecdf() on each column in df and plots them separately
3 for col in sj_interpol_data.columns:
----> 4 x_col, y_col = ecdf(col)
5 ax = plt.figure()
6 ax = plt.plot(x_col, y_col, marker='.', linestyle='none')
<ipython-input-32-353fb281e367> in ecdf(df)
4 n = len(df)
5 #define x values - sorted values in array
----> 6 x = np.sort(df)
7 #define y values - maps location of each datapoint WR to their percentiles
8 y = np.arange(1, n+1)/n
C:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in sort(a, axis, kind, order)
845 else:
846 a = asanyarray(a).copy(order="K")
--> 847 a.sort(axis=axis, kind=kind, order=order)
848 return a
849
AxisError: axis -1 is out of bounds for array of dimension 0
Any advice on how to write this function so it can be applied to all columns in a dataframe and automatically plot in a for loop?
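You are passing the column name to the ecdf function, but you want to pass it the dataframe (or column) itself; at least, that is what the function definition indicates.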
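To illustrate the comment above, here is a minimal sketch of the difference between passing a column label and passing the column's data. The `demo` dataframe is a hypothetical stand-in for `sj_interpol_data`, and the parameter name `values` is my own; the rest follows the function from the question.

```python
import numpy as np
import pandas as pd


def ecdf(values):
    """Return the sorted values and their cumulative proportions."""
    n = len(values)
    x = np.sort(values)
    y = np.arange(1, n + 1) / n
    return x, y


# Hypothetical stand-in for sj_interpol_data
demo = pd.DataFrame({"a": [3.0, 1.0, 2.0], "b": [10.0, 30.0, 20.0]})

# Iterating over .columns yields the labels (strings here), not the data
labels = list(demo.columns)

# Passing a label reproduces the failure: np.sort receives a 0-d string array
try:
    ecdf(labels[0])
    label_failed = False
except ValueError:  # NumPy's AxisError is a subclass of ValueError
    label_failed = True

# Indexing the frame with the label passes the column's values instead
results = {col: ecdf(demo[col]) for col in demo.columns}
```

With this change `np.sort` receives a one-dimensional column, so the original NumPy-based function should work without modification.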
I figured out the answer. I use df.sort_values() in the ecdf function, which sorts the values with pandas instead of NumPy.
So the modified function is:
    def ecdf(df):
        n = len(df)
        x = df.sort_values()
        y = np.arange(1, n+1)/n
        return x, y
After applying the for loop (shown above), the output was a separate ECDF plot for each column in the dataframe.
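For completeness, a rough sketch of how the whole thing might fit together. It assumes the loop passes the column data (`df[col]`) rather than the column label, since a plain string has no `sort_values` method. The helper name `plot_ecdfs`, the `demo` dataframe, and the axis labels are my own additions, not part of the original code. It also calls `margins(0.02)` as a function; the loop in the question assigns to `plt.margins`, which replaces the function instead of calling it.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def ecdf(series):
    """ECDF of one pandas Series: sorted values and cumulative proportions."""
    n = len(series)
    x = series.sort_values()
    y = np.arange(1, n + 1) / n
    return x, y


def plot_ecdfs(df):
    """Draw one ECDF figure per column and return the figures (hypothetical helper)."""
    figures = []
    for col in df.columns:
        # Pass the column's data, not its label
        x_col, y_col = ecdf(df[col])
        fig, ax = plt.subplots()
        ax.plot(x_col, y_col, marker=".", linestyle="none")
        ax.margins(0.02)  # call the method; do not assign to it
        ax.set_xlabel(col)
        ax.set_ylabel("ECDF")
        figures.append(fig)
    return figures


# Hypothetical stand-in for sj_interpol_data
demo = pd.DataFrame({"a": [3.0, 1.0, 2.0], "b": [10.0, 30.0, 20.0]})
figures = plot_ecdfs(demo)
```

In a notebook the figures display on their own; in a script, call `plt.show()` after `plot_ecdfs(...)`.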