
Apply ecdf function to each column in a dataframe and plot

I want to apply my custom ecdf function to each column in a dataframe, then plot the ECDF from the returned x, y values.

The custom function:

def ecdf(df):
    n = len(df)
    x = np.sort(df)
    y = np.arange(1, n+1)/n
    return x, y

My attempt at a for loop:

for col in sj_interpol_data.columns:
   x_col, y_col = ecdf(col)
   ax = plt.figure()
   ax = plt.plot(x_col, y_col, marker='.', linestyle='none')
   ax = plt.margins=(0.02)
   plt.show()

Edited to include the error:

AxisError                                 Traceback (most recent call last)
<ipython-input-75-d03c4fa0a973> in <module>()
      2 #design a for-loop which applies ecdf() on each column in df and plots them separately
      3 for col in sj_interpol_data.columns:
----> 4     x_col, y_col = ecdf(col)
      5     ax = plt.figure()
      6     ax = plt.plot(x_col, y_col, marker='.', linestyle='none')

<ipython-input-32-353fb281e367> in ecdf(df)
      4     n = len(df)
      5     #define x values - sorted values in array
----> 6     x = np.sort(df)
      7     #define y values - maps location of each datapoint WR to their percentiles
      8     y = np.arange(1, n+1)/n

C:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in sort(a, axis, kind, order)
    845     else:
    846         a = asanyarray(a).copy(order="K")
--> 847     a.sort(axis=axis, kind=kind, order=order)
    848     return a
    849 

AxisError: axis -1 is out of bounds for array of dimension 0

Any advice on how to write this function so it can be applied to all columns in a dataframe and automatically plot in a for loop?

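You are passing the column name to the ecdf function, but you want to pass the dataframe's data to it; at least that is what the function definition indicates.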
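A minimal sketch of the difference, assuming a small made-up dataframe (the name df_demo and its values are illustrative, not from the question). Iterating over .columns yields the column labels, so ecdf(col) hands np.sort a string, which is where the "dimension 0" in the traceback comes from. Indexing the dataframe with the label gives the column's values instead.

```python
import numpy as np
import pandas as pd


def ecdf(data):
    """Return the sorted values and their cumulative proportions."""
    n = len(data)
    x = np.sort(data)
    y = np.arange(1, n + 1) / n
    return x, y


# Hypothetical stand-in for sj_interpol_data
df_demo = pd.DataFrame({"a": [3.0, 1.0, 2.0], "b": [10.0, 30.0, 20.0]})

results = {}
for col in df_demo.columns:
    # col is only the column label; df_demo[col] is the column's data
    results[col] = ecdf(df_demo[col])

x_a, y_a = results["a"]
print(x_a)  # sorted values of column "a"
print(y_a)  # cumulative proportions 1/n, 2/n, ..., 1
```

With this change the original NumPy-based ecdf should work unchanged, since np.sort accepts a pandas Series.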

I figured out the answer: I use df.sort_values() in the ecdf function, so the values are sorted by pandas instead of NumPy.

So the modified function is:

def ecdf(df):
    n = len(df)
    x = df.sort_values()
    y = np.arange(1, n+1)/n
    return x, y

After running the for loop shown above (passing each column's data, e.g. sj_interpol_data[col], rather than the column name), the output was a separate ECDF plot for each column in the dataframe.
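For reference, the whole loop could be sketched roughly as below. This is one possible arrangement rather than the poster's exact code: the helper name plot_ecdfs and the demo dataframe are made up for illustration, and the NumPy version of ecdf is used. It also calls margins(0.02) as a function, because the loop in the question writes plt.margins=(0.02), which assigns a value to plt.margins instead of calling it.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def ecdf(data):
    """Return the sorted values and their cumulative proportions."""
    n = len(data)
    x = np.sort(data)
    y = np.arange(1, n + 1) / n
    return x, y


def plot_ecdfs(df):
    """Draw one ECDF figure per column and return the figures (hypothetical helper)."""
    figures = []
    for col in df.columns:
        # Pass the column's data, not the column label
        x_col, y_col = ecdf(df[col])
        fig, ax = plt.subplots()
        ax.plot(x_col, y_col, marker=".", linestyle="none")
        ax.margins(0.02)  # call margins() rather than assigning to it
        ax.set_xlabel(col)
        ax.set_ylabel("ECDF")
        figures.append(fig)
    return figures


# Illustrative data standing in for sj_interpol_data
demo = pd.DataFrame({"a": [3.0, 1.0, 2.0], "b": [10.0, 30.0, 20.0]})
figs = plot_ecdfs(demo)
```

Calling plt.show() afterwards displays the figures, one per column.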

