Python: How to plot a conditional cumulative frequency histogram?

I have a list of data that I would like to plot as a histogram. However, the chart is not very readable for large values on the X axis, and those values are not important to keep.

Here is a sub sample of my data:

print(x)

1      1738   # the values I want to plot on the histogram
2      2200
3      1338
4      1222
5       939
6       898 

I calculated the cumulative frequency as follows:

v = x.cumsum()
t = round(100 * v / x.sum(), 2)
t

and the output is:

 1        9.90
 2       22.44
 3       30.06
 4       37.02
 5       42.37

How can I represent on the histogram only the data for which the cumulative frequency is less than or equal to 40%?

I don't know how to do this in Python. Thank you in advance for your help.

The short answer is: use a boolean mask on the NumPy array to keep only the values <= 40%. For example, if a is a 1D NumPy array of cumulative percentages:

a[a <= 40]
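
As a minimal sketch of that masking step, the snippet below uses the six sample values from the question (only a sub-sample, so the percentages differ from the ones printed there). The names `cum_pct`, `mask` and `kept` are illustrative and not part of the original post. It also shows that the same mask can index the original values, which is what the question appears to ask for:

```python
import numpy as np

# The six sample values shown in the question (a sub-sample only).
x = np.array([1738, 2200, 1338, 1222, 939, 898])

# Cumulative share of the total, in percent.
cum_pct = 100 * np.cumsum(x) / x.sum()

# Boolean mask: True where the cumulative share is at most 40%.
mask = cum_pct <= 40

# Filtering the percentages themselves, as in a[a <= 40].
print(cum_pct[mask])

# Filtering the original values with the same mask.
kept = x[mask]
print(kept)  # [1738]
```

With only six values, the second row already pushes the cumulative share past 40%, so a single value is kept.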

A longer answer is provided by the example below, which shows:

  • Generating normally distributed random data (as the provided dataset is very small)
  • Performing your calculation on the NumPy array
  • Masking the array to return only the values which are <= 40%
  • Plotting the results using the Plotly library - API only.

Example code:

import numpy as np
import plotly.io as pio

# Generate random dataset (for demo only).
np.random.seed(1)
X = np.random.normal(0, 1, 10000)

# Calculate the cumulative frequency.
X_ = np.cumsum(X)*100/X.sum()
data = X_[X_ <= 40]

# Plot the histogram.
pio.show({'data': {'x': data, 
                   'type': 'histogram', 
                   'marker': {'line': {'width': 0.5}}},
          'layout': {'title': 'Cumulative Frequency Demo'}})

Output:

[Image: histogram titled "Cumulative Frequency Demo"]
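
Two points about the example above may be worth checking against your own data. First, the mask is applied to the cumulative percentages themselves, so the histogram shows percentages rather than the original values. Second, normally distributed data centered on 0 contains negative values, so its cumulative share is not guaranteed to rise steadily from 0 to 100. The following is a hedged variant, not the answerer's code: it assumes non-negative demo values (the mean of 1000 and standard deviation of 200 are arbitrary choices) and indexes the original array with the mask. The figure is built as a plain dict in the same style as above and can be rendered with `plotly.io.show(fig)`.

```python
import numpy as np

# Assumption: non-negative demo values, so the cumulative share rises
# monotonically to 100. The distribution parameters are arbitrary.
rng = np.random.default_rng(1)
X = rng.normal(1000, 200, 10000).clip(min=0)

# Cumulative share of the total, in percent.
cum_pct = 100 * np.cumsum(X) / X.sum()

# Keep the original values whose cumulative share is at most 40%.
data = X[cum_pct <= 40]

# Figure specification in the same dict form as the example above.
fig = {'data': [{'x': data.tolist(),
                 'type': 'histogram',
                 'marker': {'line': {'width': 0.5}}}],
       'layout': {'title': 'Values within the first 40% of cumulative frequency'}}
```

Because the mask follows the order of the array, this keeps the leading rows of the data, which matches the row-by-row cumulative calculation in the question.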
