
How to build a histogram from a pandas dataframe where each observation is a list?

I have a dataframe as follows. Each cell in the "Values" column holds a list of elements. I want to visualize the distribution of the values in the "Values" column using histograms, either stacked in rows or separated by colour (by Area_code).

How can I extract the values and construct the histograms in plotly? Any other ideas are also welcome. Thank you.

    Area_code   Values
0   New_York    [999, 54, 231, 43, 177, 313, 212, 279, 199, 267]
1   Dallas      [915, 183, 2326, 316, 206, 31, 317, 26, 31, 56, 316]
2   XXX         [560]
3   YYY         [884, 13]
4   ZZZ         [203, 1066, 453, 266, 160, 109, 45, 627, 83, 685, 120, 410, 151, 33, 618, 164, 496]

If you reshape your data, this is a good fit for px.histogram. From there you can choose among several aggregations, such as sum, average, or count, through the histfunc argument:

fig = px.histogram(df, x = 'Area_code', y = 'Values', histfunc='sum')
fig.show()

You haven't specified what kind of output you're aiming for, but I'll leave it up to you to change the argument for histfunc and see which option suits your needs best.
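To get a feel for what the different histfunc settings would put on the y axis, the same per-area aggregates can be computed directly in pandas. This is only a sketch: the names raw, long_df and summary are illustrative, the data is copied from the question, and the mapping of the columns to histfunc='sum', 'avg' and 'count' assumes Area_code stays on the x axis so that each area is its own bin.

```python
import pandas as pd

# Data copied from the question; the dict layout is just for illustration.
raw = {
    "New_York": [999, 54, 231, 43, 177, 313, 212, 279, 199, 267],
    "Dallas": [915, 183, 2326, 316, 206, 31, 317, 26, 31, 56, 316],
    "XXX": [560],
    "YYY": [884, 13],
    "ZZZ": [203, 1066, 453, 266, 160, 109, 45, 627, 83, 685, 120, 410,
            151, 33, 618, 164, 496],
}

# Long format: one row per (area, value) pair.
long_df = pd.DataFrame(
    [(area, value) for area, values in raw.items() for value in values],
    columns=["Area_code", "Values"],
)

# One row per area. The columns correspond to what histfunc='sum',
# histfunc='avg' and histfunc='count' would aggregate for each area.
summary = (
    long_df.groupby("Area_code", sort=False)["Values"]
    .agg(["sum", "mean", "count"])
)
print(summary)
```

Comparing the three columns makes it easier to decide which aggregation is worth plotting before touching the figure itself.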


I'm often inclined to urge users to rethink their entire data process, but I'm just going to assume that there are good reasons why you're stuck with what seems like a rather unusual setup in your dataframe. The snippet below contains a complete data munging process to reshape your data from that setup to a so-called long format, with one row per value (the first 25 of 41 rows are shown here):

   Area_code  Values
0   New_York     999
1   New_York      54
2   New_York     231
3   New_York      43
4   New_York     177
5   New_York     313
6   New_York     212
7   New_York     279
8   New_York     199
9   New_York     267
10    Dallas     915
11    Dallas     183
12    Dallas    2326
13    Dallas     316
14    Dallas     206
15    Dallas      31
16    Dallas     317
17    Dallas      26
18    Dallas      31
19    Dallas      56
20    Dallas     316
21       XXX     560
22       YYY     884
23       YYY      13
24       ZZZ     203

This long format works well with many of the features of plotly.express.
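As an alternative to looping over the rows, the same reshaping can be sketched with DataFrame.explode, assuming pandas 0.25 or later is available. The name long_df is illustrative; the data is the one from the question.

```python
import pandas as pd

# Same input as in the question: one list of values per area.
df = pd.DataFrame({
    "Area_code": ["New_York", "Dallas", "XXX", "YYY", "ZZZ"],
    "Values": [
        [999, 54, 231, 43, 177, 313, 212, 279, 199, 267],
        [915, 183, 2326, 316, 206, 31, 317, 26, 31, 56, 316],
        [560],
        [884, 13],
        [203, 1066, 453, 266, 160, 109, 45, 627, 83, 685, 120, 410,
         151, 33, 618, 164, 496],
    ],
})

# explode() emits one row per list element and repeats Area_code;
# the original index is repeated too, so reset it afterwards.
long_df = df.explode("Values").reset_index(drop=True)

# The exploded column has object dtype, so cast it back to integers.
long_df["Values"] = long_df["Values"].astype(int)

print(long_df.head())
```

The result should have the same rows, in the same order, as the loop-based version in the complete code.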

Complete code:

import plotly.express as px
import pandas as pd

# data input
df = pd.DataFrame({'Area_code': {0: 'New_York', 1: 'Dallas', 2: 'XXX', 3: 'YYY', 4: 'ZZZ'},
                 'Values': {0: [999, 54, 231, 43, 177, 313, 212, 279, 199, 267],
                  1: [915, 183, 2326, 316, 206, 31, 317, 26, 31, 56, 316],
                  2: [560],
                  3: [884, 13],
                  4: [203, 1066, 453, 266, 160, 109, 45, 627, 83, 685, 120, 410, 151, 33, 618, 164, 496]}})

# data munging: emit one (area, value) row per list element
areas = []
values = []
for _, row in df.iterrows():
    for val in row['Values']:
        areas.append(row['Area_code'])
        values.append(val)
df = pd.DataFrame({'Area_code': areas,
                   'Values': values})

# plotly
fig = px.histogram(df, x = 'Area_code', y = 'Values', histfunc='sum')
fig.show()
