
Plotly Bubble chart from pandas crosstab

How can I plot a bubble chart from a dataframe that was created from a pandas crosstab of another dataframe?

Imports:

import pandas as pd
import plotly as py
import plotly.graph_objects as go
from plotly.subplots import make_subplots

The crosstab was created using:

df = pd.crosstab(raw_data['Speed'], raw_data['Height'].fillna('n/a'))

The df contains mostly zeros. Wherever a non-zero number appears, I want a point whose size is controlled by that value. I want to use the index values as the x axis and the column names as the y axis.

The df would look something like this:

         10    20    30    40    50
1000      0     0     0     0     5
1100      0     0     0     7     0
1200      1     0     3     0     0
1300      0     0     0     0     0
1400      5     0     0     0     0

I've tried using scatter and Scatter like this:

fig.add_trace(go.Scatter(x=df.index.values, y=df.columns.values, size=df.values,
                         mode='lines'),
              row=1, col=3)

This returned a TypeError: 'Module' object not callable.

Any help is really appreciated. Thanks

UPDATE

The answers below are close to what I ended up with; the main difference is that I reference 'Speed' in the melt line:

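# each step returns a new dataframe, so assign the result back
df = df.reset_index()
df = df.melt(id_vars="Speed")
df = df.rename(columns={"index":"Engine Speed",
                        "variable":"Height",
                        "value":"Count"})
# keep only the rows with a non-zero count
df = df[df["Count"] != 0]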

scale=1000

fig.add_trace(go.Scatter(x=df["Speed"], y=df["Height"],mode='markers',marker_size=df["Count"]/scale),
              row=1, col=3)

This works; however, my main problem now is that the dataset is huge and Plotly is really struggling to deal with it.

Update 2

Using Scattergl allows Plotly to deal with the large dataset very well!

I recommend using tidy format to represent your data. We say a dataframe is tidy if and only if

  1. Each row is an observation
  2. Each column is a variable
  3. Each value has its own cell

To create a tidier dataframe you can do

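df = pd.crosstab(raw_data["Speed"], raw_data["Height"])
df = df.reset_index()
# melt the Height columns into rows (crosstab names the column axis "Height")
df = df.melt(id_vars="Speed", value_name="Counts")
# keep only the non-zero counts
df = df[df["Counts"] > 0]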
   Speed  Height  Counts
0   1000      10       2
1   1100      20       1
2   1200      10       1
3   1200      30       1
4   1300      40       1
5   1400      50       1
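
The reshaping can also be tried without the original raw_data. The sketch below rebuilds the wide table from the question by hand (an assumption about its exact dtypes and axis names) and melts it into tidy form; `var_name` is passed explicitly so the result does not depend on how the column axis is named.

```python
import pandas as pd

# Wide crosstab-style table from the question (index = Speed, columns = Height)
wide = pd.DataFrame(
    [[0, 0, 0, 0, 5],
     [0, 0, 0, 7, 0],
     [1, 0, 3, 0, 0],
     [0, 0, 0, 0, 0],
     [5, 0, 0, 0, 0]],
    index=pd.Index([1000, 1100, 1200, 1300, 1400], name="Speed"),
    columns=pd.Index([10, 20, 30, 40, 50], name="Height"),
)

# Tidy version: one row per (Speed, Height) pair, with the count as a variable
tidy = wide.reset_index().melt(
    id_vars="Speed", var_name="Height", value_name="Count"
)

# Drop the empty combinations so only real observations remain
tidy = tidy[tidy["Count"] > 0].reset_index(drop=True)
print(tidy)
```

Filtering out the zero rows before plotting also keeps the number of markers down, which matters when the crosstab is large and sparse.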

The next step is to do the actual plotting.

# when scale is increased bubbles will become larger
scale = 10 
# create the scatter plot
scatter = go.Scatter(
    x=df.Speed, 
    y=df.Height,
    marker_size=df.Counts*scale,
    mode='markers')
fig = go.Figure(scatter)
fig.show()

This will create a plot as shown below.

[Image: bubble chart]

If this is the case, you can use plotly.express. This is very similar to @Erik's answer but shouldn't return errors.

import pandas as pd
import plotly.express as px
from io import StringIO

txt = """
        10    20    30    40    50
1000     0     0    0      0     5
1100     0     0    0      7     0
1200     1     0    3      0     0
1300     0     0    0      0     0
1400     5     0    0      0     0
"""

# sep=r"\s+" splits on whitespace; delim_whitespace is deprecated in recent pandas
df = pd.read_csv(StringIO(txt), sep=r"\s+")

df = df.reset_index()\
       .melt(id_vars="index")\
       .rename(columns={"index":"Speed",
                        "variable":"Height",
                        "value":"Count"})

fig = px.scatter(df, x="Speed", y="Height",size="Count")
fig.show()


UPDATE: In case you get an error, please check your pandas version with pd.__version__ and try running this line by line

df = pd.read_csv(StringIO(txt), sep=r"\s+")

df = df.reset_index()

df = df.melt(id_vars="index")

df = df.rename(columns={"index":"Speed",
                        "variable":"Height",
                        "value":"Count"})

and report on which line it breaks.
