Wide format CSV with Plotly Express
I have a wide CSV file where the first column is the date (in Ymd format) and the following columns, with headers, are different COVID-19 related indicators, as in this example:
data | casos_novos_t | casos_novos_d | obitos_t | obitos_d |
---|---|---|---|---|
2021-21-04 | 123000 | 12000 | 34000 | 345 |
2021-22-04 | 134000 | 14000 | 34505 | 567 |
and so on. The sample dataset can be found here.
I have the following code in Python:
# Import Libraries
import pandas as pd
import plotly.express as px
# Read CSV file
df = pd.read_csv("covid19pt_data.csv")
# Plot
fig = px.bar(df)
fig.show()
After running it I get the following error message:
ValueError: Plotly Express cannot process wide-form data with columns of different type
If I change my code to
fig = px.bar(df, x='data', y='casos_novos_t')
fig.show()
the code works, showing me a bar graph for that column. The same goes for any other column.
However, after reading this post I was under the impression that Plotly now supports wide-format data frames, so I don't understand what I'm doing wrong.
My experience level is close to zero, and any help in understanding what I'm doing wrong is much appreciated.
If you scroll down to the Wide-Form Defaults section of the Plotly documentation, it explains that if you don't provide the x or y parameters to the px.bar method, the default behavior is to set x to df.index and y to df.columns. However, your DataFrame is not indexed by date, and your date column data is the first column in df.columns.
If we look at df.dtypes for your DataFrame:
data object
casos_novos_t int64
casos_novos_d int64
obitos_d int64
obitos_t int64
d_internados int64
d_uci int64
recuperados_t int64
recuperados_d int64
vigilancia_t int64
vigilancia_d int64
ativos_t int64
ativos_d int64
We can see that the columns do not all share the same data type: the data column is of type object, while the other columns are int64. Since df.columns is passed to the y parameter by default, this is what causes Plotly Express to throw ValueError: Plotly Express cannot process wide-form data with columns of different type
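As a quick check of the reasoning above, here is a minimal sketch using pandas only. The two-row DataFrame is a hypothetical stand-in for the real CSV (values taken from the question's sample, with the dates rewritten in ISO order as an assumption), and counting distinct dtypes is only a rough proxy for the compatibility check Plotly Express performs internally, not its actual implementation.

```python
import pandas as pd

# Hypothetical two-row stand-in for covid19pt_data.csv (not the real file)
df = pd.DataFrame({
    "data": ["2021-04-21", "2021-04-22"],   # date strings, not numeric
    "casos_novos_t": [123000, 134000],
    "casos_novos_d": [12000, 14000],
    "obitos_t": [34000, 34505],
    "obitos_d": [345, 567],
})

# With "data" still among the columns, more than one dtype is present
mixed_before = df.dtypes.nunique() > 1

# Moving "data" into the index leaves only the integer columns
indexed = df.set_index("data")
mixed_after = indexed.dtypes.nunique() > 1

print(df.dtypes)
print("mixed before set_index:", mixed_before)
print("mixed after set_index:", mixed_after)
```

If the second flag is still true on your own data, some other column was probably read as text, and df.dtypes will show which one.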
If you want to use this default behavior to plot all of the numerical columns against the data column in a bar chart, you can set the index of your DataFrame to the data column using df = df.set_index('data'). This ensures that the remaining columns in your DataFrame are all of the same int64 type.
# Import Libraries
import pandas as pd
import plotly.express as px
# Read CSV file
df = pd.read_csv("covid19pt_data.csv")
# Set the index to the data column;
# all of the remaining columns are now of the same int64 type
df = df.set_index('data')
# Plot
fig = px.bar(df)
fig.show()