Grouped Boxplots by Categorical Variable
I'm using pandas on a large dataset that I have already reduced to the information I need. Basically, I would like to plot the distribution of the number of friends for users from two different countries as side-by-side boxplots (what I'm calling grouped boxplots), broken down by the number of hashtags used in their post (ranging from 1 to 6; I'm treating this as a categorical variable). This gives a total of 2*6=12 boxplots, all in the same frame for easy comparison.
I've done some research and I'm aware of df.boxplot(by='x'), but this doesn't account for the extra level of comparing the two countries.
The dataset has columns for number of hashtags (int), country (string), and number of friends (int).
Note that I'm fairly new to graphing in Python, including things like axes and subplots, so please include some extra info in your answer if possible.
Edit: small sample of the dataset
      #followers  #friends  #mentions  #hashtags  country  lang_user  place
 450          53        71          1          0       ja         es    NaN
 489          54        34          1          1       ja         es    NaN
 867        1569      1999          0          0       en         es    NaN
1021         224       242          0          3       ja         ja    NaN
1022         377       506          1          5       ja         ja    NaN
1023         315       305          0          2       ja         ja    NaN
I like using seaborn for this kind of visualization. I guess the "extra level" you mean is what seaborn calls "hue".
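import seaborn as sns

sns.set_style("whitegrid")
# "tips" is one of seaborn's example datasets
tips = sns.load_dataset("tips")
# x sets the groups along the axis; hue splits each group into side-by-side boxes
ax = sns.boxplot(x="day", y="total_bill", hue="smoker",
                 data=tips, palette="Set3")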
and the result would be a grouped boxplot: for each day, one box per value of smoker, drawn side by side.
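For the data in the question, the same call might look like the sketch below. The column names are taken from the sample in the question; the small DataFrame is made-up stand-in data (only three hashtag counts, to keep it short), so swap in your own df. Since you mentioned being new to axes and subplots, it also creates the figure and axes explicitly.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in data; column names follow the sample in the question.
df = pd.DataFrame({
    "#hashtags": [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
    "country":   ["ja"] * 6 + ["en"] * 6,
    "#friends":  [71, 34, 305, 120, 242, 90, 1999, 800, 150, 400, 60, 510],
})

# A figure is the whole canvas; an axes is one plot area inside it.
fig, ax = plt.subplots(figsize=(8, 4))

# One group per hashtag count on the x axis, split by country via hue.
sns.boxplot(x="#hashtags", y="#friends", hue="country", data=df, ax=ax)
ax.set_xlabel("number of hashtags")
ax.set_ylabel("number of friends")

# In an interactive session, call plt.show() to display the figure.
```

Passing ax=ax tells seaborn which axes to draw on, which becomes useful once you have several subplots in one figure. With the full data (hashtag counts 1 to 6 and two countries) this should give the 12 boxes described in the question.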
Check out the documentation: https://seaborn.pydata.org/generated/seaborn.boxplot.html
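If you would rather stay in pandas, one alternative (not part of the seaborn approach above) is to pass a list of columns to by, so each (hashtags, country) pair gets its own box. This is a minimal sketch on made-up stand-in data with the question's column names; the boxes are not color-coded by country the way hue does it.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in data; column names follow the sample in the question.
df = pd.DataFrame({
    "#hashtags": [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
    "country":   ["ja"] * 6 + ["en"] * 6,
    "#friends":  [71, 34, 305, 120, 242, 90, 1999, 800, 150, 400, 60, 510],
})

fig, ax = plt.subplots(figsize=(8, 4))

# Grouping by two columns draws one box per (hashtags, country) combination.
df.boxplot(column="#friends", by=["#hashtags", "country"], ax=ax)
ax.set_ylabel("number of friends")

# In an interactive session, call plt.show() to display the figure.
```

The x tick labels come out as pairs such as (1, en) and (1, ja), so the two countries sit next to each other within each hashtag count.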