How to visualize grouped data from a dataframe using python

After grouping two columns of data in my dataframe, I obtained a small table of integers, whose image I've attached below (it is given as a link since I am very new to posting on Stack Overflow).

Please click here for the image of the data

This was the code used for grouping:

count = x_train.groupby(['bool_loc', 'target']).size() 

I am trying to visualize this data (type int64) using python. I thought a good way to do so would be a histogram with two categories, 0 and 1 (for column 'bool_loc'), each category having two bars (for column 'target') whose heights represent frequency. I tried this:

# create figure and axis
fig, ax = plt.subplots()
# plot histogram
ax.hist(count)
# set title and labels
ax.set_title('Relation Between Location Data Presence and Disaster Tweets')
ax.set_xlabel('Location Data Presence')
ax.set_ylabel('Frequency of Tweets')

The histogram I obtained:

Image of obtained histogram

It seems that the frequency data has been plotted along the x-axis (it should be on the y-axis) instead of the data in 'bool_loc'. I would really appreciate some guidance on this and welcome other visualization techniques.

** Please tell me if this question needs to be made clearer **

I tried to visualize histograms based on the shape of your dataframe. Here is the result: 2 histograms with 2 bins

I'm not sure if this matches your data input, as I simply made a dataframe similar to the one in your post. You have probably built yours differently.

The code is below:

import pandas as pd
import matplotlib.pyplot as plt

# make dataframe
arrays = [[0, 0, 1, 1],
          [0, 1, 0, 1]]
data = [1458, 1075, 2884, 2196] 
df = pd.DataFrame(data, index=arrays, columns=['frequency'])

# get data from DF series
y1 = df.loc[0,'frequency'].to_list()
y2 = df.loc[1,'frequency'].to_list()

# get data arrays
arr1 = [0] * y1[0] + [1] * y1[1]
arr2 = [0] * y2[0] + [1] * y2[1]

# set matplotlib plot
fig, ax = plt.subplots()

# plot histogram
num_bins = 2
ax.hist([arr1, arr2], num_bins, density=False, label=['bool_loc 0', 'bool_loc 1'])
plt.legend(loc='upper right')
plt.show()
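
As an alternative sketch, the grouped counts can be drawn directly as a grouped bar chart, without expanding them back into arrays of 0s and 1s. This assumes `count` is the Series returned by `x_train.groupby(['bool_loc', 'target']).size()`; since the real data is not available here, the Series is rebuilt by hand from the same four counts used above.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for x_train.groupby(['bool_loc', 'target']).size().
# The counts are the ones used in the dataframe above (an assumption about the real data).
index = pd.MultiIndex.from_product([[0, 1], [0, 1]], names=['bool_loc', 'target'])
count = pd.Series([1458, 1075, 2884, 2196], index=index)

# Move the 'target' level into columns: rows are bool_loc, columns are target.
table = count.unstack('target')

# One group of bars per bool_loc value, one bar per target value.
fig, ax = plt.subplots()
table.plot(kind='bar', ax=ax, rot=0)
ax.set_title('Relation Between Location Data Presence and Disaster Tweets')
ax.set_xlabel('Location Data Presence')
ax.set_ylabel('Frequency of Tweets')
ax.legend(title='target')
```

Call `plt.show()` afterwards to display the figure. Here the bar heights are the counts themselves, so the frequencies end up on the y-axis and the 'bool_loc' categories on the x-axis.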

It seems like you want us to write code for you. Did you try it yourself first? If yes, include your code in the question. That makes it easier for us to understand, and we can then correct your version of the code. Thanks

I don't know much about your data, but try this:

data.target.value_counts(normalize=True).plot(kind='bar')

Also check how to ask a question: https://stackoverflow.com/help/how-to-ask
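
The normalized counts above cover only the 'target' column. A possible extension, sketched here under assumptions, is to normalize within each 'bool_loc' group using `pd.crosstab`. The small `x_train` frame below is hypothetical and only stands in for the real training data, which is not shown in the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical miniature of the training data; the real x_train is not shown in the question.
x_train = pd.DataFrame({
    'bool_loc': [0, 0, 0, 1, 1, 1, 1, 1],
    'target':   [0, 1, 0, 0, 1, 1, 0, 1],
})

# Share of each target value within each bool_loc group (every row sums to 1).
shares = pd.crosstab(x_train['bool_loc'], x_train['target'], normalize='index')

# Stacked bars make the within-group proportions easy to compare.
ax = shares.plot(kind='bar', stacked=True, rot=0)
ax.set_xlabel('bool_loc')
ax.set_ylabel('Share of tweets')
```

Call `plt.show()` to display the chart. Dropping `normalize='index'` gives raw counts instead of proportions.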
