How to plot categorical and continuous data in pandas/matplotlib/seaborn
I am trying to figure out how I could plot this data:
Column 1 ['genres']: these are the value counts for all the genres in the table.
Drama 2453
Comedy 2319
Action 1590
Horror 915
Adventure 586
Thriller 491
Documentary 432
Animation 403
Crime 380
Fantasy 272
Science Fiction 214
Romance 186
Family 144
Mystery 125
Music 100
TV Movie 78
War 59
History 44
Western 42
Foreign 9
Name: genres, dtype: int64
Column 2 ['release_year']: these are the value counts for all the release years across the different genres.
2014 699
2013 656
2015 627
2012 584
2011 540
2009 531
2008 495
2010 487
2007 438
2006 408
2005 363
2004 307
2003 281
2002 266
2001 241
2000 226
1999 224
1998 210
1996 203
1997 192
1994 184
1993 178
1995 174
1988 145
1989 136
1992 133
1991 133
1990 132
1987 125
1986 121
1985 109
1984 105
1981 82
1982 81
1983 80
1980 78
1978 65
1979 57
1977 57
1971 55
1973 55
1976 47
1974 46
1966 46
1975 44
1964 42
1970 40
1967 40
1972 40
1968 39
1965 35
1963 34
1962 32
1960 32
1969 31
1961 31
Name: release_year, dtype: int64
I need to answer questions like "What genre is most popular from year to year?" and so on.
What kind of plots can be used, and what is the best way to do this, since there would be a lot of bins in a single chart?
Is seaborn better for plotting such variables?
Should I divide the year data into 2 decades (1900 and 2000)?
Sample of the table:
id popularity runtime genres vote_count vote_average release_year
0 135397 32.985763 124 Action 5562 6.5 2015
1 76341 28.419936 120 Action 6185 7.1 1995
2 262500 13.112507 119 Adventure 2480 6.3 2015
3 140607 11.173104 136 Thriller 5292 7.5 2013
4 168259 9.335014 137 Action 2947 7.3 2005
You could do something like this:
Plotting histogram using seaborn for a dataframe
Personally I prefer seaborn for this kind of plot because it's easier, but you can use matplotlib too.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# sample data
samples = 300
ids = range(samples)
gind = np.random.randint(0, 5, samples)  # 0..4, one index per genre in gkeys
years = np.random.randint(1990, 2000, samples)  # 1990..1999

# create sample dataframe
gkeys = {0: 'Thriller', 1: 'Drama', 2: 'Comedy', 3: 'Action', 4: 'Adventure'}
df = pd.DataFrame(list(zip(ids, gind, years)),
                  columns=['ID', 'Genre', 'Year'])
df['Genre'] = df['Genre'].map(gkeys)

# count the rows in each (Year, Genre) group; the 'ID' column then holds the count
res = df.groupby(['Year', 'Genre']).count()
res = res.reset_index()

# optional: keep only the most frequent genre of each year
# res_ind = res.groupby('Year')['ID'].idxmax()
# res = res.loc[res_ind]

# viz
sns.set(style="white")
g = sns.catplot(x='Year',
                y='ID',
                hue='Genre',
                data=res,
                kind='bar',
                ci=None,  # deprecated since seaborn 0.12; use errorbar=None there
                )
g.set_axis_labels("Year", "Count")
plt.show()
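To answer "which genre is most popular in each year" directly, the grouped counts can be reduced to one row per year. The following is a minimal sketch using only pandas; the sample data, the `Count` column name, and the variable names `counts` and `top_per_year` are assumptions made for illustration, not part of the original table.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with the same 'Year' and 'Genre' columns as above
rng = np.random.default_rng(0)
genres = ['Drama', 'Comedy', 'Action', 'Adventure', 'Thriller']
df = pd.DataFrame({
    'ID': range(300),
    'Genre': rng.choice(genres, size=300),
    'Year': rng.integers(1990, 2000, size=300),  # 1990..1999
})

# Count films per (Year, Genre) pair
counts = df.groupby(['Year', 'Genre']).size().reset_index(name='Count')

# For every year, find the index of the row with the largest count
top_idx = counts.groupby('Year')['Count'].idxmax()

# Keep only those rows: one (Year, Genre, Count) row per year
top_per_year = counts.loc[top_idx].reset_index(drop=True)

print(top_per_year)
```

If two genres tie within a year, `idxmax` keeps only the first one it encounters, so ties would need separate handling.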
If there are too many bars to read in one plot, just split it up into several plots.
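One way to split the chart up is to bin the years into decades and draw one panel per decade. This is a rough sketch under assumptions: the sample data is made up, and the `Decade` column and the `plot_by_decade` helper are hypothetical names introduced here. The plotting part relies on seaborn's `catplot` with its `col` and `col_wrap` parameters.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data spanning several decades
rng = np.random.default_rng(1)
genres = ['Drama', 'Comedy', 'Action', 'Adventure', 'Thriller']
df = pd.DataFrame({
    'ID': range(600),
    'Genre': rng.choice(genres, size=600),
    'Year': rng.integers(1960, 2016, size=600),  # 1960..2015
})

# Integer division maps each year to the start of its decade (1995 -> 1990)
df['Decade'] = (df['Year'] // 10) * 10

# Count films per (Decade, Genre) pair
counts = df.groupby(['Decade', 'Genre']).size().reset_index(name='Count')


def plot_by_decade(counts):
    """Draw one bar-chart panel per decade (requires seaborn and matplotlib)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    g = sns.catplot(data=counts, x='Genre', y='Count',
                    col='Decade', col_wrap=3, kind='bar', height=3)
    g.set_xticklabels(rotation=45)  # keep the genre names readable
    plt.show()
```

Calling `plot_by_decade(counts)` then draws a small grid of panels, each with only as many bars as there are genres, instead of one chart with a bar for every year and genre combination.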