
Python stats and visualization

I am new to Python and am currently working on a set of real estate data from Redfin.

Currently my data looks like this: [screenshot of the dataset]

There are many different neighborhoods in the dataset. I would like to:

  1. get the average homes_sold per month per neighborhood (the date field was cut out of the screenshot)
  2. graph the above using only the neighborhoods I choose (about 4).

Any help is greatly appreciated.

As I understand it, you have several monthly homes_sold values for each neighborhood and want to average them. If so, try this code (substitute your own data):

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Jupyter/IPython only; delete the next line when running as a plain script
%matplotlib inline

data = pd.DataFrame({'neighborhood':['n1','n1','n2','n3','n3','n4','n5'],'homes_sold per month':[5,7,2,6,4,1,5],'something_else':[5,3,3,5,5,5,5]})
neighborhoods_to_plot = ['n1','n2','n4','n5'] #provide here a list you want to plot
plot = pd.DataFrame()
for n in neighborhoods_to_plot:
    plot.at[n,'homes_sold per month'] = data.loc[data['neighborhood']==n]['homes_sold per month'].mean()
plot.index.name = 'neighborhood'
plt.figure(figsize=(4,3),dpi=300,tight_layout=True)
sns.barplot(x=plot.index,y=plot['homes_sold per month'],data=plot)
plt.savefig('graph.png', bbox_inches='tight')

[image: resulting bar plot]
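The loop in the code above can also be written as a single groupby. This is only a sketch using the same made-up sample data and column names as that answer; `reindex` keeps just the requested neighborhoods, in the order of the list, and would yield NaN for a name that does not occur in the data.

```python
import pandas as pd

# Same made-up sample data as in the answer above.
data = pd.DataFrame({
    'neighborhood': ['n1', 'n1', 'n2', 'n3', 'n3', 'n4', 'n5'],
    'homes_sold per month': [5, 7, 2, 6, 4, 1, 5],
})
neighborhoods_to_plot = ['n1', 'n2', 'n4', 'n5']

# Average per neighborhood, then keep only the requested ones in list order.
plot = (
    data.groupby('neighborhood')['homes_sold per month']
        .mean()
        .reindex(neighborhoods_to_plot)
        .to_frame()
)
print(plot)
```

The resulting `plot` frame has the same shape as the one built by the loop, so it can be passed to `sns.barplot` in the same way.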

Okay, so I am going to assume that you are using Pandas and Matplotlib to handle this data. Then, to get the average number of homes sold per month for each neighborhood, you just need to do:

import pandas as pd
# groupby is a method, so it takes parentheses, not square brackets
mean_number_of_homes_sold = data[['neighborhood','homes_sold']].groupby('neighborhood').agg('mean')
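If "per month" is meant as one average for each calendar month within each neighborhood, a sketch along the following lines may help. It assumes the dataset has a date column, hypothetically named `date` here because the real field is not visible in the screenshot, and the sample values are made up for illustration.

```python
import pandas as pd

# Made-up sample data; the column name 'date' is an assumption.
data = pd.DataFrame({
    'neighborhood': ['Albany Park', 'Albany Park', 'Albany Park',
                     'Tinley Park', 'Tinley Park'],
    'date': ['2021-01-05', '2021-01-20', '2021-02-10',
             '2021-01-15', '2021-02-25'],
    'homes_sold': [5, 7, 4, 2, 6],
})
data['date'] = pd.to_datetime(data['date'])

# Derive the calendar month, then average homes_sold per neighborhood and month.
data['month'] = data['date'].dt.to_period('M')
monthly_mean = (
    data.groupby(['neighborhood', 'month'])['homes_sold']
        .mean()
        .rename('avg_homes_sold')
        .reset_index()
)
print(monthly_mean)
```

Each row of `monthly_mean` then holds one neighborhood, one month, and the mean of `homes_sold` over the rows that fall in that month.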

To plot only the neighborhoods you want, you will need something like this:

import pandas as pd
import matplotlib.pyplot as plt
#fill this list with strings representing the names of the data you need plotted
neighborhoods_to_plot = ['Albany Park', 'Tinley Park']
data_to_graph = data[data.neighborhood.isin(neighborhoods_to_plot)]
fig, ax = plt.subplots()
# pass ax=ax so the scatter is drawn on the figure that gets saved below
data_to_graph.plot(kind='scatter', x='avg_sale_to_list', y='inventory_mom', ax=ax)
ax.set(title='Relationship between time to sale from listing and inventory momentum for selected neighborhoods')
fig.savefig('neighborhood.png', transparent=False, dpi=300, bbox_inches="tight")

You can obviously change which data is graphed or the type of graph, but this should give you a decent starting point.
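As one possible variation, the monthly average of homes_sold could be drawn as one line per selected neighborhood. This is a sketch under assumptions: the `date` column name, the `Loop` neighborhood, the sample values, and the output filename are all made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample data; the column name 'date' is an assumption.
data = pd.DataFrame({
    'neighborhood': ['Albany Park', 'Albany Park', 'Albany Park',
                     'Tinley Park', 'Tinley Park', 'Loop'],
    'date': pd.to_datetime(['2021-01-05', '2021-01-20', '2021-02-10',
                            '2021-01-15', '2021-02-25', '2021-01-10']),
    'homes_sold': [5, 7, 4, 2, 6, 9],
})
neighborhoods_to_plot = ['Albany Park', 'Tinley Park']

# Keep only the chosen neighborhoods and tag each row with its month start.
selected = data[data['neighborhood'].isin(neighborhoods_to_plot)]
selected = selected.assign(
    month=selected['date'].dt.to_period('M').dt.to_timestamp()
)

# Rows are months, columns are neighborhoods, values are mean homes_sold.
monthly = (
    selected.groupby(['month', 'neighborhood'])['homes_sold']
            .mean()
            .unstack('neighborhood')
)

fig, ax = plt.subplots(figsize=(6, 4))
monthly.plot(ax=ax, marker='o')
ax.set(xlabel='month', ylabel='average homes_sold',
       title='Average homes sold per month')
fig.savefig('avg_homes_sold_by_month.png', dpi=300, bbox_inches='tight')
```

Unstacking the neighborhood level gives one column per neighborhood, which pandas plots as a separate line on the shared axes.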

