

Getting average of column in pandas

I'm trying to read a file from which I pull the name of each location and then calculate the average amount of snow it gets. This is what I have so far:

import pandas
data = pandas.read_csv('filteredData.csv')
if ('NAME' == 'ADA 0.7 SE, MI US'):
    data.ix['1/1/2016':'12/31/2016']
    newdata=data['SNOW'].mean()
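For reference, here is a minimal sketch of the filter-then-average approach the code above attempts. It assumes the columns are named NAME, DATE and SNOW (DATE is an assumption; the question never names the date column), and the sample rows and the station name OTHER STATION are made up. The test `'NAME' == 'ADA 0.7 SE, MI US'` compares two string literals, so it is always False; selecting rows needs a boolean mask on the column instead. `.ix` was removed in pandas 1.0, so `.loc` with the mask is used here.

```python
import pandas as pd

# Hypothetical sample rows standing in for filteredData.csv
data = pd.DataFrame({
    "NAME": ["ADA 0.7 SE, MI US"] * 3 + ["OTHER STATION"] * 2,
    "DATE": ["1/1/2016", "6/1/2016", "1/1/2017", "1/1/2016", "2/1/2016"],
    "SNOW": [2.0, 0.0, 5.0, 10.0, 20.0],
})
data["DATE"] = pd.to_datetime(data["DATE"], format="%m/%d/%Y")

# Boolean mask: rows for one station within calendar year 2016
mask = (
    (data["NAME"] == "ADA 0.7 SE, MI US")
    & (data["DATE"] >= pd.Timestamp("2016-01-01"))
    & (data["DATE"] <= pd.Timestamp("2016-12-31"))
)

# Average SNOW over the selected rows only
ada_2016_mean = data.loc[mask, "SNOW"].mean()
print(ada_2016_mean)  # 1.0
```

This handles one station at a time; for all twenty locations at once, grouping is usually less repetitive.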

I'm still unsure whether it would be better to group by location name and then calculate the average snowfall that way.

Please be patient with me; I'm still brand new to pandas.

This image shows just one of twenty different locations: [image not available]

Looks like you can use groupby():

import pandas as pd

# Create test data
df = pd.DataFrame({
    "name": ["place1", "place1", "place1", "place2", "place2", "place2"] * 2,
    "date": ["1/1/2016", "1/2/2016", "1/3/2016"] * 2 + ["1/1/2017", "1/2/2017", "1/3/2017"] * 2,
    "snow": [10.0, 20.0, 30.0, 100.0, 200.0, 300.0, 0.0, 1.0, 2.0, 1000.0, 2000.0, 3000.0]
    })
# Transform string date to datetime format
df["date"] = pd.to_datetime(df["date"])

df looks like:

         date    name    snow
0  2016-01-01  place1    10.0
1  2016-01-02  place1    20.0
2  2016-01-03  place1    30.0
3  2016-01-01  place2   100.0
4  2016-01-02  place2   200.0
5  2016-01-03  place2   300.0
6  2017-01-01  place1     0.0
7  2017-01-02  place1     1.0
8  2017-01-03  place1     2.0
9  2017-01-01  place2  1000.0
10 2017-01-02  place2  2000.0
11 2017-01-03  place2  3000.0

Then calculate the average amount of snow for each group; in this case, I've grouped by name and year:

# Group dataset by name of place and year, calculate the average amount of snow for each group
df["snow_average"] = df.groupby(["name", df.date.dt.year])["snow"].transform("mean")

df now looks like:

         date    name    snow  snow_average
0  2016-01-01  place1    10.0          20.0
1  2016-01-02  place1    20.0          20.0
2  2016-01-03  place1    30.0          20.0
3  2016-01-01  place2   100.0         200.0
4  2016-01-02  place2   200.0         200.0
5  2016-01-03  place2   300.0         200.0
6  2017-01-01  place1     0.0           1.0
7  2017-01-02  place1     1.0           1.0
8  2017-01-03  place1     2.0           1.0
9  2017-01-01  place2  1000.0        2000.0
10 2017-01-02  place2  2000.0        2000.0
11 2017-01-03  place2  3000.0        2000.0
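If you only need one average per location and year, rather than a new column repeated on every row, a plain aggregation may be simpler. This sketch rebuilds the same test data and swaps `transform("mean")` for `mean()`; the result is a Series indexed by (name, year), so each group appears once.

```python
import pandas as pd

# Same test data as above
df = pd.DataFrame({
    "name": ["place1", "place1", "place1", "place2", "place2", "place2"] * 2,
    "date": ["1/1/2016", "1/2/2016", "1/3/2016"] * 2 + ["1/1/2017", "1/2/2017", "1/3/2017"] * 2,
    "snow": [10.0, 20.0, 30.0, 100.0, 200.0, 300.0, 0.0, 1.0, 2.0, 1000.0, 2000.0, 3000.0],
})
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

# mean() collapses each (name, year) group to a single value
yearly = df.groupby(["name", df["date"].dt.year])["snow"].mean()
print(yearly)

# Look up a single location and year
print(yearly.loc[("place1", 2016)])  # 20.0
```

`transform` is handy when you want the group average next to every original row; `mean()` is handy when you want a compact summary table.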

You can change the groupby() criteria to suit what you're looking for. I used name and year because, judging by your example, that's what you seem to want.
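For example, here are two other groupings on the same test data (a sketch; pick whichever matches the question you're asking): by location only, averaging across all years, and by location after first keeping only the 2016 rows.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["place1", "place1", "place1", "place2", "place2", "place2"] * 2,
    "date": ["1/1/2016", "1/2/2016", "1/3/2016"] * 2 + ["1/1/2017", "1/2/2017", "1/3/2017"] * 2,
    "snow": [10.0, 20.0, 30.0, 100.0, 200.0, 300.0, 0.0, 1.0, 2.0, 1000.0, 2000.0, 3000.0],
})
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

# Group by location only: average over every row for that place
# (place1 = 10.5, place2 = 1100.0)
by_name = df.groupby("name")["snow"].mean()
print(by_name)

# Keep only 2016 rows first, then group by location
# (place1 = 20.0, place2 = 200.0)
by_name_2016 = df[df["date"].dt.year == 2016].groupby("name")["snow"].mean()
print(by_name_2016)
```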

EDIT: My apologies, I got frustrated. You seem to have misunderstood my answer and thought you needed to create a new DataFrame manually. You don't: load your dataset from the CSV file and apply the same commands as above to it (you call it data):

import pandas as pd

# Load your own data instead of the test DataFrame
data = pd.read_csv('filteredData.csv')

# Transform string date to datetime format
# (column names NAME, DATE and SNOW are assumed to match your CSV header; adjust if they differ)
data["DATE"] = pd.to_datetime(data["DATE"])
print(data)

# Group dataset by name of place and year, calculate the average amount of snow for each group
data["snow_average"] = data.groupby(["NAME", data["DATE"].dt.year])["SNOW"].transform("mean")
print(data)
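A variant, sketched under the assumption that the CSV header contains NAME, DATE and SNOW columns (DATE in particular is a guess; the question never names it): `read_csv` can parse the dates while loading via `parse_dates`, and `mean()` skips empty SNOW cells (NaN) by default. The inline CSV text and the station name STATION B are made up for illustration; with the real file you would pass 'filteredData.csv' instead of the StringIO object.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for filteredData.csv
# (names containing commas must be quoted in the file)
csv_text = """NAME,DATE,SNOW
"ADA 0.7 SE, MI US",2016-01-01,1.0
"ADA 0.7 SE, MI US",2016-01-02,3.0
"STATION B",2016-01-01,4.0
"STATION B",2016-01-02,
"""

# parse_dates converts DATE while reading, so no separate to_datetime call is needed
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["DATE"])

# One average per station and year; the empty SNOW cell (NaN) is skipped by mean()
yearly = data.groupby(["NAME", data["DATE"].dt.year])["SNOW"].mean()
print(yearly)
```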
