
Python Pandas Groupby a List of Lists

I am new to Python and am trying to combine functionality from two separate programs that are each working for me.

The goal is to group values by various descriptions and then average the values of the data set by date. I have successfully done this using Pandas Groupby.

One of the descriptions I would like to evaluate is an average over all points within a given distance of each point in the data set. So far I have been approximating this by using the zip code as the location description. Separately, I have been able to use Geopy with GPS points to determine all other points in the data set that are within a desired distance. This gives me, for each ID in the data set, a list of the IDs within that distance.
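For illustration, neighbor lists of this kind might be built as sketched below. This is only a sketch under stated assumptions: the coordinates in `coords`, the radius `max_km`, and the helper `haversine_km` are all hypothetical, and a plain haversine formula stands in for Geopy so that the snippet runs without extra dependencies.

```python
import math

# Hypothetical (latitude, longitude) pairs for each ID; not taken from the data set.
coords = {
    1: (41.000, -88.000),
    2: (41.010, -88.000),
    3: (41.500, -88.000),
}

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

max_km = 5.0  # assumed search radius

# For each ID, collect the IDs (including itself) that lie within max_km.
neighbors = {
    i: [j for j, q in coords.items() if haversine_km(p, q) <= max_km]
    for i, p in coords.items()
}
print(neighbors)
```

With these made-up coordinates, IDs 1 and 2 are about 1.1 km apart and end up in each other's lists, while ID 3 is roughly 55 km away and is only its own neighbor.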

Here is an example dataset:

ID  Date    Value   Color  Location
1    1      1234    Red    60941
1    2      51461   Red    60941
1    3      6512    Red    60941
1    4      5123    Red    60941
1    5      48413   Red    60941
2    1      5416    Blue   60941
2    2      32      Blue   60941
2    3      18941   Blue   60941
2    4      5135    Blue   60941
2    5      1238    Blue   60941
3    1      651651  Blue   60450
3    2      1777    Blue   60450
3    3      1651    Blue   60450
3    4      1968    Blue   60450
3    5      846     Blue   60450
4    1      1689    Red    60941
4    2      1651    Red    60941
4    3      184     Red    60941
4    4      19813   Red    60941
4    5      132     Red    60941
5    1      354     Yellow 60450
5    2      684     Yellow 60450
5    3      489     Yellow 60450
5    4      354     Yellow 60450
5    5      846     Yellow 60450

This is the Pandas code that I currently have working, using the zip code as the location description:

average_df = data_df['Value'].groupby([data_df['Location'],data_df['Color'],data_df['Date']]).mean()

Is there a way to pass the lists obtained from Geopy into Groupby in place of the ['Location'] group I currently have? For example, Groupby List(ID) [List 1: (1,2,3), List 2: (3,1,5), List 3: (2,3,4)], then color and date.

I've gone through the Pandas documentation and searched this website and haven't found anyone using a list in Pandas Groupby, so I'm not sure it's possible. Maybe I need to do this in a NumPy array? Any feedback would be appreciated.

Pandas will readily group by a list of booleans. Thus, all you need is a list that says, for each row, whether it is nearby or not. The easiest way to build it is with a list comprehension:

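import pandas

df = pandas.DataFrame({'value': [3,2,3,6,4,1], 'location': ['a', 'a', 'b', 'c', 'c', 'c']})
nearby_locations = ['a','b']
# Flag each row by whether its location is one of the nearby locations
is_nearby = [(loc in nearby_locations) for loc in df['location']]
# is_nearby = [True, True, True, False, False, False]
# numeric_only=True keeps the non-numeric 'location' column out of the mean
df.groupby(is_nearby).mean(numeric_only=True)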

This will output:

          value
False  3.666667
True   2.666667
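
The boolean approach handles one neighborhood at a time. For the list-of-lists case in the question, the lists overlap (ID 3 appears in all three example lists), so a single key column cannot represent them. One possible way around this, sketched here rather than offered as the definitive method, is to build a long table with one row per (group, ID) pair and merge it with the data, so that each row is duplicated once for every list it belongs to. The names `neighbors`, `membership` and `Group` are hypothetical, and the data is a cut-down version of the example set (dates 1 and 2 only, without the Location column).

```python
import pandas as pd

# Cut-down version of the example data (IDs 1-5, dates 1 and 2).
data_df = pd.DataFrame({
    'ID':    [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    'Date':  [1, 2, 1, 2, 1, 2, 1, 2, 1, 2],
    'Value': [1234, 51461, 5416, 32, 651651, 1777, 1689, 1651, 354, 684],
    'Color': ['Red', 'Red', 'Blue', 'Blue', 'Blue', 'Blue',
              'Red', 'Red', 'Yellow', 'Yellow'],
})

# Hypothetical neighbor lists from the question, keyed by a group label.
neighbors = {'List 1': [1, 2, 3], 'List 2': [3, 1, 5], 'List 3': [2, 3, 4]}

# Long table with one row per (Group, ID) pair.
membership = pd.DataFrame(
    [(group, id_) for group, ids in neighbors.items() for id_ in ids],
    columns=['Group', 'ID'],
)

# After the merge, a data row appears once for each list containing its ID.
merged = membership.merge(data_df, on='ID')

# Group by neighbor list, then color, then date, as in the original code.
average_df = merged.groupby(['Group', 'Color', 'Date'])['Value'].mean()
print(average_df)
```

The result is a Series indexed by (Group, Color, Date), the same shape as the zip code version with the neighbor list taking the place of Location. The merge multiplies the row count by the average number of lists each ID belongs to, which is worth keeping in mind for large data sets with wide search radii.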

