
Python Pandas Groupby a List of Lists

I am new to Python and am trying to combine functionality from two separate programs that each already work for me.

The goal is to group the values by several descriptive columns and then average them by date. I have done this successfully with pandas groupby.

One of the groupings I would like is an average over all points within a given distance of each point in the data set. So far I have approximated this by using the zip code as the location description. Separately, I have used Geopy with the GPS coordinates to find all other points in the data set that are within the desired distance. This gives me, for each ID in the data set, a list of the IDs within that distance.
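A minimal sketch of that neighbor-list step, for reference. To run with only the standard library it swaps Geopy for a plain haversine great-circle distance on a spherical Earth, and the coordinates, the 5 km radius, and the names `haversine_km` and `neighbors_within` are hypothetical placeholders, not values from the data set. Geopy's `geopy.distance.geodesic` uses an ellipsoidal Earth model and would be somewhat more accurate.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; haversine assumes a sphere


def haversine_km(p1, p2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def neighbors_within(points, radius_km):
    """Map each ID to the list of IDs (itself included) within radius_km of it."""
    return {
        i: [j for j, pj in points.items() if haversine_km(pi, pj) <= radius_km]
        for i, pi in points.items()
    }


# Hypothetical GPS coordinates for the five IDs in the example
points = {
    1: (41.00, -88.00),
    2: (41.01, -88.00),
    3: (41.00, -88.02),
    4: (41.50, -88.00),
    5: (41.00, -88.01),
}
neighbors = neighbors_within(points, radius_km=5.0)
# ID 4 is roughly 55 km from the others, so its list contains only itself
```

Each ID ends up in its own list because its distance to itself is zero, which matches the question's example where List 1 contains ID 1.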

Here is an example dataset:

ID  Date    Value   Color  Location
1    1      1234    Red    60941
1    2      51461   Red    60941
1    3      6512    Red    60941
1    4      5123    Red    60941
1    5      48413   Red    60941
2    1      5416    Blue   60941
2    2      32      Blue   60941
2    3      18941   Blue   60941
2    4      5135    Blue   60941
2    5      1238    Blue   60941
3    1      651651  Blue   60450
3    2      1777    Blue   60450
3    3      1651    Blue   60450
3    4      1968    Blue   60450
3    5      846     Blue   60450
4    1      1689    Red    60941
4    2      1651    Red    60941
4    3      184     Red    60941
4    4      19813   Red    60941
4    5      132     Red    60941
5    1      354     Yellow 60450
5    2      684     Yellow 60450
5    3      489     Yellow 60450
5    4      354     Yellow 60450
5    5      846     Yellow 60450

This is the pandas code I currently have working, using the zip code as the location description:

average_df = data_df['Value'].groupby([data_df['Location'], data_df['Color'], data_df['Date']]).mean()

Is there a way to pass the lists obtained from Geopy to groupby in place of the ['Location'] grouping I currently have? For example, group by the list of nearby IDs [List 1: (1,2,3), List 2: (3,1,5), List 3: (2,3,4)], then by color and date.
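One possible way to do this without handing the lists to groupby directly. groupby assigns each row exactly one key per level, but here a row can belong to several neighborhoods (ID 3 appears in all three example lists). A minimal sketch, assuming the lists are stored in a dict keyed by the ID at the center of each neighborhood (`neighbors`, `pairs`, `expanded`, and `CenterID` are hypothetical names), turns the lists into (CenterID, ID) pairs, merges them onto the data so each row is repeated once per neighborhood that contains it, and then groups by CenterID, Color, and Date as before:

```python
import pandas as pd

# Example data from the question: five IDs, each with values for dates 1-5
values = {
    1: [1234, 51461, 6512, 5123, 48413],
    2: [5416, 32, 18941, 5135, 1238],
    3: [651651, 1777, 1651, 1968, 846],
    4: [1689, 1651, 184, 19813, 132],
    5: [354, 684, 489, 354, 846],
}
colors = {1: 'Red', 2: 'Blue', 3: 'Blue', 4: 'Red', 5: 'Yellow'}
locations = {1: 60941, 2: 60941, 3: 60450, 4: 60941, 5: 60450}
data_df = pd.DataFrame(
    [
        {'ID': i, 'Date': d, 'Value': v, 'Color': colors[i], 'Location': locations[i]}
        for i, vals in values.items()
        for d, v in enumerate(vals, start=1)
    ]
)

# Hypothetical neighbor lists (e.g. produced with Geopy), keyed by the center ID
neighbors = {1: [1, 2, 3], 2: [3, 1, 5], 3: [2, 3, 4]}

# One row per (center, member) pair
pairs = pd.DataFrame(
    [(center, member) for center, members in neighbors.items() for member in members],
    columns=['CenterID', 'ID'],
)

# Inner merge repeats each data row once per neighborhood that contains its ID
expanded = pairs.merge(data_df, on='ID')

# CenterID now plays the role that Location played in the zip-code version
average_by_neighborhood = expanded.groupby(['CenterID', 'Color', 'Date'])['Value'].mean()
```

For example, `average_by_neighborhood.loc[(1, 'Blue', 1)]` is the mean of the Date 1 values of IDs 2 and 3, (5416 + 651651) / 2 = 328533.5. The merged frame grows with the combined length of all the neighbor lists, which is the cost of this approach.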

I have gone through the pandas documentation and searched this site, but haven't found anyone passing a list to groupby, so I'm not sure it is possible. Do I need to do this with a NumPy array instead? Any feedback would be appreciated.

pandas can group by a boolean list directly, as long as the list has one entry per row. So all you need is a list recording whether each row is nearby or not. The easiest way to build it is with a list comprehension:

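import pandas

df = pandas.DataFrame({'value': [3, 2, 3, 6, 4, 1], 'location': ['a', 'a', 'b', 'c', 'c', 'c']})
nearby_locations = ['a', 'b']
# One boolean per row: is this row's location in nearby_locations?
is_nearby = [(loc in nearby_locations) for loc in df['location']]
# is_nearby = [True, True, True, False, False, False]
# Select the numeric column first; pandas 2.x raises a TypeError if mean() sees the string column
df.groupby(is_nearby)[['value']].mean()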

This will output:

          value
False  3.666667
True   2.666667
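The same boolean key can be combined with the other grouping columns and repeated once per neighbor list. A minimal sketch on the question's data, assuming the lists are stored in a hypothetical dict `neighbors` mapping each center ID to its nearby IDs; it uses `Series.isin` in place of the list comprehension, names the key `Nearby` (a hypothetical label), and keeps only the `True` half of each split:

```python
import pandas as pd

# Example data from the question (Location omitted; it is not needed here)
values = {
    1: [1234, 51461, 6512, 5123, 48413],
    2: [5416, 32, 18941, 5135, 1238],
    3: [651651, 1777, 1651, 1968, 846],
    4: [1689, 1651, 184, 19813, 132],
    5: [354, 684, 489, 354, 846],
}
colors = {1: 'Red', 2: 'Blue', 3: 'Blue', 4: 'Red', 5: 'Yellow'}
data_df = pd.DataFrame(
    [
        {'ID': i, 'Date': d, 'Value': v, 'Color': colors[i]}
        for i, vals in values.items()
        for d, v in enumerate(vals, start=1)
    ]
)

# Hypothetical neighbor lists, keyed by the ID at the center of each neighborhood
neighbors = {1: [1, 2, 3], 2: [3, 1, 5], 3: [2, 3, 4]}

results = {}
for center_id, ids in neighbors.items():
    # Boolean key with one entry per row, named so its level can be selected later
    is_nearby = data_df['ID'].isin(ids).rename('Nearby')
    grouped = data_df.groupby([is_nearby, 'Color', 'Date'])['Value'].mean()
    # Keep the nearby rows and drop the now-redundant Nearby level
    results[center_id] = grouped.xs(True, level='Nearby')

# Stack the per-neighborhood results under a new outer level named CenterID
average_by_neighborhood = pd.concat(results, names=['CenterID'])
```

This runs one groupby per neighbor list, so it is simple to follow but may become slow when there are many IDs.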

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For questions, contact: yoyou2525@163.com.
