
Most Efficient Way to iteratively filter a Pandas dataframe given a list of values

I was hoping someone could point me in the right direction...

I have one dataset containing market data with columns: Area, Price per night and # of Bedrooms.

I want to create another dataset which shows how many bedrooms are available, at each price point in a given list, for each area.

I am currently using two nested for loops: the outer one loops through the areas and the inner one through the prices. For each pair, it filters the market data and sums the Bedrooms column. This is extremely slow, especially when my price list is thousands of entries long and I have dozens of areas.

How could I speed up this process? Example code attached below.

import random
import pandas as pd

name_choices = ['North', 'South', 'East', 'West']
bedroom_choices = [1,2,3,4,5]
price_choices = list(range(5, 300))

name_list = []
bedrooms_list = []
price_list = []

for i in range(100):
    
    name_list.append(random.choice(name_choices))
    bedrooms_list.append(random.choice(bedroom_choices))
    price_list.append(random.choice(price_choices))

market_data_ex  = pd.DataFrame(data = {'Area' : name_list, 'Bedrooms' : bedrooms_list, 'Price': price_list})

empty_area = []
empty_price = []
empty_bedrooms = []

for area in name_choices:
    
    for price in price_choices:
        
        bedrooms_available = market_data_ex[(market_data_ex['Area'] == area) & (market_data_ex['Price'] <= price)]['Bedrooms'].sum()
        
        empty_area.append(area)
        empty_price.append(price)
        empty_bedrooms.append(bedrooms_available)
        
pd.DataFrame(data = {'Area' : empty_area, 'Price' : empty_price, 'Bedrooms' : empty_bedrooms})

Many thanks in advance!!!

If I understand you correctly, you can pivot your data with .pivot_table() and then take a row-wise cumulative sum with .cumsum(axis=1):

x = (
    market_data_ex.pivot_table(
        index="Area",
        columns="Price",
        values="Bedrooms",
        aggfunc="sum",
        fill_value=0,
    )
    .cumsum(axis=1)
    .stack()
    .reset_index()
    .rename(columns={0: "Bedrooms"})
)
print(x)
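One thing to watch: the pivot only has a column for each price that actually occurs in the data, so the result above has no rows for price points in price_choices that no listing happens to hit. A possible way around that, sketched below, is to reindex the pivoted table onto the full price list before taking the cumulative sum. The fixed random seed and the name `result` are my own additions for illustration, and the sketch assumes the price list is sorted in ascending order, as in the question.

```python
import random
import pandas as pd

random.seed(0)  # assumption: fixed seed so the example is reproducible

name_choices = ['North', 'South', 'East', 'West']
bedroom_choices = [1, 2, 3, 4, 5]
price_choices = list(range(5, 300))  # must be sorted ascending for cumsum

# Same kind of random sample data as in the question
market_data_ex = pd.DataFrame({
    'Area': [random.choice(name_choices) for _ in range(100)],
    'Bedrooms': [random.choice(bedroom_choices) for _ in range(100)],
    'Price': [random.choice(price_choices) for _ in range(100)],
})

result = (
    market_data_ex.pivot_table(
        index='Area',
        columns='Price',
        values='Bedrooms',
        aggfunc='sum',
        fill_value=0,
    )
    # Add a zero column for every price point missing from the data
    # (and a zero row for any area missing from the data)
    .reindex(
        index=pd.Index(name_choices, name='Area'),
        columns=pd.Index(price_choices, name='Price'),
        fill_value=0,
    )
    # Running total across prices = bedrooms available at or below each price
    .cumsum(axis=1)
    .stack()
    .reset_index(name='Bedrooms')
)
print(result)
```

This should give one row per (Area, Price) pair, like the nested loops in the question, without filtering the dataframe once per pair.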
