
How to filter multiple dataframes and append a string to the saved filenames?

  • I'm trying to use a large number of existing variable names to create new variable names that contain the original names.
  • For example, I have several pandas data frames of inventory items, one for each location.
    • I want to create new data frames containing only the items with negative inventory, named by appending '_neg' to the original variable names (the inventory locations).
    • I want to be able to do this with a for loop, something like this:
import pandas as pd

warehouse = pd.read_excel('warehouse.xls')
retail = pd.read_excel('retailonhand.xls')
shed3 = pd.read_excel('shed3onhand.xls')
tank1 = pd.read_excel('tank1onhand.xls')
tank2 = pd.read_excel('tank2onhand.xls')

all_stock_sites = [warehouse, retail, shed3, tank1, tank2]

all_neg_stock_sites = []
for site in all_stock_sites:
    # pseudocode: string_value_of_new_site = <name of the variable bound to site> + '_neg'
    string_value_of_new_site = site[site.OnHand < 0]
    all_neg_stock_sites.append(string_value_of_new_site)
  • to create something like this:
# create new dataframes for each stock site's negative 'OnHand' values
warehouse_neg = warehouse[warehouse.OnHand < 0]
retail_neg = retail[retail.OnHand < 0]
shed3_neg = shed3[shed3.OnHand < 0]
tank1_neg = tank1[tank1.OnHand < 0]
tank2_neg = tank2[tank2.OnHand < 0]
  • I want to do this without typing out all 500 stock site locations and appending '_neg' to each one by hand.

My recommendation would be not to use variable names as the "keys" to the data, but rather to store the data frames under proper names in a tuple or a dict.

So instead of:

warehouse = pd.read_excel('warehouse.xls')
retail = pd.read_excel('retailonhand.xls')
shed3 = pd.read_excel('shed3onhand.xls')

You would have:

sites = {}
sites['warehouse'] = pd.read_excel('warehouse.xls')
sites['retail'] = pd.read_excel('retailonhand.xls')
sites['shed3'] = pd.read_excel('shed3onhand.xls')
# ...etc
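With hundreds of sites, the assignments above can be generated from a mapping of site names to file names instead of being written out one by one. This is a minimal sketch, assuming you keep such a mapping; the site_files dict and the load_sites helper are hypothetical names. The reader is a parameter (defaulting to pd.read_excel) so the loading logic can be tried without Excel files on disk.

```python
import pandas as pd

# Hypothetical mapping of site names to their inventory files.
# With 500 sites this mapping could itself be built from a directory listing.
site_files = {
    'warehouse': 'warehouse.xls',
    'retail': 'retailonhand.xls',
    'shed3': 'shed3onhand.xls',
}


def load_sites(site_files, reader=pd.read_excel):
    """Return a dict mapping each site name to the DataFrame read from its file.

    `reader` defaults to pd.read_excel; any callable that takes a path works.
    """
    return {name: reader(path) for name, path in site_files.items()}


# Usage (needs the files to exist and an engine such as xlrd for .xls):
# sites = load_sites(site_files)
```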

Then you could create the negative keys like so:

sites_neg = {}
for site_name, site in sites.items():
    neg_key = site_name + '_neg'
    sites_neg[neg_key] = site[site.OnHand < 0]
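The same loop can also be written as a dict comprehension. This small self-contained sketch uses made-up sample data in place of the Excel files (the Item and OnHand values are illustrative only) to show what sites_neg ends up holding:

```python
import pandas as pd

# Illustrative sample data standing in for the Excel files
sites = {
    'warehouse': pd.DataFrame({'Item': ['A', 'B', 'C'], 'OnHand': [5, -2, 0]}),
    'retail': pd.DataFrame({'Item': ['A', 'D'], 'OnHand': [-1, -4]}),
    'shed3': pd.DataFrame({'Item': ['E'], 'OnHand': [3]}),
}

# Keep only rows with negative stock, keyed by '<site>_neg'
sites_neg = {f'{name}_neg': df[df.OnHand < 0] for name, df in sites.items()}

# Print how many negative rows each site has
for key, neg in sites_neg.items():
    print(key, len(neg))
```

Note that a site with no negative stock (shed3 here) still gets an entry holding an empty DataFrame; add "if not df[df.OnHand < 0].empty" style filtering if you would rather drop those.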
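Alternatively, if each stock site is stored in its own .xls file, you can read every file in a directory, filter it, and save the negative rows to a new file with '_neg' appended to the name:

from pathlib import Path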
import pandas as pd

# set the path to the top-level directory containing the files
d = Path(r'e:\PythonProjects\stack_overflow\stock_sites')

# get all .xls files, including those in subdirectories
files = list(d.rglob('*.xls'))

# read, filter, and save each dataframe; collect the results in a dict
df_dict = dict()
for file in files:
    # create dataframe
    df = pd.read_excel(file)
    try:
        # filter df and add to dict
        df = df[df.OnHand < 0]
    except AttributeError as e:
        print(f'{file} caused:\n{e}\n')
        continue 
    if not df.empty:
        df_dict[f'{file.stem}_neg'] = df
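        # save to a new file next to the original, e.g. tank1onhand_neg.xlsx
        # (.xlsx because pandas 2.0+ can no longer write .xls files)
        new_path = file.parent / f'{file.stem}_neg.xlsx'
        df.to_excel(new_path, index=False)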

print(df_dict.keys())

# output:
# dict_keys(['retailonhand_neg', 'shed3onhand_neg', 'tank1onhand_neg', 'tank2onhand_neg', 'warehouse_neg'])

# access individual dataframes as you would any dict
df_dict['retailonhand_neg']
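If you would rather not rely on AttributeError to detect files that lack an OnHand column, the check can be made explicit. This is a minimal sketch; the helper names negative_rows and neg_path are hypothetical. It also builds the '_neg' output path, using .xlsx because recent pandas versions can no longer write .xls files.

```python
from pathlib import Path

import pandas as pd


def negative_rows(df, column='OnHand'):
    """Return rows where `column` is negative, or None if the column is missing."""
    if column not in df.columns:
        return None
    return df[df[column] < 0]


def neg_path(file, suffix='.xlsx'):
    """Build '<stem>_neg<suffix>' in the same directory as the original file."""
    file = Path(file)
    return file.with_name(f'{file.stem}_neg{suffix}')


df = pd.DataFrame({'Item': ['A', 'B'], 'OnHand': [-3, 7]})
print(negative_rows(df))
print(neg_path('stock_sites/tank1onhand.xls'))
```

Inside the loop, "filtered = negative_rows(df)" followed by "if filtered is None: continue" would take the place of the try/except.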
