How to filter multiple dataframes and append a string to the save filenames?

Question

I'm trying to take a large number of existing variable names and create new variable names that contain the names of the originals.
For example, I have several pandas data frames of inventory items in each location.
- I want to create new data frames containing only the negative inventory items, with '_neg' appended to the original variable names (inventory locations).
- I want to be able to do this with a for loop something like this:

warehouse = pd.read_excel('warehouse.xls')
retail = pd.read_excel('retailonhand.xls')
shed3 = pd.read_excel('shed3onhand.xls')
tank1 = pd.read_excel('tank1onhand.xls')
tank2 = pd.read_excel('tank2onhand.xls')

all_stock_sites = [warehouse,retail,shed3,tank1,tank2]

all_neg_stock_sites = []
for site in all_stock_sites:
    # pseudocode: string_value_of_new_site = <name of the variable `site` as a string> + '_neg'
    string_value_of_new_site = site[site.OnHand < 0]
    all_neg_stock_sites.append(string_value_of_new_site)

The result should be equivalent to this:

# create new dataframes for each stock site's negative 'OnHand' values
warehouse_neg = warehouse[warehouse.OnHand < 0]
retail_neg = retail[retail.OnHand < 0]
shed3_neg = shed3[shed3.OnHand < 0]
tank1_neg = tank1[tank1.OnHand < 0]
tank2_neg = tank2[tank2.OnHand < 0]

I want to do this without typing out all 500 stock site locations and appending '_neg' by hand.

Answer 1

I recommend not using variable names as the "keys" to the data. Instead, store each data frame under an explicit name in a dict (or in a list of tuples).

So instead of:

warehouse = pd.read_excel('warehouse.xls')
retail = pd.read_excel('retailonhand.xls')
shed3 = pd.read_excel('shed3onhand.xls')

You would have:

sites = {}
sites['warehouse'] = pd.read_excel('warehouse.xls')
sites['retail'] = pd.read_excel('retailonhand.xls')
sites['shed3'] = pd.read_excel('shed3onhand.xls')
# ...etc

Then you can build the filtered data frames, stored under the '_neg' keys, like so:

sites_neg = {}
for site_name, site in sites.items():
    neg_key = site_name + '_neg'
    sites_neg[neg_key] = site[site.OnHand < 0]
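
For what it's worth, the same idea can be written as a dict comprehension and tried out without any Excel files. This is only a sketch: the helper name negative_only and the small in-memory data frames (including the Item column) are made up for illustration. Only the OnHand column and the '_neg' suffix come from the question.

```python
import pandas as pd


def negative_only(sites, column="OnHand", suffix="_neg"):
    """Return a new dict mapping name + suffix to the rows where `column` is negative."""
    return {
        f"{name}{suffix}": df[df[column] < 0]
        for name, df in sites.items()
    }


# Hypothetical in-memory data standing in for the Excel files
sites = {
    "warehouse": pd.DataFrame({"Item": ["a", "b", "c"], "OnHand": [5, -2, 0]}),
    "retail": pd.DataFrame({"Item": ["d", "e"], "OnHand": [-1, -7]}),
}

sites_neg = negative_only(sites)
print(list(sites_neg))
print(sites_neg["warehouse_neg"])
```

With 500 sites, the sites dict itself would presumably be filled in a loop over the file names rather than by hand; the filtering step stays the same.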

Answer 2

Use rglob from the pathlib module to create a list of existing files.
Use f-strings to build the new file names.
- PEP 498 - Literal String Interpolation
Iterate through each file:
1. Create a dataframe.
2. Filter the dataframe. An error occurs if the column doesn't exist (e.g. AttributeError: 'DataFrame' object has no attribute 'OnHand'), so the filter goes in a try-except block. The continue statement skips to the next iteration of the loop.
3. Check that the dataframe is not empty. If it's not empty, then...
4. Add the dataframe to a dictionary for additional processing, if desired.
5. Save the dataframe as a new file with _neg added to the file name.

from pathlib import Path
import pandas as pd

# set path to top file directory
d = Path(r'e:\PythonProjects\stack_overflow\stock_sites')

# get all xls files
files = list(d.rglob('*.xls'))

# create, filter, and save the dataframes, collecting them in a dict
df_dict = {}
for file in files:
    # create dataframe
    df = pd.read_excel(file)
    try:
        # keep only the rows with a negative OnHand value
        df = df[df.OnHand < 0]
    except AttributeError as e:
        # the file has no OnHand column: report it and skip to the next file
        print(f'{file} caused:\n{e}\n')
        continue
    if not df.empty:
        # add to the dict under the new name
        df_dict[f'{file.stem}_neg'] = df
        # save to a new file next to the original
        new_path = file.parent / f'{file.stem}_neg{file.suffix}'
        df.to_excel(new_path, index=False)

print(df_dict.keys())

>>> dict_keys(['retailonhand_neg', 'shed3onhand_neg', 'tank1onhand_neg', 'tank2onhand_neg', 'warehouse_neg'])

# access individual dataframes as you would any dict
df_dict['retailonhand_neg']
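
One thing to watch for: the filtered files are saved next to the originals, so a second run of rglob('*.xls') would also match the _neg files written by the first run. Below is a minimal sketch of one way to skip them, using only pathlib. The helper names source_files and neg_path are hypothetical, and the demonstration uses empty placeholder files in a temporary directory rather than real spreadsheets.

```python
from pathlib import Path
import tempfile


def source_files(top, pattern="*.xls", suffix="_neg"):
    """Recursively list files matching `pattern`, skipping earlier `_neg` output."""
    return sorted(p for p in Path(top).rglob(pattern) if not p.stem.endswith(suffix))


def neg_path(file, suffix="_neg"):
    """Build the sibling output path, e.g. tank1onhand.xls -> tank1onhand_neg.xls."""
    return file.with_name(f"{file.stem}{suffix}{file.suffix}")


# Demonstration on empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("warehouse.xls", "warehouse_neg.xls", "tank1onhand.xls"):
        (Path(tmp) / name).touch()
    found = [p.name for p in source_files(tmp)]
    outputs = [neg_path(p).name for p in source_files(tmp)]

print(found)
print(outputs)
```

The obvious caveat is that a genuine stock site whose file name happens to end in _neg would be skipped too. Writing the output to a separate directory would avoid that.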

2 answers

solution1
1 2020-05-20 20:20:45

solution2
0 ACCEPTED 2020-05-20 21:02:48
