Retrieve values from a DataFrame using keys in a dictionary

I am trying to group stock ticker symbols by their industry. I created a dictionary with one empty list per industry, but I can't find a way to use it to bring in all of the ticker symbols. How can I iterate through the keys in my dictionary and put each stock symbol into the list for its industry? I am relatively new to Python and I am sure there is an easy way; I just can't find it.

My dataframe looks like this:

Symbol      industry
TXG         Biotechnology
YI          Medical
PIH         Property Insurers
PIHPP       Property Insurers

except there are thousands more rows.

# I'm bringing in the values from the column 'industry' and creating a dictionary:

industries_var = all_tickers['industry'].values
industries = {industry_name: [] for industry_name in industries_var}

# now I want to iterate through the name of every list in my dictionary 
# and append the matching symbol to the industry name in the dataframe:

for key in industries:
    if all_tickers['industry'].str.contains(key, na=False).any():
        industries.append(all_tickers['Symbol'].values)

I am getting the error code: AttributeError: 'dict' object has no attribute 'append'

I am expecting a dictionary looking something like this:

industries = {'Biotechnology': ['TXG'],
              'Medical': ['YI'],
              'Property Insurers': ['PIH', 'PIHPP']}

I know you can manually type in every industry in the dataframe to filter every list individually, but because there are thousands of lines of data I am looking for an iteration like mine above, just a working one.

Thank you!

A similar question has most likely been asked before, but I believe this solution would solve your problem.

Populate a dictionary with each industry and the Symbols that are within it:

industries = {}
for industry in df.industry.unique():
    industries[industry] = df.loc[df.industry == industry].Symbol.unique()

The for loop iterates over each unique industry in your DataFrame. It uses each industry as a dictionary key and assigns to that key an array containing the Symbols that belong to that industry. (Call .tolist() on the result of .unique() if you want plain Python lists, as in your expected output.)
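As an alternative to the explicit loop, here is a minimal sketch that builds the same kind of dictionary with pandas groupby. This is a different technique from the loop above, not a restatement of it. It assumes the DataFrame is named all_tickers with the columns Symbol and industry, as in the question, and that plain Python lists are wanted as values.

```python
import pandas as pd

# Sample data copied from the question; the real DataFrame has thousands of rows.
all_tickers = pd.DataFrame({
    "Symbol": ["TXG", "YI", "PIH", "PIHPP"],
    "industry": ["Biotechnology", "Medical", "Property Insurers", "Property Insurers"],
})

# Group the Symbol column by industry, collect each group into a plain list,
# then convert the resulting Series into a dict keyed by industry name.
industries = all_tickers.groupby("industry")["Symbol"].apply(list).to_dict()

print(industries)
```

Unlike .unique(), this keeps a symbol twice if it appears in two rows of the same industry; drop duplicates first if that matters for your data.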

You'll need two concepts to do what you want: 1) a Python defaultdict and 2) pandas/NumPy boolean masks. Here's a worked example using your DataFrame:

import pandas as pd
from collections import defaultdict
all_tickers = pd.DataFrame({'Symbol': ['TXG', 'YI', 'PIH', 'PIHPP'], 'industry': ['Biotechnology', 'Medical', 'Property Insurers', 'Property Insurers']})

industries_var = set(all_tickers['industry'].values)
industries = defaultdict(list)

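for k in industries_var:
    # extend (not append) so each symbol is added individually,
    # rather than nesting one whole array inside the list
    industries[k].extend(all_tickers[all_tickers.industry == k]['Symbol'].unique())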

industries = dict(industries)

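Note also that you don't need to convert back to a normal dict at the end like I did. For keys that already exist, a defaultdict behaves the same as a normal dict; the difference is that looking up a missing key on a defaultdict(list) creates an empty list instead of raising a KeyError. The normal dict is also a bit nicer to look at if you want to print it to the screen for any reason.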
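A variation on the same idea, sketched under the assumption that the column names match the question: instead of building one boolean mask per industry, make a single pass over the rows and let defaultdict(list) create each list the first time an industry is seen. The zip over the two columns is my own choice for illustration, not part of the example above.

```python
from collections import defaultdict

import pandas as pd

# Sample data copied from the question.
all_tickers = pd.DataFrame({
    "Symbol": ["TXG", "YI", "PIH", "PIHPP"],
    "industry": ["Biotechnology", "Medical", "Property Insurers", "Property Insurers"],
})

industries = defaultdict(list)

# Walk the two columns in parallel; a missing industry key
# is created automatically as an empty list on first access.
for symbol, industry in zip(all_tickers["Symbol"], all_tickers["industry"]):
    industries[industry].append(symbol)

# Optional: convert to a plain dict so missing keys raise KeyError again.
industries = dict(industries)

print(industries)
```

Here append is the right call because each step adds exactly one symbol string, so the values end up as flat lists.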

Finally, this is a really comprehensive discussion on defaultdicts: How does collections.defaultdict work?
