
Eliminate keys from a list of dicts in Python

I am pulling information from this website's API: https://financialmodelingprep.com/

Specifically, I need the data from the income statements:

https://financialmodelingprep.com/developer/docs/#Company-Financial-Statements

What I get back from the API is a list containing 36 dictionaries with the following data:

[ {
  "date" : "2019-09-28",
  "symbol" : "AAPL",
  "fillingDate" : "2019-10-31 00:00:00",
  "acceptedDate" : "2019-10-30 18:12:36",
  "period" : "FY",
  "revenue" : 260174000000,
  "costOfRevenue" : 161782000000,
  "grossProfit" : 98392000000,
  "grossProfitRatio" : 0.378178,
  "researchAndDevelopmentExpenses" : 16217000000,
  "generalAndAdministrativeExpenses" : 18245000000,
  "sellingAndMarketingExpenses" : 0.0,
  "otherExpenses" : 1807000000,
  "operatingExpenses" : 34462000000,
  "costAndExpenses" : 196244000000,
  "interestExpense" : 3576000000,
  "depreciationAndAmortization" : 12547000000,
  "ebitda" : 81860000000,
  "ebitdaratio" : 0.314636,
  "operatingIncome" : 63930000000,
  "operatingIncomeRatio" : 0.24572,
  "totalOtherIncomeExpensesNet" : 422000000,
  "incomeBeforeTax" : 65737000000,
  "incomeBeforeTaxRatio" : 0.252666,
  "incomeTaxExpense" : 10481000000,
  "netIncome" : 55256000000,
  "netIncomeRatio" : 0.212381,
  "eps" : 2.97145,
  "epsdiluted" : 2.97145,
  "weightedAverageShsOut" : 18595652000,
  "weightedAverageShsOutDil" : 18595652000,
  "link" : "https://www.sec.gov/Archives/edgar/data/320193/000032019319000119/0000320193-19-000119-index.html",
  "finalLink" : "https://www.sec.gov/Archives/edgar/data/320193/000032019319000119/a10-k20199282019.htm"
}, ...
]

The keys I don't need in each dictionary are: fillingDate, acceptedDate, link, finalLink.

I managed to remove them, but now my code prints those dictionaries far too many times, and I can't understand why.

Here is what i tried:

import requests
import json

url = "https://financialmodelingprep.com/api/v3/income-statement/AAPL?apikey=b60bb3d1967bb15bfb9daaa4426e77dc"
response = requests.get(url)
data = response.text
dataList = json.loads(data)
entriesToRemove = {
    'fillingDate' : 0,
    'acceptedDate' : 0,
    'link' : 0,
    'finalLink' : 0
}
removedEntries = []
newDict = {}

for index in range(len(dataList)):
    for key in dataList[index]:
        newDict[key] = dataList[index].get(key)                 
        if key in entriesToRemove:                              
            removedEntries = newDict.pop(key)                   
        print(json.dumps(newDict, indent=4))

Thanks in advance

OP:

For each key in the dictionary, the whole dictionary gets printed again.

Reason:

for index in range(len(dataList)):
    for key in dataList[index]:
        newDict[key] = dataList[index].get(key)
        if key in entriesToRemove:
            removedEntries = newDict.pop(key)
        print(json.dumps(newDict, indent=4))    # notice this line

The dictionary is printed once per key because the print(json.dumps(newDict, indent=4)) statement sits inside the inner loop, so it runs on every key-value iteration over the dictionary instead of once per dictionary.
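A minimal corrected sketch of the question's loop, assuming dataList has already been parsed from the API. Besides moving the print out of the inner loop, it creates a fresh newDict for each statement (in the original, a single newDict is created before the loops and shared by all 36 statements) and skips the unwanted keys instead of adding and then popping them. The sample data is a hypothetical, trimmed stand-in; the second entry's values are made up.

```python
import json

# Hypothetical, trimmed stand-in for the parsed API response (dataList in the question).
# The first entry reuses values from the question; the second is made up for illustration.
dataList = [
    {"date": "2019-09-28", "symbol": "AAPL", "fillingDate": "2019-10-31 00:00:00",
     "acceptedDate": "2019-10-30 18:12:36", "revenue": 260174000000,
     "link": "placeholder", "finalLink": "placeholder"},
    {"date": "2000-01-01", "symbol": "XYZ", "fillingDate": "2000-01-02 00:00:00",
     "acceptedDate": "2000-01-02 00:00:00", "revenue": 100,
     "link": "placeholder", "finalLink": "placeholder"},
]
entriesToRemove = {'fillingDate', 'acceptedDate', 'link', 'finalLink'}

cleaned = []
for entry in dataList:
    newDict = {}  # fresh dict for every statement, not one dict shared by all of them
    for key in entry:
        if key not in entriesToRemove:
            newDict[key] = entry[key]
    cleaned.append(newDict)
    # Dedented out of the inner loop: runs once per statement, not once per key
    print(json.dumps(newDict, indent=4))
```

Each cleaned statement is printed exactly once, and cleaned keeps the filtered list in case you need it after the loop.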

To remove those keys from a list of dicts, iterate over the list and build a new list of dicts that leaves out the unwanted keys:

s = [ {
  "date" : "2019-09-28",
  "symbol" : "AAPL",
  "fillingDate" : "2019-10-31 00:00:00",
  "acceptedDate" : "2019-10-30 18:12:36",
  "period" : "FY",
  "revenue" : 260174000000,
  "costOfRevenue" : 161782000000,
  "grossProfit" : 98392000000,
  "grossProfitRatio" : 0.378178,
  "researchAndDevelopmentExpenses" : 16217000000,
  "generalAndAdministrativeExpenses" : 18245000000,
  "sellingAndMarketingExpenses" : 0.0,
  "otherExpenses" : 1807000000,
  "operatingExpenses" : 34462000000,
  "costAndExpenses" : 196244000000,
  "interestExpense" : 3576000000,
  "depreciationAndAmortization" : 12547000000,
  "ebitda" : 81860000000,
  "ebitdaratio" : 0.314636,
  "operatingIncome" : 63930000000,
  "operatingIncomeRatio" : 0.24572,
  "totalOtherIncomeExpensesNet" : 422000000,
  "incomeBeforeTax" : 65737000000,
  "incomeBeforeTaxRatio" : 0.252666,
  "incomeTaxExpense" : 10481000000,
  "netIncome" : 55256000000,
  "netIncomeRatio" : 0.212381,
  "eps" : 2.97145,
  "epsdiluted" : 2.97145,
  "weightedAverageShsOut" : 18595652000,
  "weightedAverageShsOutDil" : 18595652000,
  "link" : "https://www.sec.gov/Archives/edgar/data/320193/000032019319000119/0000320193-19-000119-index.html",
  "finalLink" : "https://www.sec.gov/Archives/edgar/data/320193/000032019319000119/a10-k20199282019.htm"
}
]


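res = []
ignored_keys = {'fillingDate', 'acceptedDate', 'link', 'finalLink'}  # set: fast membership tests
for dd in s:
    # build one filtered dict per statement (not one single-key dict per key-value pair)
    res.append({k: v for k, v in dd.items() if k not in ignored_keys})
print(res)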
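If the original dictionaries don't need to be preserved, an alternative sketch deletes the keys in place with dict.pop(key, None). The None default means a statement that happens to lack one of the keys doesn't raise KeyError. The records below are hypothetical, trimmed samples.

```python
# Hypothetical, trimmed records; the second one (made up) lacks some of the keys.
records = [
    {"date": "2019-09-28", "symbol": "AAPL", "fillingDate": "2019-10-31 00:00:00",
     "acceptedDate": "2019-10-30 18:12:36", "link": "placeholder", "finalLink": "placeholder"},
    {"date": "2000-01-01", "symbol": "XYZ", "link": "placeholder"},
]
ignored_keys = ('fillingDate', 'acceptedDate', 'link', 'finalLink')

for record in records:
    for key in ignored_keys:
        record.pop(key, None)  # default None: no KeyError when the key is absent

print(records)
```

This avoids building a second list, at the cost of mutating the caller's data; if anything else still needs fillingDate or the links, prefer the copying version.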

EDIT:

One-liner:

print([{k: v for k, v in dd.items() if k not in ignored_keys} for dd in s])
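Whether the outer brackets build a list or a dict matters once there is more than one statement. This sketch uses two hypothetical statements (values made up) to compare a single dict comprehension that loops over every dd with a list of per-statement dict comprehensions. Because all statements share the same keys, the flat dict comprehension keeps only the last statement's values.

```python
ignored_keys = {'fillingDate', 'acceptedDate', 'link', 'finalLink'}

# Hypothetical two-statement sample; values are made up for illustration.
s = [
    {"date": "2019-09-28", "revenue": 1, "link": "placeholder"},
    {"date": "2018-09-29", "revenue": 2, "link": "placeholder"},
]

# One dict comprehension over all statements: identical keys collide,
# so later statements overwrite earlier ones.
merged = {k: v for dd in s for k, v in dd.items() if k not in ignored_keys}

# A list comprehension wrapping a dict comprehension: one cleaned dict per statement.
per_statement = [{k: v for k, v in dd.items() if k not in ignored_keys} for dd in s]

print(merged)         # {'date': '2018-09-29', 'revenue': 2}
print(per_statement)  # [{'date': '2019-09-28', 'revenue': 1}, {'date': '2018-09-29', 'revenue': 2}]
```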

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 