I am downloading 15 years of daily close data for 5 stocks ('A', 'AAP', 'AAPL', 'ABBV', 'ABC'), but I am getting duplicated rows. The first ticker, 'A', is fine: I get the right number of rows. For the second one, 'AAP', I get twice the right number of rows; it looks as if the data were downloaded twice. The last three tickers each have three times the right number of rows. I have attached a screenshot showing the sizes of the CSV files; they should all be about the same size if everything were fine.
I suspect the issue comes from the 10-second pause after each reqHistoricalData call; maybe it is too long. How can I avoid the duplicated rows, and how do I pause the right amount of time (not too long and not too short)?
import pandas as pd
import datetime as dt
import time
import collections
import threading
import os
from ibapi.client import EClient
from ibapi.wrapper import EWrapper
from ibapi.contract import Contract
from ibapi.common import BarData
path = r"D:\trading\data\debug\\"
class IBapi(EWrapper, EClient):
    def __init__(self):
        EClient.__init__(self, self)
        self.data = collections.defaultdict(list)

    def nextValidId(self, orderId: int):
        super().nextValidId(orderId)
        self.nextorderId = orderId
        print('The next valid order id is: ', self.nextorderId)

    def error(self, reqId, errorCode, errorString):
        super().error(reqId, errorCode, errorString)
        print("Error. Id:", reqId, "Code:", errorCode, "Msg:", errorString)

    def historicalData(self, reqId: int, bar: BarData):
        self.data["date"].append(bar.date)
        self.data["close"].append(bar.close)
        self.df = pd.DataFrame.from_dict(self.data)

tickers = ["A", "AAP", "AAPL", "ABBV", "ABC"]

def run_loop():
    app.run()

app = IBapi()
app.connect("127.0.0.1", 7496, 5)
app.nextorderId = None

# Start the socket in a thread
api_thread = threading.Thread(target=run_loop, daemon=True)
api_thread.start()

# Check if the API is connected via orderid
while True:
    if isinstance(app.nextorderId, int):
        print('connected')
        break
    else:
        print('waiting for connection')
        time.sleep(1)

n_id = app.nextorderId
for ticker in tickers:
    contract = Contract()
    contract.symbol = ticker
    contract.secType = "STK"
    contract.exchange = "SMART"
    contract.currency = "USD"
    app.reqHistoricalData(n_id, contract, "", "15 Y", "1 day", "TRADES", 1, 1, False, [])
    time.sleep(10)
    app.df.to_csv(path + ticker + ".csv")
    n_id = n_id + 1
app.disconnect()
You don't clear the list in between requests.
def historicalData(self, reqId: int, bar: BarData):
    # just keeps adding data to the list
    self.data["date"].append(bar.date)
    self.data["close"].append(bar.close)
    # makes a new dataframe on every single bar
    self.df = pd.DataFrame.from_dict(self.data)
In the historicalDataEnd method you can build the dataframe, save it to a file, and clear the accumulated data. Keep a dict mapping reqIds to tickers so you know which ticker has finished.
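A sketch of that structure (the HistoricalSaver class name and the req_tickers dict are illustrative assumptions; in practice these methods belong on the IBapi(EWrapper, EClient) class from the question):

```python
import collections

import pandas as pd


class HistoricalSaver:
    """Sketch of the callback pattern, not the full IBapi class."""

    def __init__(self, path):
        self.path = path
        self.data = collections.defaultdict(list)
        self.req_tickers = {}  # reqId -> ticker, recorded before each request

    def historicalData(self, reqId, bar):
        # Accumulate bars only; do not rebuild a DataFrame on every bar.
        self.data["date"].append(bar.date)
        self.data["close"].append(bar.close)

    def historicalDataEnd(self, reqId, start, end):
        # Fires once per request: build the frame, save it, then clear the
        # buffer so the next ticker does not inherit this one's rows.
        ticker = self.req_tickers[reqId]
        df = pd.DataFrame.from_dict(self.data)
        df.to_csv(self.path + ticker + ".csv")
        self.data.clear()
```

Before each reqHistoricalData call you would record app.req_tickers[n_id] = ticker, and the to_csv line in the main loop goes away.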
You should still keep a 10-second delay between calls for pacing, but do not count on the data being returned within those 10 seconds. If it hasn't arrived yet, you will write an empty file (or, in your case, all the previous tickers' data, which seems to be what happened with ABC).
Your duplicates come every Friday: you make a request for, say, Friday (first iteration), and on the next two iterations (Saturday and Sunday) the API returns data from the first possible trading day, which is the previous Friday again. Otherwise, 5 seconds is enough time to wait.
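Rather than guessing a sleep length, the main loop can wait on a threading.Event that the historicalDataEnd callback sets, with a timeout as a safety net. A minimal self-contained sketch of the pattern (fake_historical_data_end merely simulates the API callback thread and is not part of ibapi):

```python
import threading
import time

done = threading.Event()

def fake_historical_data_end():
    # Stands in for the API thread invoking historicalDataEnd.
    time.sleep(0.2)  # pretend the data took 200 ms to arrive
    done.set()       # in the real wrapper: self.done.set() inside historicalDataEnd

threading.Thread(target=fake_historical_data_end).start()

# Main thread: block until the callback fires, or give up after 60 seconds.
arrived = done.wait(timeout=60)
print("data arrived:", arrived)
```

In the real code you would create self.done = threading.Event() in __init__, call self.done.set() at the end of historicalDataEnd, then in the loop call app.done.clear() before each request and app.done.wait(timeout=...) after it, keeping a short sleep between requests for pacing.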