
TypeError: Byte-like object, not string

I have this code, but I keep running into versions of the error in the title. Can anyone help me get past it? The traceback points at the newfilingDate line (4th from the bottom), but I suspect that's not where the actual error is.

def getIndexLink(tickerCode,FormType):
    csvOutput = open(IndexLinksFile,"a+b") # "a+b" indicates that we are adding lines rather than replacing lines
    csvWriter = csv.writer(csvOutput, quoting = csv.QUOTE_NONNUMERIC)

    urlLink = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK="+tickerCode+"&type="+FormType+"&dateb=&owner=exclude&count=100"
    pageRequest = urllib.request.Request(urlLink)
    with urllib.request.urlopen(pageRequest) as url:
        pageRead = url.read()

    soup = BeautifulSoup(pageRead,"html.parser")

    #Check if there is a table to extract / code exists in edgar database
    try:
        table = soup.find("table", { "class" : "tableFile2" })
    except:
        print("No tables found or no matching ticker symbol for ticker symbol for"+tickerCode)
        return -1

    docIndex = 1
    for row in table.findAll("tr"):
        cells = row.findAll("td")
        if len(cells)==5:
            if cells[0].text.strip() == FormType:
                link = cells[1].find("a",{"id": "documentsbutton"})
                docLink = "https://www.sec.gov"+link['href']
                description = cells[2].text.encode('utf8').strip() #strip take care of the space in the beginning and the end
                filingDate = cells[3].text.encode('utf8').strip()
                newfilingDate = filingDate.replace("-","_")  ### <=== Change date format from 2012-1-1 to 2012_1_1 so it can be used as part of 10-K file names
                csvWriter.writerow([tickerCode, docIndex, docLink, description, filingDate,newfilingDate])
                docIndex = docIndex + 1
    csvOutput.close()

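Bytes-like objects support .replace as long as the arguments passed to it are also bytes-like. In your code, .encode('utf8') turns filingDate into a bytes object, so calling .replace with str arguments raises the error. (Special thanks to juanpa.arrivillaga for pointing this out.)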

foo = b'hi-mom'
foo = foo.replace(b"-", b"_")
print(foo)
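
To show why the traceback lands on the newfilingDate line, here is a minimal sketch that reproduces the error and then applies the fix. The date value is made up for illustration and stands in for the scraped cell text.

```python
# Hypothetical value standing in for cells[3].text.encode('utf8').strip()
filing_date = "2012-01-01".encode("utf8")  # this is now bytes, not str

try:
    filing_date.replace("-", "_")  # str arguments on a bytes object
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Passing bytes arguments works
new_filing_date = filing_date.replace(b"-", b"_")
print(new_filing_date)
```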

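Alternatively, you can decode to a string, do the replacement, and encode back to bytes, but that is messy and inefficient. Note that str(foo) would not work here: on a bytes object it returns the repr, "b'hi-mom'", rather than the decoded text.

foo = b'hi-mom'
foo = foo.decode('utf-8').replace("-","_").encode('utf-8')
print(foo)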
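
If the script runs on Python 3, another option is to avoid bytes altogether: leave the cell text as str and open the CSV file in text mode, since the Python 3 csv.writer writes strings rather than bytes. The following is only a sketch under that assumption. The helper name write_filing_row and the sample values are hypothetical and not part of the question's code.

```python
import csv
import io


def write_filing_row(csv_file, ticker, doc_index, doc_link, description, filing_date):
    """Write one row; every text value stays a str (no .encode call)."""
    writer = csv.writer(csv_file, quoting=csv.QUOTE_NONNUMERIC)
    new_filing_date = filing_date.replace("-", "_")  # str.replace with str arguments
    writer.writerow([ticker, doc_index, doc_link, description, filing_date, new_filing_date])
    return new_filing_date


# An in-memory buffer is used here for demonstration. For a real file, open it
# in text mode, e.g. open(path, "a", newline="", encoding="utf-8"), not "a+b".
buffer = io.StringIO()
result = write_filing_row(
    buffer, "AAPL", 1, "https://www.sec.gov/example", "Annual report", "2012-01-01"
)
print(buffer.getvalue())
```

With this approach the .encode('utf8') calls on description and filingDate are dropped, so the original .replace("-","_") line works unchanged.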
