I'm running into a bit of an issue and I'm hoping the Stack Overflow crew can help.
I keep getting the error message "object of type 'NoneType' has no len()" whenever I try to separate out one class of documents.
The full traceback is:
C:\>C:\Python27\python.exe C:\Testing\test.py "C:\Testing\IN" "C:\Testing\Outputs" "C:\Testing\Test.csv"
C:\Testing\IN\000001.000001.xml
Traceback (most recent call last):
  File "C:\Testing\test.py", line 46, in <module>
    if len(documentclass)==0:
TypeError: object of type 'NoneType' has no len()
C:\>
Here's the code:
import csv, sys, os
import shutil
import xml.etree.ElementTree as ET

if __name__ == '__main__':
    if not (len(sys.argv) == 4):
        print 'USAGE: %s inFolder OutFolder csvFile' % (sys.argv[0])
    else:
        inFolder = sys.argv[1]
        outFolder = sys.argv[2]
        className = sys.argv[3]
        count = 0
        for fileName in os.listdir(inFolder):
            if fileName.endswith(".pdf"):
                baseName = fileName.split('.pdf')[0]
                pdfFile = inFolder+"\\"+baseName+".pdf"
                xmlFile = inFolder+"\\"+baseName+".xml"
                validatedXmlFile = inFolder+"\\"+baseName+".xml.validated.xml"
                xmlSize = os.path.getsize(xmlFile)
                pdfSize = os.path.getsize(pdfFile)
                if xmlSize>0 and pdfSize>0:
                    print
                    print xmlFile
                    count = count + 1
                    tree = ET.parse(xmlFile)
                    root_xml = tree.getroot()
                    form_xml = root_xml[0]
                    #form_xml = root_xml[1]
                    documentclass_xml = form_xml.find('DocumentClassGlobal')
                    documentclassLocal_xml = form_xml.find('DocumentClassLocal')
                    #documentclass_xml = form_xml.find('SSMClassID')
                    if documentclass_xml is not None:
                        documentclass = documentclass_xml.find('data').text
                    elif documentclassLocal_xml is not None:
                        documentclass = documentclassLocal_xml.find('data').text
                        documentclass = documentclass + "_Local"
                    else:
                        documentclass = ""
                    if len(documentclass)==0:
                        documentclass = "UNKNOWN"
                    print documentclass
                    if documentclass == className:
                        if not os.path.exists(outFolder + "\\" + documentclass):
                            os.makedirs(outFolder + "\\" + documentclass)
                        inBaseFile = inFolder + "\\"+baseName
                        outBaseFile = outFolder + "\\" + documentclass+"\\"+baseName
                        inFile = inBaseFile+".pdf"
                        outFile = outBaseFile+".pdf"
                        print inFile
                        print outFile
                        shutil.copy(inBaseFile+".pdf", outBaseFile+".pdf")
                        shutil.copy(inBaseFile+".pdf.conf.xml", outBaseFile+".pdf.conf.xml")
                        shutil.copy(inBaseFile+".pdf.multi.txt", outBaseFile+".pdf.multi.txt")
                        shutil.copy(inBaseFile+".pdf.txt", outBaseFile+".pdf.txt")
                        #shutil.move(inBaseFile+".wdb", outBaseFile+".wdb")
                        shutil.copy(inBaseFile+".xml", outBaseFile+".xml")
                        if os.path.exists(inBaseFile+".xml.validated.xml"):
                            shutil.copy(inBaseFile+".xml.validated.xml", outBaseFile+".xml.validated.xml")
                        if os.path.exists(inBaseFile+".xml.validationinfo.xml"):
                            shutil.copy(inBaseFile+".xml.validationinfo.xml", outBaseFile+".xml.validationinfo.xml")
        print '%d files found and copied.' % (count)
Obviously, documentclass is already None by the time the if len(documentclass)==0 check runs. The idea is that both a None value and an empty string should result in documentclass being set to "UNKNOWN".
So far I have come up with the following, but without success. Any ideas?
Many thanks
if documentclass_xml is not None:
    documentclass = documentclass_xml.find('data').text
elif documentclassLocal_xml is not None:
    documentclass = documentclassLocal_xml.find('data').text
    documentclass = documentclass + "_Local"
if documentclass_xml is None:
    documentclass = "UNKNOWN"
elif documentclassLocal_xml is None:
    documentclass = "UNKNOWN"
else:
    documentclass = ""
if len(documentclass)==0:
    documentclass = "UNKNOWN"
print documentclass
The traceback tells you that documentclass has the value None. You initialized it with:
if documentclass_xml is not None:
    documentclass = documentclass_xml.find('data').text
elif documentclassLocal_xml is not None:
    documentclass = documentclassLocal_xml.find('data').text
    documentclass = documentclass + "_Local"
else:
    documentclass = ""
so at least one branch of the if must assign None to it. Accessing the text attribute of an ElementTree node returns None if the node has no text content. It has to be the first branch of the condition; otherwise the attempt to append "_Local" would already have raised an error on that line.
>>> data = ET.fromstring('<test/>')
>>> data.text is None
True
Therefore you are accessing an empty <data/> node.
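The following sketch tests that reasoning by replaying the question's if/elif/else chain on two small XML fragments. The fragment layout is an assumption inferred from the question's find() calls, not taken from the real input files: a form element wrapping DocumentClassGlobal or DocumentClassLocal, each with an empty data child. The function name classify_like_question is hypothetical. An empty data node under DocumentClassGlobal yields None. The same node under DocumentClassLocal fails earlier, on the "_Local" concatenation.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments modelled on the question's find() calls.
GLOBAL_EMPTY = '<form><DocumentClassGlobal><data/></DocumentClassGlobal></form>'
LOCAL_EMPTY = '<form><DocumentClassLocal><data/></DocumentClassLocal></form>'

def classify_like_question(xml_text):
    """Mirror the question's if/elif/else chain, stopping before the len() check."""
    form_xml = ET.fromstring(xml_text)
    documentclass_xml = form_xml.find('DocumentClassGlobal')
    documentclassLocal_xml = form_xml.find('DocumentClassLocal')
    if documentclass_xml is not None:
        # An empty <data/> has no text, so this assigns None.
        documentclass = documentclass_xml.find('data').text
    elif documentclassLocal_xml is not None:
        documentclass = documentclassLocal_xml.find('data').text
        # None + "_Local" raises TypeError before len() is ever reached.
        documentclass = documentclass + "_Local"
    else:
        documentclass = ""
    return documentclass

print(classify_like_question(GLOBAL_EMPTY))  # None: the value len() later rejects
try:
    classify_like_question(LOCAL_EMPTY)
except TypeError as exc:
    print("Local branch fails earlier: %s" % exc)
```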
It turns out that some changes had been made to how DocumentClassGlobal and DocumentClassLocal are generated. Previously, these elements were created only if their data child contained a value. That rule is no longer applied, so find('data').text can now be None. I made the adjustment below and it did the trick. Thanks, mgilson, for pointing it out.
# Only use a class value when its <data> child actually has text;
# an empty <data/> element has .text == None.
if documentclass_xml is not None and documentclass_xml.find('data').text is not None:
    documentclass = documentclass_xml.find('data').text
elif documentclassLocal_xml is not None and documentclassLocal_xml.find('data').text is not None:
    documentclass = documentclassLocal_xml.find('data').text
    documentclass = documentclass + "_Local"
else:
    documentclass = ""
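A somewhat more compact variant of the same fix is sketched below. It relies on ElementTree's Element.findtext(), which returns an empty string for an empty <data/> element and the default (None) when no data child exists. Folding both results into '' lets a single truthiness test cover the None and empty cases that the question wanted mapped to "UNKNOWN". The helper names _data_text and get_document_class are hypothetical. The sketch assumes form_xml is the same element as in the question (root_xml[0]), and the sample XML is made up for illustration.

```python
import xml.etree.ElementTree as ET

def _data_text(class_xml):
    """Return the <data> text under class_xml, or '' if anything is missing or empty."""
    # Explicit "is None": an Element with no children is falsy, so "if not class_xml" would misfire.
    if class_xml is None:
        return ''
    # findtext() gives '' for an empty <data/> and None when <data> is absent.
    return class_xml.findtext('data') or ''

def get_document_class(form_xml):
    """Hypothetical helper: prefer the global class, then the local one, else UNKNOWN."""
    global_text = _data_text(form_xml.find('DocumentClassGlobal'))
    if global_text:
        return global_text
    local_text = _data_text(form_xml.find('DocumentClassLocal'))
    if local_text:
        return local_text + "_Local"
    # Covers missing class nodes, empty <data/> and missing <data> alike.
    return "UNKNOWN"

# Made-up sample: empty global class, populated local class.
form_xml = ET.fromstring(
    '<form>'
    '<DocumentClassGlobal><data/></DocumentClassGlobal>'
    '<DocumentClassLocal><data>Invoice</data></DocumentClassLocal>'
    '</form>'
)
print(get_document_class(form_xml))  # Invoice_Local
```

In the original script, the whole if/elif/else block plus the len() check would then collapse to documentclass = get_document_class(form_xml).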