It would be great if you could help a Python beginner. Thanks for reading!
I want to analyze a text document that contains a large number of lines formatted like this:
000001 A040C015_130223_R1WV V C 11:37:48:22 11:38:29:18 10:00:53:00 10:01:33:20
The strings are separated by whitespace, sometimes by more than one space. So I did the following:
#writing data into list
datalist = []
filedata = open(inputfile, 'r')
for line in filedata:
    line = line.strip('\n\t\r')
    datalist.append(line)
filedata.close()
#splitting up List by whitespace and creating new List
newList = []
for i in datalist:
    newList.append(i.split(' '))
print newList[0:]
#parsing list with regex
regCompiled = re.compile('^[A-Z][0-9]{3,3}[C][0-9]{3,3}[_][0-9]{6,6}[_][A-Z][0-9]{2,2}[A-Z].*');
for content in newList:
    checkMatch = re.match(regCompiled, content);
    if checkMatch:
        print ("Found:"), content
    else:
        print ("NO Match")
My first problem is that, after splitting, every line becomes a list with an empty item ('') for each extra space, and because of the split function the result is a list inside a list.
I tried
filter(None, newList)
but the '' items remain, and I get an error from the regex, apparently because of the empty items. In the end I only want to extract the strings like A040C015_etc.
The full textlist is here: Link to full Text Document
Thank you very much for any help... rainer
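Try using split() instead of split(" "). Called without an argument, split() splits on any run of whitespace and never returns empty strings, so that should take care of the extra spaces: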
>>> i = "x  X"
>>> i.split()
['x', 'X']
>>> i.split(" ")
['x', '', 'X']
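Two further points explain the remaining symptoms. filter(None, newList) only tests the items of the outer list, and those items are non-empty inner lists, so nothing is removed; it also returns a new sequence rather than changing newList in place. And re.match expects a string, so passing a whole inner list raises an error regardless of the empty items. Here is a minimal sketch that puts this together. The names CLIP_NAME and find_clip_names are made up for illustration. Note also that the pattern in the question requires two digits after the letter in the last group, which the sample value R1WV does not have, so the sketch loosens that group to four letters or digits; that is an assumption about the naming scheme, not something the question confirms.

```python
import re

# Assumed pattern: like the question's, but the last group is loosened to
# four letters/digits so that the sample "A040C015_130223_R1WV" matches.
CLIP_NAME = re.compile(r'^[A-Z][0-9]{3}C[0-9]{3}_[0-9]{6}_[A-Z0-9]{4}$')


def find_clip_names(lines):
    """Return every whitespace-separated token that looks like a clip name."""
    found = []
    for line in lines:
        # split() with no argument splits on runs of whitespace (including
        # the trailing newline), so no empty strings are produced.
        for token in line.split():
            # Match each token (a string), not the whole list of tokens.
            if CLIP_NAME.match(token):
                found.append(token)
    return found


sample = [
    "000001  A040C015_130223_R1WV  V  C  11:37:48:22 11:38:29:18 10:00:53:00 10:01:33:20\n",
    "\n",
]
print(find_clip_names(sample))
```

Because the function accepts any iterable of lines, an open file can be passed directly, for example inside `with open(inputfile) as filedata:`, where inputfile is the variable from the question.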