Read file and get certain value from each line of file

I'm stuck on one point and am hoping you could suggest a better method.

For each line of the file I'm reading, I want to get the nth word of the line, store it, and print all of them on a single line.

I have the following code:

p = './output.txt'

word_line = ''

with open(p, 'r') as myfile:
    for words in myfile.readlines()[1:]:  # I skip the first line because I don't want it
        current_word = words.strip().split(' ')[3]  # 4th value on the line (index 3)
        word_line += current_word
print(word_line)  # print once, after all lines have been collected

The file it reads looks like this:

1 abc-abc.abc (1235456) [AS100] bla 123 etc
2 abc-abc.abc (1235456) [AS10] bla 123 etc
3 abc-abc.abc (1235456) [AS1] bla 123 etc
4 abc-abc.abc (1235456) [AS56] bla 123 etc
5 abc-abc.abc (1235456) [AS8] bla 123 etc
6 abc-abc.abc (1235456) [AS200] bla 123 etc
etc

My current code outputs the following:

[AS100][AS10][AS1][AS56][AS8][AS200]

The only problem is that the value is not always the 4th one on the line: sometimes it appears as the 5th, or at some other position, or not at all.

I'm currently trying out:

if re.match("[AS", words):
    f_word = re.match(".*[(.*)",words)

This isn't working. I'm trying to check whether the current line contains an opening "[" and, if it does, to display the content between it and the closing "]", then move on to the next line and repeat.

Eventually I want the following output:

AS100 AS10 AS1 AS56 AS8 AS200

I could really use some advice on this. Thanks

EDIT:

m = re.search(r'\[(AS.*?)\]', words)  # the group includes "AS", so group(1) is e.g. "AS100"
if m:
    f_word += ' ' + m.group(1)

Thanks

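[ is a special character in regular expressions and denotes the start of a character class, so it must be escaped as \[ to match a literal bracket. Also note that re.match only matches at the beginning of the string; use re.search to find the bracket anywhere in the line.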

import re

m = re.search(r'\[(AS.*?)\]', words)  # capture "AS" too, so group(1) is e.g. "AS100"
if m:
    f_word = m.group(1)
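
Putting the pieces together, here is a minimal sketch of the whole loop. It assumes the token is always "AS" followed by digits inside square brackets; the helper name extract_as_tokens, the skip_header flag, and the sample lines are hypothetical and only there for illustration.

```python
import re

# Assumption: the token looks like "[AS<digits>]"; the group captures "AS<digits>".
AS_PATTERN = re.compile(r'\[(AS\d+)\]')


def extract_as_tokens(lines, skip_header=True):
    """Return the bracketed AS token from every line that contains one."""
    tokens = []
    for i, line in enumerate(lines):
        if skip_header and i == 0:
            continue  # the first line is not wanted
        m = AS_PATTERN.search(line)  # search anywhere in the line, not a fixed column
        if m:
            tokens.append(m.group(1))
    return tokens


# Hypothetical sample: the token moves position on one line and is missing on another.
sample = [
    "header line to skip",
    "1 abc-abc.abc (1235456) [AS100] bla 123 etc",
    "2 abc-abc.abc (1235456) extra [AS10] bla 123 etc",
    "3 abc-abc.abc (1235456) bla 123 etc",
    "4 abc-abc.abc (1235456) [AS56] bla 123 etc",
]
result = ' '.join(extract_as_tokens(sample))
print(result)  # AS100 AS10 AS56
```

To run this against the real file, an open file object can be passed in place of the list, since iterating over it yields one line at a time. Lines without a bracketed token are simply skipped, which covers the "not at all" case.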
