How to extract a word in a string following one that matches something in a keyword list
I am a newcomer to Python. I can split a line of a file into words, but I haven't found out how to get at the word that follows a match against a set of keywords.
fread = open(F_FIXED_EERAM, 'r')
KEYWORDS = ['tINT16', 'tUINT16', 'tGDT_TYPE']
for line in fread.readlines():
    words = line.split()
    for word in words:
        if word in KEYWORDS:
            # I want to append the word after the keyword to a new string in another file
            # How do I get at that word?
            ...
Just set a boolean flag to store the next word if a keyword was found:
KEYWORDS = ['tINT16', 'tUINT16', 'tGDT_TYPE']
result = []
with open(F_FIXED_EERAM, 'r') as fread:
    for line in fread:
        store_next = False
        words = line.split()
        for word in words:
            if store_next:
                result.append(word)
                store_next = False
            elif word in KEYWORDS:
                store_next = True
result is now a list of all the words that were preceded by one of the KEYWORDS.
I assumed that if the last word of a line is a keyword, the first word of the next line should not be stored. If you do want that behaviour, move store_next = False out of the loop body so that it runs once, before the (outer) for loop.
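The choice about line boundaries can be made explicit by wrapping the flag logic in a function. The following is only a sketch under assumptions: the function name words_after_keywords and the carry_across_lines parameter are hypothetical, and the function takes any iterable of lines so it can be tried without a file.

```python
KEYWORDS = {'tINT16', 'tUINT16', 'tGDT_TYPE'}

def words_after_keywords(lines, keywords=KEYWORDS, carry_across_lines=False):
    """Return every word that directly follows a keyword (hypothetical helper)."""
    result = []
    store_next = False
    for line in lines:
        if not carry_across_lines:
            # Forget a keyword that ended the previous line.
            store_next = False
        for word in line.split():
            if store_next:
                result.append(word)
                store_next = False
            elif word in keywords:
                store_next = True
    return result

# Made-up sample input: the first line ends with a keyword.
sample = ["tINT16 speed tUINT16", "count tGDT_TYPE mode"]
print(words_after_keywords(sample))                           # ['speed', 'mode']
print(words_after_keywords(sample, carry_across_lines=True))  # ['speed', 'count', 'mode']
```

With the default setting, the keyword at the end of the first sample line is dropped; with carry_across_lines=True, the first word of the second line is stored as well.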
Or you could use a regular expression:
import re
KEYWORDS = ['tINT16', 'tUINT16', 'tGDT_TYPE']
regex = '(?:{}) +(\\w+)'.format('|'.join(map(re.escape, KEYWORDS)))
with open('in.txt', 'r') as file_:
    print(re.findall(regex, file_.read()))
This might look like magic, but this is the actual regular expression used:
(?:tINT16|tUINT16|tGDT_TYPE) +(\w+)
Which translates to: match one of the keywords, followed by one or more spaces, followed by a word. The ?: at the beginning of the group tells Python not to capture it, so only the word in the second group is returned. \w is equivalent to [a-zA-Z0-9_] unless the LOCALE or UNICODE flags change it; in Python 3, \w in a str pattern also matches Unicode word characters by default, unless re.ASCII is given.
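To see what the pattern captures, it can be run against an in-memory string instead of a file. This is a small sketch; the sample text is made up for illustration and is not from the original file.

```python
import re

KEYWORDS = ['tINT16', 'tUINT16', 'tGDT_TYPE']
# Same pattern as above, written as a raw string so the backslash needs no doubling.
regex = r'(?:{}) +(\w+)'.format('|'.join(map(re.escape, KEYWORDS)))

# Hypothetical sample input; the 'float' line has no keyword and is ignored.
text = "tINT16 speed;\ntUINT16   count;\nfloat ratio;\ntGDT_TYPE mode;"
matches = re.findall(regex, text)
print(matches)  # ['speed', 'count', 'mode']
```

Because the pattern requires literal space characters between the keyword and the word, a keyword at the very end of a line does not pick up the first word of the next line, and a keyword separated from its word by a tab is not matched either.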
You can either use enumerate(words), which gives you the following:
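for i, word in enumerate(words):
    if word in KEYWORDS:
        # Only take the next word if the keyword is not the last word on the line
        if i + 1 < len(words):
            result.append(words[i + 1])  # result is a list created beforehand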
Or you can use the re library (http://docs.python.org/library/re.html). With it you can specify a regular expression and easily parse specific values straight into a list.
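A related way to avoid the index check in the enumerate version, not mentioned in the answer above, is to pair each word with its successor using zip. This is a sketch only; the function name following_words is hypothetical.

```python
KEYWORDS = {'tINT16', 'tUINT16', 'tGDT_TYPE'}

def following_words(line, keywords=KEYWORDS):
    """Return the words on one line that directly follow a keyword."""
    words = line.split()
    # zip stops at the shorter sequence, so a keyword in the last
    # position has no partner and is skipped without a bounds check.
    return [nxt for cur, nxt in zip(words, words[1:]) if cur in keywords]

print(following_words("static tINT16 speed = 0;"))  # ['speed']
print(following_words("trailing tUINT16"))          # []
```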
Maybe the following code is what you want. Note that it skips a keyword that appears at the end of a line; if you need the word that follows it on the next line, you have to add some special processing.
newstring = ''
KEYWORDS = ['tINT16', 'tUINT16', 'tGDT_TYPE']
with open(F_FIXED_EERAM, 'r') as fread:
    for line in fread:
        words = line.split()
        # Stop one short of the end so that words[i+1] always exists
        for i in range(len(words) - 1):
            if words[i] in KEYWORDS:
                newstring += words[i+1]
The easiest way to do this is to keep track of the word you saw on the previous pass through the loop. If that word is one of your keywords, then the current word is the word following it. It is natural to write this as a generator. It is also convenient to write a generator that returns the individual words (tokens) from a file.
def tokens_from(filename):
    with open(filename) as f:
        for line in f:
            for token in line.split():
                yield token

def keyword_values(filename, *keywords):
    keywords = set(keywords)
    previous = None
    for token in tokens_from(filename):
        if previous in keywords:
            yield token
        previous = token
Now you can get the words into a list:
result = list(keyword_values(F_FIXED_EERAM, 'tINT16', 'tUINT16', 'tGDT_TYPE'))
Or you can build up a string:
result = " ".join(keyword_values(F_FIXED_EERAM, 'tINT16', 'tUINT16', 'tGDT_TYPE'))
Or you can iterate over them and write them to a file:
with open("outfile.txt", "w") as outfile:
    for outword in keyword_values(F_FIXED_EERAM, 'tINT16', 'tUINT16', 'tGDT_TYPE'):
        # Write one word per line to the output file
        outfile.write(outword + "\n")
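To try the previous-token idea without a file on disk, the same logic can be run over any iterable of lines. This is a sketch under assumptions: keyword_values_from_lines is a hypothetical variant of the generator above, and io.StringIO stands in for the real input file with made-up content.

```python
import io

def keyword_values_from_lines(lines, *keywords):
    """Yield each token whose preceding token is a keyword (hypothetical variant)."""
    keywords = set(keywords)
    previous = None
    for line in lines:
        for token in line.split():
            if previous in keywords:
                yield token
            previous = token

# Made-up input; the second line consists of a keyword only.
fake_file = io.StringIO("tINT16 speed\ntUINT16\ncount tGDT_TYPE mode\n")
found = list(keyword_values_from_lines(fake_file, 'tINT16', 'tUINT16', 'tGDT_TYPE'))
print(found)  # ['speed', 'count', 'mode']
```

Because the tokens are treated as one flat stream, a keyword at the end of a line is paired with the first word of the next line, which differs from the per-line approaches shown earlier.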