
How to extract a word in a string following one that matches something in a key word list

I am a newcomer to Python. I can split a line of a file into words, but I haven't found out how to get at the word that follows a match against a set of keywords.

    fread = open (F_FIXED_EERAM, 'r')
    KEYWORDS = ['tINT16', 'tUINT16', 'tGDT_TYPE']
    for line in fread.readlines():
        words = line.split()
        for word in words:
            if word in KEYWORDS:
    #       I want to append the word after the keyword to a new string in another file
    #       How do I get at that word?
    ...

Just set a boolean flag when a keyword is found, and store the next word when the flag is set:

KEYWORDS = ['tINT16', 'tUINT16', 'tGDT_TYPE']
result = []

with open (F_FIXED_EERAM, 'r') as fread:
    for line in fread:
        store_next = False
        words = line.split()
        for word in words:
            if store_next:
                result.append(word)
                store_next = False
            elif word in KEYWORDS:
                store_next = True

result is now a list of all words that were preceded by one of the KEYWORDS.

I assumed that if the last word of a line is a keyword, the first word of the next line should not be stored. If you do want that behaviour, move store_next = False out of the loop body so that it runs once, before the outer for loop.
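
As a minimal sketch of that cross-line variant, the flag can be initialised once before the outer loop. The function name words_after_keywords and the sample lines are made up for illustration; any iterable of lines, such as an open file, should work the same way.

```python
KEYWORDS = {'tINT16', 'tUINT16', 'tGDT_TYPE'}

def words_after_keywords(lines, keywords=KEYWORDS):
    """Collect every word that directly follows a keyword, even across lines."""
    result = []
    store_next = False  # set once, so the flag survives line breaks
    for line in lines:
        for word in line.split():
            if store_next:
                result.append(word)
                store_next = False
            elif word in keywords:
                store_next = True
    return result

# Hypothetical sample input; the second keyword is the last word on its line.
sample = ["tINT16 speed;", "tUINT16", "count; tGDT_TYPE mode;"]
print(words_after_keywords(sample))
```

Note that split() keeps punctuation attached to the words, so a trailing ';' stays part of each stored word.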


Or you could use a regular expression:

import re

KEYWORDS = ['tINT16', 'tUINT16', 'tGDT_TYPE']

regex = '(?:{}) +(\\w+)'.format('|'.join(map(re.escape, KEYWORDS)))

with open ('in.txt', 'r') as file_:
    print(re.findall(regex, file_.read()))

This might look like magic, but this is the actual regular expression used:

(?:tINT16|tUINT16|tGDT_TYPE) +(\w+)

Which translates to: match one of the keywords, followed by one or more spaces, followed by a word. The ?: at the beginning of the first group tells Python not to capture that group, so findall returns only the word in the second group. \w (written as \\w in the string literal above) matches [a-zA-Z0-9_]; the exact set depends on the ASCII, LOCALE and UNICODE flags, and in Python 3 a str pattern also matches Unicode word characters by default.
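
To see the pattern at work, here is a small sketch run against an in-memory string rather than a file. It makes two changes that are not part of the answer above and should be treated as assumptions about what is wanted: a leading \b so that a longer identifier ending in a keyword does not match, and \s+ instead of a literal space so that tabs and newlines between the two words are accepted. The sample text is invented.

```python
import re

KEYWORDS = ['tINT16', 'tUINT16', 'tGDT_TYPE']

# \b keeps a longer identifier such as 'xtINT16' from matching.
# \s+ (instead of ' +') also accepts tabs and newlines between the words.
pattern = re.compile(
    r'\b(?:{})\s+(\w+)'.format('|'.join(map(re.escape, KEYWORDS)))
)

# Hypothetical sample text, not taken from the original file.
sample = "tINT16 speed;\ntUINT16\tcount; xtINT16 skipped; tGDT_TYPE mode;"
found = pattern.findall(sample)
print(found)
```

Because \w+ stops at the first non-word character, the trailing ';' is not part of the captured words here, unlike with the split()-based approaches.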

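You can either use enumerate(words), which gives you the following:

result = []  # collected words; defined here so the snippet stands alone
for i, word in enumerate(words):
    if word in KEYWORDS:
        if i + 1 < len(words):  # skip a keyword that is the last word on the line
            result.append(words[i + 1])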

Or you can use the re library (http://docs.python.org/library/re.html). There you can specify a regular expression and easily parse specific values straight into a list.
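
The index check in the enumerate approach can also be avoided by pairing each word with its successor. This is a variation using zip, not the code from the answer above, and the sample line is hypothetical.

```python
KEYWORDS = {'tINT16', 'tUINT16', 'tGDT_TYPE'}

# Hypothetical sample line; the last word is a keyword with nothing after it.
line = "tINT16 speed; tUINT16 count; tGDT_TYPE"
words = line.split()

# zip stops at the shorter sequence, so a keyword in the last position
# has no partner and is skipped without an explicit bounds check.
followers = [nxt for cur, nxt in zip(words, words[1:]) if cur in KEYWORDS]
print(followers)
```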

Maybe the following code is what you want. Please note that a keyword at the end of a line needs special handling; this version simply ignores it.

newstring = ''
KEYWORDS = ['tINT16', 'tUINT16', 'tGDT_TYPE']
with open(F_FIXED_EERAM, 'r') as fread:
    for line in fread:
        words = line.split()
        # Stop one short of the end so words[i+1] always exists;
        # a keyword at the end of a line is therefore ignored.
        for i in range(len(words) - 1):
            if words[i] in KEYWORDS:
                # Note: words are concatenated without a separator.
                newstring += words[i+1]

The easiest way to do this is to keep track of the word you saw the previous time through the loop. If that word is one of your keywords, then the current word is the one following it. It is natural to write this as a generator. It is also convenient to write a second generator that yields the individual words (tokens) from a file.

def tokens_from(filename):
    with open(filename) as f:
        for line in f:
            for token in line.split():
                yield token

def keyword_values(filename, *keywords):
    keywords = set(keywords)
    previous = None
    for token in tokens_from(filename):
        if previous in keywords:
            yield token
        previous = token

Now you can get the words into a list:

result = list(keyword_values(F_FIXED_EERAM, 'tINT16', 'tUINT16', 'tGDT_TYPE'))

Or you can build up a string:

result = " ".join(keyword_values(F_FIXED_EERAM, 'tINT16', 'tUINT16', 'tGDT_TYPE'))

Or you can iterate over them and write them to a file:

with open("outfile.txt", "w") as outfile:
    for outword in keyword_values(F_FIXED_EERAM, 'tINT16', 'tUINT16', 'tGDT_TYPE'):
        outfile.write(outword + "\n")  # one word per line
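
For an end-to-end check, the two generators can be exercised against a throwaway input file. This is only a sketch: the file names eeram.txt and outfile.txt inside the temporary directory and the sample contents are assumptions made for the demonstration, and the generator definitions are repeated so the block runs on its own.

```python
import os
import tempfile

def tokens_from(filename):
    # Yield every whitespace-separated token in the file, line by line.
    with open(filename) as f:
        for line in f:
            for token in line.split():
                yield token

def keyword_values(filename, *keywords):
    # Yield each token whose predecessor (possibly on the previous line) is a keyword.
    keywords = set(keywords)
    previous = None
    for token in tokens_from(filename):
        if previous in keywords:
            yield token
        previous = token

with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "eeram.txt")      # hypothetical input file
    out_path = os.path.join(tmp, "outfile.txt")   # hypothetical output file
    with open(in_path, "w") as f:
        f.write("tINT16 speed\ntUINT16\ncount tGDT_TYPE mode\n")
    with open(out_path, "w") as outfile:
        for outword in keyword_values(in_path, 'tINT16', 'tUINT16', 'tGDT_TYPE'):
            outfile.write(outword + "\n")
    with open(out_path) as f:
        written = f.read().split()

print(written)
```

Because tokens_from flattens the whole file into one stream, a keyword at the end of a line picks up the first word of the next line.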
