Python: extract values from text using keys

Question

I have a text file in the following Key Value format:

--START--
FirstName Kitty
LastName McCat
Color Red
random_data
Meow Meow
--END--

I want to extract specific values from the text into a variable or a dict. For example, if I want to extract the values of LastName and Color, what would be the best way to do this?

The random_data may be anywhere in the file and may span multiple lines.

I've considered using regex, but I'm concerned about performance and readability, because the real code has many different keys to extract.

I could also loop over each line and check for each key, but that gets quite messy with 10+ keys. For example:

if line.startswith("LastName"):
    #split line at space and handle
if line.startswith("Color"):
    #split line at space and handle

I'm hoping for something a little cleaner.

Answer 1

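tokens = ['LastName', 'Color']
dictResult = {}
fileName = 'sampletxt.txt'  # assumed path; replace with your own file
with open(fileName, 'r') as fileHandle:
    for line in fileHandle:
        # split() with no argument also drops the trailing newline
        lineParts = line.split()
        if len(lineParts) == 2 and lineParts[0] in tokens:
            dictResult[lineParts[0]] = lineParts[1]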
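
As a variation on the loop above, here is a minimal sketch that wraps the same idea in a function and splits only on the first run of whitespace, so a value that itself contains spaces is kept whole. The function name extract_values and the "Dark Red" value are made up for illustration; they are not from the question.

```python
def extract_values(lines, keys):
    """Return {key: value} for lines whose first word is one of `keys`."""
    wanted = set(keys)  # set lookup stays cheap even with many keys
    result = {}
    for line in lines:
        # Split on the first run of whitespace only, so the value may contain spaces.
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0] in wanted:
            result[parts[0]] = parts[1]
    return result


# Sample data modelled on the question; "Dark Red" is a hypothetical multi-word value.
sample = """--START--
FirstName Kitty
LastName McCat
Color Dark Red
random_data
Meow Meow
--END--"""

values = extract_values(sample.splitlines(), ["LastName", "Color"])
print(values)
```

Because the function accepts any iterable of lines, an open file handle can be passed in place of sample.splitlines(). If a key appears more than once, the last occurrence wins.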

Answer 2

Assuming your file is named sampletxt.txt, this would work. It creates a dictionary mapping each key to a list of its values.

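import re
with open('sampletxt.txt', 'r') as f:
    txt = f.read()
keys = ['FirstName', 'LastName', 'Color']
d = {}
for key in keys:
    # Anchor the key at the start of a line and capture the rest of that line
    d[key] = re.findall(r'^' + re.escape(key) + r'[ \t]+(.*?)[ \t]*$', txt, re.MULTILINE)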
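
Since the question mentions performance with many keys, one possible refinement is to scan the text once with a single compiled pattern instead of once per key. The sketch below assumes the keys are plain strings and that each key/value pair sits on its own line; find_all_values and the second LastName line in the sample are illustrative names and data, not part of the answer above.

```python
import re


def find_all_values(text, keys):
    """Map each key to the list of values found for it, in one pass over `text`."""
    if not keys:
        return {}
    # One alternation of escaped keys, anchored at the start of a line.
    pattern = re.compile(
        r"^(" + "|".join(re.escape(k) for k in keys) + r")[ \t]+(.*?)[ \t]*$",
        re.MULTILINE,
    )
    found = {key: [] for key in keys}
    for match in pattern.finditer(text):
        found[match.group(1)].append(match.group(2))
    return found


# Hypothetical sample with a repeated key, to show the list-of-values shape.
txt = "FirstName Kitty\nLastName McCat\nColor Red\nrandom_data\nMeow Meow\nLastName Whiskers\n"
found = find_all_values(txt, ["LastName", "Color"])
print(found)
```

Whether this is actually faster than a per-key loop depends on the file size and the number of keys, so it is worth measuring on real data before committing to it.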

Answer 3

This version lets you optionally specify the tokens. If the token list is empty, it prints every pair of space-separated words instead.

import re

s = """--START--
FirstName Kitty
LastName McCat
Color Red
random_data
Meow Meow
--END--"""

tokens = ["LastName", "Color"]
if not tokens:
    # No tokens given: print every "word word" pair
    print(re.findall(r"(\w+) (\w+)", s))
else:
    # First match for each requested token
    print([(t, re.findall(r"{} (\w+)".format(t), s)[0]) for t in tokens])

Output:

[('LastName', 'McCat'), ('Color', 'Red')]
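
One caveat with the snippet above: indexing the findall result with [0] raises IndexError when a token does not appear in the text. A small sketch of one way around that, using re.search and returning None for missing tokens, is shown here. The helper name lookup and the extra "Age" token are hypothetical.

```python
import re


def lookup(text, tokens):
    """Return (token, value) pairs; value is None when the token is not found."""
    pairs = []
    for token in tokens:
        # Match the token at the start of a line, then capture the next word.
        match = re.search(
            r"^{}[ \t]+(\w+)".format(re.escape(token)), text, re.MULTILINE
        )
        pairs.append((token, match.group(1) if match else None))
    return pairs


s = """--START--
FirstName Kitty
LastName McCat
Color Red
random_data
Meow Meow
--END--"""

result = lookup(s, ["LastName", "Color", "Age"])
print(result)
```

Callers can then decide how to treat a None value, for example by skipping it or substituting a default.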

Answer 4

Building off the other answers, this function uses a regular expression to look up any text key and returns its value if found:

import re
file_name = 'test.txt'

def get_text_value(text_key, file_name):
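    # Raw string avoids invalid-escape warnings; "\s*$" also matches a last line with no newline
    match_str = re.escape(text_key) + r"\s(\w+)\s*$"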

    with open(file_name, "r") as f:
        text_to_check = f.readlines()

    text_value = None
    for line in text_to_check:

        matched = re.match(match_str, line)
        if matched:
            text_value = matched.group(1)

    return text_value

if __name__ == "__main__":

    first_key = "FirstName"
    first_value = get_text_value(first_key, file_name)
    print('Check for first key "{}" and value "{}"'.format(first_key,
                                                           first_value))

    second_key = "Color"
    second_value = get_text_value(second_key, file_name)
    print('Check for second key "{}" and value "{}"'.format(second_key,
                                                            second_value))

4 answers

Answer 1: score 1, accepted, 2016-01-27 23:19:28
Answer 2: score 0, 2016-01-27 23:20:49
Answer 3: score 0, 2016-01-27 23:25:43
Answer 4: score 0, 2016-01-27 23:29:15
