Formatting a raw string in Python

Question

I have a raw string in Python that I retrieved through an IMAP library.

It looks like this:

Season: Winter 2017-18
Activity: Basketball - Boys JV
*DATE: 02/13/2018 * - ( previously 02/06/2018 )
Event type: Game
Home/Host: Clear Lake
Opponent: Webster City
*START TIME: 6:15PM CST* - ( previously 4:30PM CST )
Location: Clear Lake High School, 125 N. 20th Street, Clear Lake, IA

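What would be the best way to scrape the data after each label (the labels being DATE:, START TIME:, and so on)? For example, for the line *DATE: 02/13/2018 * - ( previously 02/06/2018 ) the data would be assigned to a variable such as date, so that print(date) outputs 02/13/2018 * - ( previously 02/06/2018 ).

I tried the code below, but it prints only one character per line. Thanks!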

for line in message:
     if "DATE:" in line:
          print line

Answer 1

You can use a regular expression and a dictionary:

import re
s = """
Season: Winter 2017-18
Activity: Basketball - Boys JV
*DATE: 02/13/2018 * - ( previously 02/06/2018 )
Event type: Game
Home/Host: Clear Lake
Opponent: Webster City
*START TIME: 6:15PM CST* - ( previously 4:30PM CST )
Location: Clear Lake High School, 125 N. 20th Street, Clear Lake, IA
"""
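# Split each non-empty line at the first ": ", drop a leading '*' from the key, and strip any stray '\r'.
final_dict = {(a[1:] if a.startswith('*') else a).strip('\r'): b.strip('\r') for a, b in filter(lambda x: len(x) > 1, [re.split(r':\s', i, maxsplit=1) for i in filter(None, s.split('\n'))])}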

Output:

{'Home/Host': 'Clear Lake', 'Season': 'Winter 2017-18', 'START TIME': '6:15PM CST* - ( previously 4:30PM CST )', 'Location': 'Clear Lake High School, 125 N. 20th Street, Clear Lake, IA', 'Activity': 'Basketball - Boys JV', 'DATE': '02/13/2018 * - ( previously 02/06/2018 )', 'Event type': 'Game', 'Opponent': 'Webster City'}
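
If the one-liner is hard to follow, the same idea can be spelled out as a loop. This is only a sketch: the name parse_labels is made up for illustration, and it assumes every label is followed by a colon and a whitespace character.

```python
import re

def parse_labels(text):
    """Hypothetical helper: map each label to the text that follows it."""
    result = {}
    for line in text.splitlines():  # splitlines() also handles '\r\n'
        # Split at the first ": " only, so later colons stay in the value.
        parts = re.split(r':\s', line.strip(), maxsplit=1)
        if len(parts) != 2:
            continue  # blank line, or a line without a label
        key, value = parts
        result[key.lstrip('*')] = value  # drop the leading '*' from the key
    return result

sample = "Season: Winter 2017-18\r\n*DATE: 02/13/2018 * - ( previously 02/06/2018 )\r\n"
parsed = parse_labels(sample)
print(parsed['DATE'])
```

Looking a value up by label, as in parsed['DATE'], then takes the place of a separate variable per label.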

Answer 2

You can use str.splitlines() to split the string into lines. Then iterate over the lines and use a regular expression to extract the data, for example:

import re

for line in message.splitlines():
    match = re.match(r'\*DATE: (.*)', line)
    if match:
        date = match.group(1)
        print(date)
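
The same pattern can probably be generalized to any label. The sketch below assumes each label sits at the start of a line, optionally preceded by an asterisk; get_field is a hypothetical name, not something from a library.

```python
import re

def get_field(message, label):
    """Hypothetical helper: return the text after 'label:' or None if absent."""
    # re.escape keeps labels containing regex metacharacters from being misread.
    pattern = re.compile(r'^\*?' + re.escape(label) + r':\s*(.*)$')
    for line in message.splitlines():
        match = pattern.match(line.strip())
        if match:
            return match.group(1)
    return None

message = "Event type: Game\n*START TIME: 6:15PM CST* - ( previously 4:30PM CST )\n"
start_time = get_field(message, "START TIME")
print(start_time)
```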

Answer 3

for line in message iterates over each item in message. Simply put, message is a string and its items are characters, so the loop visits one character at a time.
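
A quick way to see the difference, using a shortened two-line stand-in for the message:

```python
message = "Season: Winter 2017-18\n*DATE: 02/13/2018 * - ( previously 02/06/2018 )"

# Iterating over a str yields one-character strings.
chars = [item for item in message]
print(chars[:6])  # ['S', 'e', 'a', 's', 'o', 'n']

# Iterating over the list returned by splitlines() yields whole lines.
lines = [item for item in message.splitlines()]
print(lines[1])
```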

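Splitting is a simple/naive way to solve the problem, but it will probably work as long as your data doesn't get any more complicated:

Use message.split("\n") to split the string on newlines and iterate over the result. Then you can use line.strip().strip("*").split(":", maxsplit=1) to separate the key from the value. The first strip() removes any leftover whitespace (for example a potential "\r"), and the second strip() removes the extra asterisks. maxsplit=1 stops at the first colon (which could be a problem if your data has a colon as part of a label).

I say key/value because you don't really need (or want) to assign these pairs dynamically to actual variables; you can store them in a dictionary and query it as needed.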

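output = dict()
for line in message.split("\n"):  # Split lines
    line = line.strip().strip("*")  # Remove extra whitespace and asterisks
    if ":" not in line:  # Skip blank lines and lines without a label
        continue
    key, value = line.split(":", maxsplit=1)  # Split at the first colon
    output[key] = value.strip()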

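Edit: I was under the impression that "DATE" was only your example, but if that's all you're looking for, you can obviously add a line such as if key == "DATE" and then add/return/print/etc. the value.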
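
A minimal sketch of that variation, assuming only the DATE value is needed and using a shortened stand-in for the message:

```python
message = "Season: Winter 2017-18\n*DATE: 02/13/2018 * - ( previously 02/06/2018 )\nEvent type: Game"

date = None
for line in message.split("\n"):
    line = line.strip().strip("*")  # remove whitespace, then asterisks
    if ":" not in line:
        continue  # nothing to split on
    key, value = line.split(":", maxsplit=1)
    if key == "DATE":
        date = value.strip()
        break  # stop at the first DATE line

print(date)
```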

Answer 4

If your data is in a file named datafile.txt, you can try the following:

with open('datafile.txt', 'r') as f:
    for line in f:
        if line.startswith("*DATE:"):
            print(line)
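
That prints the whole line, label included. To keep only the text after the label, something along these lines might do. read_date is a hypothetical helper, and the temporary file merely stands in for datafile.txt so the sketch runs on its own.

```python
import os
import tempfile

def read_date(path):
    """Hypothetical helper: return the text after '*DATE:' or None if absent."""
    prefix = "*DATE:"
    with open(path, "r") as f:
        for line in f:
            if line.startswith(prefix):
                # Slice off the label, then trim spaces and the trailing newline.
                return line[len(prefix):].strip()
    return None

# Throwaway file standing in for datafile.txt
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "datafile.txt")
    with open(path, "w") as f:
        f.write("Season: Winter 2017-18\n*DATE: 02/13/2018 * - ( previously 02/06/2018 )\n")
    date = read_date(path)

print(date)
```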

Answer 5

This solution works (and is, I believe, fairly "Pythonic"):

lines = message.split("\n")  # Split your message into "lines"
sections = [line.split(": ", 1) for line in lines if ": " in line]  # Split each labeled line at the first "colon space"; skip lines without one
message_dict = {section[0].lstrip(' '): section[1] for section in sections}  # Dictionary comprehension to put your keys and values into a dict. Also removes leading whitespace from your keys.

5 answers

Answer 1: 5, accepted, 2017-12-29 14:56:20
Answer 2: 3, 2017-12-29 14:58:54
Answer 3: 2, 2017-12-29 15:03:14
Answer 4: 0, 2017-12-29 15:04:44
Answer 5: 0, 2017-12-29 15:07:22
