Python: how to extract specific strings into multiple variables

Question

I am trying to extract specific lines from a file into variables.

Here are the contents of my test.txt

#first set
Task Identification Number: 210CT1
Task title: Assignment 1
Weight: 25
fullMark: 100
Description: Program and design and complexity running time.

#second set
Task Identification Number: 210CT2
Task title: Assignment 2
Weight: 25
fullMark: 100
Description: Shortest Path Algorithm

#third set
Task Identification Number: 210CT3
Task title: Final Examination
Weight: 50
fullMark: 100
Description: Close Book Examination

Here is my code

with open(home + '\\Desktop\\PADS Assignment\\test.txt', 'r') as mod:
    for line in mod:
        taskNumber , taskTile , weight, fullMark , desc = line.strip(' ').split(": ") 
        print(taskNumber)
        print(taskTile)
        print(weight)
        print(fullMark)
        print(description)

This is what I am trying to do:

taskNumber is 210CT1 
taskTitle is Assignment 1
weight is 25
fullMark is 100
desc is Program and design and complexity running time

and loop until the third set

But I get an error in the output

ValueError: not enough values to unpack (expected 5, got 2)

In response to SwiftsNamesake

I tried your code. I still get an error.

ValueError: too many values to unpack (expected 5)

Here is my attempt at using your code

 from itertools import zip_longest

 def chunks(iterable, n, fillvalue=None):
     args = [iter(iterable)] * n
     return zip_longest(*args, fillvalue=fillvalue)


with open(home + '\\Desktop\\PADS Assignment\\210CT.txt', 'r') as mod:
    for group in chunks(mod.readlines(), 5+2, fillvalue=''):
    # Choose the item after the colon, excluding the extraneous rows
    # that don't have one.
    # You could probably find a more elegant way of achieving the same thing
        l = [item.split(': ')[1].strip() for item in group if ':' in item]
    taskNumber , taskTile , weight, fullMark , desc = l
        print(taskNumber , taskTile , weight, fullMark , desc, sep='|')

Answer 1

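As mentioned already, you need some kind of chunking. To chunk usefully, we also need to ignore the irrelevant lines of the file. I've implemented this below with some nice Python witchcraft.

It might also suit you to use a namedtuple to store the values. A namedtuple is a pretty simple type of object that just stores a number of different values; for example, a point in 2D space might be a namedtuple with x and y fields. This is the example given in the Python documentation. You should refer to that documentation for more information on namedtuples and their uses, if you wish. I've taken the liberty of making a Task class with the fields ["number", "title", "weight", "fullMark", "desc"].

As your variables are all properties of a task, using a named tuple might make sense in the interest of brevity and clarity.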
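To make the namedtuple idea concrete, here is a minimal sketch of the 2D point example mentioned above. The field values are made up for illustration; nothing here depends on the task file.

```python
from collections import namedtuple

# A 2D point with two named fields, following the example in the Python docs.
Point = namedtuple("Point", ["x", "y"])

p = Point(x=11, y=22)
print(p.x + p.y)      # fields are accessible by name
x, y = p              # ...and the object still unpacks like a regular tuple
print(p._asdict())    # mapping of field names to values
```

The Task class defined further down works the same way, just with five fields instead of two.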

Aside from that, I've tried to stick to your approach of splitting on the colon. My code produces the output

================================================================================
number is 210CT1
title is Assignment 1
weight is 25
fullMark is 100
desc is Program and design and complexity running time.
================================================================================
number is 210CT2
title is Assignment 2
weight is 25
fullMark is 100
desc is Shortest Path Algorithm
================================================================================
number is 210CT3
title is Final Examination
weight is 50
fullMark is 100
desc is Close Book Examination

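This seems to be roughly what you're after; I'm not sure how strict your output requirements are. It should be relatively easy to modify to that end, though.

Here is my code, with some explanatory comments: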

from collections import namedtuple

#defines a simple class 'Task' which stores the given properties of a task
Task = namedtuple("Task", ["number", "title", "weight", "fullMark", "desc"])

#chunk a file (or any iterable) into groups of n (as an iterable of n-tuples)
def n_lines(n, read_file):
    return zip(*[iter(read_file)] * n)

#used to strip out empty lines and lines beginning with #, as those don't appear to contain any information
def line_is_relevant(line):
    return line.strip() and line[0] != '#'

with open("input.txt") as in_file:
    #filters the file for relevant lines, and then chunks into 5 lines
    for task_lines in n_lines(5, filter(line_is_relevant, in_file)):
        #for each line of the task, strip it, split it by the colon and take the second element
        #(ie the remainder of the string after the colon), and build a Task from this
        task = Task(*(line.strip().split(": ")[1] for line in task_lines))
        #just to separate each parsed task
        print("=" * 80)
        #iterate over the field names and values in the task, and print them
        for name, value in task._asdict().items():
            print("{} is {}".format(name, value))

You can also refer to each field of the task, like this:

            print("The number is {}".format(task.number))

If the namedtuple approach is not wanted, feel free to replace the contents of the main for loop with

        taskNumber, taskTitle, weight, fullMark, desc = (line.strip().split(": ")[1] for line in task_lines)

and then your code will be back to normal.

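Some notes on the other changes I've made:

filter does what it says on the tin: it iterates only over the lines that satisfy the predicate (those for which line_is_relevant(line) is truthy).

The * in the Task instantiation unpacks the iterator, so each parsed line becomes an argument to the Task constructor.

The expression (line.strip().split(": ")[1] for line in task_lines) is a generator. It is needed because we are handling several lines at once in task_lines, so for each line in our 'chunk' we strip it, split it on the colon and take the second element, which is the value.

The n_lines function works by passing a list of n references to the same iterator to the zip function (see the documentation for zip). zip then tries to yield the next element from each element of this list, but as each of the n elements is an iterator over the same file, zip yields n lines of the file. This continues until the iterator is exhausted.

The line_is_relevant function uses the idea of 'truthiness'. A more verbose way to implement it might be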

def line_is_relevant(line):
    return len(line.strip()) > 0 and line[0] != '#'

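However, in Python every object can implicitly be used in boolean logic expressions. An empty string ("") acts as False in such expressions and a non-empty string acts as True, so conveniently, if line.strip() is empty it acts as False and the result of line_is_relevant is therefore falsy. The and operator also short-circuits if the first operand is falsy, which means the second operand is not evaluated, and so, conveniently, the reference to line[0] cannot cause an IndexError.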
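A small sketch of the short-circuit behaviour, run on a few made-up lines. The bool() wrapper is my own addition so that the results print as plain True/False; filter does not need it.

```python
def line_is_relevant(line):
    # line.strip() is "" (falsy) for blank lines, so `and` stops there
    # and line[0] is never evaluated, even when line is completely empty.
    return bool(line.strip()) and line[0] != '#'

samples = ["", "\n", "#first set\n", "Weight: 25\n"]
results = [line_is_relevant(s) for s in samples]
print(results)  # [False, False, False, True]
```

Note that the first sample is an empty string: indexing it directly with line[0] would raise IndexError, but the check never gets that far.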

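Ok, here is my attempt at a more extended explanation of the n_lines function:

Firstly, the zip function lets you iterate over more than one 'iterable' at once. An iterable is something like a list or a file that you can go over in a for loop, so the zip function lets you do something like this: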

>>> for i in zip(["foo", "bar", "baz"], [1, 4, 9]):
...     print(i)
... 
('foo', 1)
('bar', 4)
('baz', 9)

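The zip function returns a 'tuple' containing one element from each list at a time. A tuple is basically a list, except that it is immutable, so you can't change it; zip doesn't expect you to change any of the values it gives you, only to do something with them. Apart from that, a tuple can be used pretty much like a normal list. A useful trick here is to use 'unpacking' to separate each of the bits of the tuple, like this: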

>>> for a, b in zip(["foo", "bar", "baz"], [1, 4, 9]):
...     print("a is {} and b is {}".format(a, b))  
... 
a is foo and b is 1
a is bar and b is 4
a is baz and b is 9

A simpler unpacking example, which you may have seen before (Python also lets you omit the parentheses () here):

>>> a, b = (1, 2)
>>> a
1
>>> b
2

The n_lines function doesn't use this, though. zip can also take more than two arguments: you can zip three, four or as many lists as you like (pretty much).

>>> for i in zip([1, 2, 3], [0.5, -2, 9], ["cat", "dog", "apple"], "ABC"):
...     print(i)
... 
(1, 0.5, 'cat', 'A')
(2, -2, 'dog', 'B')
(3, 9, 'apple', 'C')

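Now the n_lines function passes *[iter(read_file)] * n to zip. There are a couple of things to cover here; I'll start with the second part. Note that the first * has lower precedence than everything after it, so the expression is equivalent to *([iter(read_file)] * n). Now, what iter(read_file) does is construct an iterator object from read_file by calling iter on it. An iterator is kind of like a list, except that you can't index it, as in it[0]. All you can do is 'iterate over it', like going over it in a for loop. The code then builds a list of length 1 with this iterator as its only element, and then 'multiplies' this list by n.

In Python, using the * operator with a list concatenates it with itself n times. If you think about it, this kind of makes sense, as + is the concatenation operator. So, for example,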

>>> [1, 2, 3] * 3 == [1, 2, 3] + [1, 2, 3] + [1, 2, 3] == [1, 2, 3, 1, 2, 3, 1, 2, 3]
True

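Incidentally, this uses Python's chained comparison operators: a == b == c is equivalent to a == b and b == c, except that b only needs to be evaluated once, which shouldn't matter 99% of the time.

Anyway, we now know that the * operator copies a list n times. It has one more property: it doesn't construct any new objects. This can be a bit of a gotcha: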

>>> l = [object()] * 3
>>> id(l[0])
139954667810976
>>> id(l[1])
139954667810976
>>> id(l[2])
139954667810976

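There are three objects in the list here, but they are all in reality the same object (you might think of this as three 'pointers' to the same object). If you build a list of more complex objects, such as lists, this way and then perform an in-place operation such as sorting on one of them, it affects every element of the outer list.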

>>> l = [ [3, 2, 1] ] * 4
>>> l
[[3, 2, 1], [3, 2, 1], [3, 2, 1], [3, 2, 1]]
>>> l[0].sort()
>>> l
[[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]]
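For contrast, here is a short sketch of how you would normally get independent inner lists when you do not want this sharing. It is only an aside, with made-up values; for n_lines the sharing is exactly the behaviour we rely on.

```python
shared = [[3, 2, 1]] * 3                       # three references to one list
independent = [[3, 2, 1] for _ in range(3)]    # three separate lists

shared[0].sort()
independent[0].sort()

print(shared)        # every entry changed, because they are the same object
print(independent)   # only the first entry changed
```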

So [iter(read_file)] * n is equivalent to

it = iter(read_file)
l = [it, it, it, it... n times]

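Now the very first *, the one with the low precedence, 'unpacks' this again, but this time it doesn't assign the elements to variables; it passes them as the arguments to zip. This means zip receives each element of the list as a separate argument, instead of just one argument that is the list. Here is an example of how unpacking works in a simpler case: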

>>> def f(a, b):
...     print(a + b)
... 
>>> f([1, 2]) #doesn't work
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: f() missing 1 required positional argument: 'b'
>>> f(*[1, 2]) #works just like f(1, 2)
3

So in effect, we now have something like

it = iter(read_file)
return zip(it, it, it... n times)

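Remember that when you 'iterate' over a file object in a for loop, you iterate over each line of the file, so when zip tries to 'go over' each of the n objects at once, it draws one line from each object. But because each object is the same iterator, that line is then 'consumed', and the next line it draws is the next line of the file. One 'round' of iteration over each of its n arguments yields n lines, which is what we want.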
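If you want to watch this happen without touching a real file, the sketch below runs n_lines on an in-memory stand-in. The io.StringIO object and its five made-up lines are assumptions for the demo only.

```python
import io

def n_lines(n, read_file):
    return zip(*[iter(read_file)] * n)

# io.StringIO stands in for an open file; the contents are made up.
fake_file = io.StringIO("a\nb\nc\nd\ne\n")

groups = list(n_lines(2, fake_file))
print(groups)  # [('a\n', 'b\n'), ('c\n', 'd\n')]
```

One thing worth knowing: zip stops as soon as it cannot fill a whole group, so the leftover line 'e' is silently dropped. That is fine for the task file as long as every set has exactly five relevant lines.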

Answer 2

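Your line variable gets only Task Identification Number: 210CT1 as its first input. You're trying to extract 5 values from it by splitting on ": ", but there are only 2 values there.

What you want is to divide your for loop into groups of 5: read each set as 5 lines, and split each line on ": ".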
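One possible way to do that is sketched below. It reads from an io.StringIO stand-in holding two of the sets from the question rather than from the real file, and it assumes every set has exactly five "key: value" lines; the names sample, rows and tasks are mine.

```python
import io

# Stand-in for the real file; two of the sets from the question.
sample = io.StringIO(
    "#first set\n"
    "Task Identification Number: 210CT1\n"
    "Task title: Assignment 1\n"
    "Weight: 25\n"
    "fullMark: 100\n"
    "Description: Program and design and complexity running time.\n"
    "\n"
    "#second set\n"
    "Task Identification Number: 210CT2\n"
    "Task title: Assignment 2\n"
    "Weight: 25\n"
    "fullMark: 100\n"
    "Description: Shortest Path Algorithm\n"
)

# Keep only the "key: value" lines, dropping blanks and '#' comments.
rows = [ln.strip() for ln in sample if ln.strip() and not ln.startswith("#")]

tasks = []
for start in range(0, len(rows), 5):
    group = rows[start:start + 5]
    # partition splits on the first ": " only, so a value may contain colons
    values = [row.partition(": ")[2] for row in group]
    taskNumber, taskTitle, weight, fullMark, desc = values
    tasks.append(values)
    print(taskNumber, taskTitle, weight, fullMark, desc, sep=" | ")
```

If the number of relevant lines is not a multiple of 5, the unpacking on the last group fails with the same kind of ValueError, which at least tells you the file is not laid out as expected.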

Answer 3

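You are trying to get more data than is present on one line; the five pieces of data are on separate lines.

As SwiftsNamesake suggested, you can use itertools to group the lines: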

import itertools

def keyfunc(line):
    # Ignores comments in the data file.
    if len(line) > 0 and line[0] == "#":
        return True

    # The separator is an empty line between the data sets, so it returns
    # true when it finds this line.
    return line == "\n"

with open(home + '\\Desktop\\PADS Assignment\\test.txt', 'r') as mod:
    for k, g in itertools.groupby(mod, keyfunc):
        if not k: # Does not process lines that are separators.
            for line in g:
                data = line.strip().partition(": ")
                print(f"{data[0]} is {data[2]}")
                # print(data[0] + " is " + data[2]) # If python < 3.6

            print("") # Prints a newline to separate groups at the end of each group.

If you want to use the data in other functions, output it from a generator as dictionaries:

from collections import OrderedDict
import itertools

def isSeparator(line):
    # Ignores comments in the data file.
    if len(line) > 0 and line[0] == "#":
        return True

    # The separator is an empty line between the data sets, so it returns
    # true when it finds this line.
    return line == "\n"

def parseData(data):
    for line in data:
        k, s, v = line.strip().partition(": ")
        yield k, v

def readData(filePath):
    with open(filePath, "r") as mod:
        for key, g in itertools.groupby(mod, isSeparator):
            if not key: # Does not process lines that are separators.
                yield OrderedDict((k, v) for k, v in parseData(g))

def printData(data):
    for d in data:
        for k, v in d.items():
          print(f"{k} is {v}")
          # print(k + " is " + v) # If python < 3.6

        print("") # Prints a newline to separate groups at the end of each group.

data = readData(home + '\\Desktop\\PADS Assignment\\test.txt')
printData(data)
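If it is not obvious what groupby is doing here, this small sketch shows the runs it produces on a handful of made-up lines. The is_separator helper mirrors the separator test above in condensed form; the list lines is an assumption for the demo.

```python
import itertools

lines = ["#first set\n", "Weight: 25\n", "fullMark: 100\n", "\n",
         "#second set\n", "Weight: 50\n"]

def is_separator(line):
    # Comment lines and empty lines both count as separators.
    return line.startswith("#") or line == "\n"

# groupby yields one (key, group) pair per run of consecutive lines
# that share the same key.
runs = [(key, [ln.strip() for ln in group])
        for key, group in itertools.groupby(lines, is_separator)]
for key, group in runs:
    print(key, group)
```

The blank line and the following comment line end up in the same run because they are adjacent and both map to True, which is why skipping the True runs leaves exactly one data group per set.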

Answer 4

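As another poster (@Cuber) has already stated, you're looping over the file line by line, whereas the data sets are split across five lines. The error message is essentially saying that you're trying to unpack five values when all you have is two. Furthermore, it looks like you're only interested in the value on the right-hand side of the colon, so you really only have one value.

There are multiple ways of resolving this issue, but the simplest is probably to group the data into fives (plus the padding, making it seven) and process one group at a time.

First we'll define chunks, which turns this somewhat fiddly process into one elegant loop (it comes from the itertools docs).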

from itertools import zip_longest

def chunks(iterable, n, fillvalue=None):
  args = [iter(iterable)] * n
  return zip_longest(*args, fillvalue=fillvalue)

Now we'll use it with your data. I've omitted the file boilerplate.

for group in chunks(mod.readlines(), 5+2, fillvalue=''):
  # Choose the item after the colon, excluding the extraneous rows
  # that don't have one.
  # You could probably find a more elegant way of achieving the same thing
  l = [item.split(': ')[1].strip() for item in group if ':' in item]
  taskNumber , taskTile , weight, fullMark , desc = l
  print(taskNumber , taskTile , weight, fullMark , desc, sep='|')

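The 2 in 5+2 is for the padding (the comment line above each set and the empty line below it).

The implementation of chunks may not make sense to you at the moment. If so, I'd suggest looking into Python generators (and the itertools documentation in particular, which is a marvellous resource). It's also a good idea to get your hands dirty and tinker with snippets inside the Python REPL.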
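As a starting point for that tinkering, here is chunks applied to a short string, which is the same kind of demonstration the itertools docs use for this recipe. The input and fill value are arbitrary.

```python
from itertools import zip_longest

def chunks(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

result = list(chunks("ABCDEFG", 3, fillvalue="x"))
print(result)  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```

Unlike plain zip, zip_longest pads the last group instead of dropping it. Keep in mind that fixed-size chunking only lines up with the data if every set in the file occupies exactly the same number of lines (7 here); an extra or missing blank line shifts every group after it.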

Answer 5

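You can still read the file line by line, but you have to help the code understand what it is parsing. We can use an OrderedDict to look up the appropriate variable name.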

import os
import collections as ct


def printer(dict_, lookup):
    for k, v in lookup.items():
        print("{} is {}".format(v, dict_[k]))
    print()


names = ct.OrderedDict([
    ("Task Identification Number", "taskNumber"),
    ("Task title", "taskTitle"),
    ("Weight", "weight"),
    ("fullMark","fullMark"),
    ("Description", "desc"),
])

filepath = home + '\\Desktop\\PADS Assignment\\test.txt'
with open(filepath, "r") as f:
    for line in f.readlines():
        line = line.strip()
        if line.startswith("#"):
            header = line
            d = {}
            continue
        elif line:
            k, v = line.split(":")
            d[k] = v.strip(" ")
        else:
            printer(d, names)
    printer(d, names)

Output

taskNumber is 210CT3
taskTitle is Final Examination
weight is 50
fullMark is 100
desc is Close Book Examination

taskNumber is 210CT1
taskTitle is Assignment 1
weight is 25
fullMark is 100
desc is Program and design and complexity running time.

taskNumber is 210CT2
taskTitle is Assignment 2
weight is 25
fullMark is 100
desc is Shortest Path Algorithm
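One caveat with k, v = line.split(":") is that it assumes each line contains exactly one colon. The sketch below uses a made-up line with two colons to show a more tolerant alternative, str.partition; this is a suggested variation, not part of the code above.

```python
line = "Description: Deadline: Friday"   # made-up line with two colons

# split(":") returns three pieces here, so `k, v = ...` would raise ValueError.
pieces = line.split(":")

# partition always returns (before, separator, after) and splits on the
# first match only.
k, _, v = line.partition(":")
v = v.strip()
print(len(pieces), repr(k), repr(v))
```

line.split(":", 1) would work just as well if you prefer to keep using split.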

Answer 6

The problem here is that you are splitting line by line: each line contains only one ": ", so the split gives 2 values. In this line:

taskNumber , taskTile , weight, fullMark , desc = line.strip(' ').split(": ")

you are telling it that there are 5 values, but it only finds 2, so it gives you an error.
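You can reproduce this on a single line without opening the file at all. In this sketch the string is the first data line from the question, with the trailing newline it would have when read from a file.

```python
line = "Task Identification Number: 210CT1\n"

parts = line.strip(' ').split(": ")
print(parts)    # two items; note the trailing newline survives strip(' ')

try:
    taskNumber, taskTitle, weight, fullMark, desc = parts
except ValueError as exc:
    message = str(exc)
    print(message)
```

As a side note, strip(' ') only removes spaces, so the value still carries its newline; strip() with no argument removes all surrounding whitespace.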

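One way to fix this is to check for each kind of value separately, since you are not allowed to change the format of the file. I would use the first word of each line to sort the data into different lists: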

import re
Identification=[]
title=[]
weight=[]
fullmark=[]
Description=[]
with open(home + '\\Desktop\\PADS Assignment\\test.txt', 'r') as mod:
    for line in mod:
        list_of_line=re.findall(r'\w+', line)
        if len(list_of_line)==0:
            pass
        else:
            if list_of_line[0]=='Task':
                if list_of_line[1]=='Identification':
                    Identification.append(line[28:-1])
                if list_of_line[1]=='title':
                    title.append(line[12:-1])
            if list_of_line[0]=='Weight':
                weight.append(line[8:-1])
            if list_of_line[0]=='fullMark':
                fullmark.append(line[10:-1])
            if list_of_line[0]=='Description':
                Description.append(line[13:-1])


print('taskNumber is %s' % Identification[0])
print('taskTitle is %s' % title[0])
print('Weight is %s' % weight[0])
print('fullMark is %s' %fullmark[0])
print('desc is %s' %Description[0])
print('\n')
print('taskNumber is %s' % Identification[1])
print('taskTitle is %s' % title[1])
print('Weight is %s' % weight[1])
print('fullMark is %s' %fullmark[1])
print('desc is %s' %Description[1])
print('\n')
print('taskNumber is %s' % Identification[2])
print('taskTitle is %s' % title[2])
print('Weight is %s' % weight[2])
print('fullMark is %s' %fullmark[2])
print('desc is %s' %Description[2])
print('\n')

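Of course you could print with a loop, but I was too lazy, so I copied and pasted :). If you need any help or have any questions, please ask! This code assumes you are not advanced at coding. Good luck!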
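For reference, the loop version of the printing could look roughly like this. The five lists are filled in by hand here with the values from the question so the sketch runs on its own; labels and blocks are names I made up.

```python
# Example contents, as the lists would look after parsing the file in the question.
Identification = ["210CT1", "210CT2", "210CT3"]
title = ["Assignment 1", "Assignment 2", "Final Examination"]
weight = ["25", "25", "50"]
fullmark = ["100", "100", "100"]
Description = ["Program and design and complexity running time.",
               "Shortest Path Algorithm", "Close Book Examination"]

labels = ("taskNumber", "taskTitle", "Weight", "fullMark", "desc")
blocks = []
# zip walks the five lists in parallel, giving one record per task.
for record in zip(Identification, title, weight, fullmark, Description):
    text = "\n".join("%s is %s" % pair for pair in zip(labels, record))
    blocks.append(text)
    print(text)
    print()
```

This also keeps working if the file grows beyond three sets, which the copy-and-paste version does not.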

Answer 7

Inspired by the itertools-related solutions, here is another one that uses the more_itertools.grouper tool from the more-itertools library. It behaves similarly to @SwiftsNamesake's chunks function.

import collections as ct

import more_itertools as mit


names = dict([
    ("Task Identification Number", "taskNumber"),
    ("Task title", "taskTitle"),
    ("Weight", "weight"),
    ("fullMark","fullMark"),
    ("Description", "desc"),
])


filepath = home + '\\Desktop\\PADS Assignment\\test.txt'
with open(filepath, "r") as f:
    lines = (line.strip() for line in f.readlines())
    for group in mit.grouper(lines, 7):  # older more-itertools releases took grouper(7, lines)
        for line in group[1:]:
            if not line: continue
            k, v = line.split(":")
            print("{} is {}".format(names[k], v.strip()))
        print()

Output

taskNumber is 210CT1
taskTitle is Assignment 1
weight is 25
fullMark is 100
desc is Program and design and complexity running time.

taskNumber is 210CT2
taskTitle is Assignment 2
weight is 25
fullMark is 100
desc is Shortest Path Algorithm

taskNumber is 210CT3
taskTitle is Final Examination
weight is 50
fullMark is 100
desc is Close Book Examination

Note that the variable names are printed with their corresponding values.

7 answers

Answer 1: 2, accepted, 2017-08-19 18:49:08
Answer 2: 1, 2017-08-19 17:21:29
Answer 3: 0, 2017-08-19 17:52:57
Answer 4: 0, 2017-08-19 18:10:54
Answer 5: 0, 2017-08-19 18:17:10
Answer 6: 0, 2017-08-19 18:28:35
Answer 7: 0, 2017-08-19 19:23:41
