How to get only a part of a line using a regex group in Python?

Question

How can I get only a part of a line using a regex group in Python? I have a database with one entry per line, and I want to split it into files according to the month and day at the beginning of each line. However, I want to output each line without its first 21 characters. Here is a quick sample of the database:

01-01-1989-06:30:00| Stefan Reinartz; 1.1.1989; 06:30; +01; Engelskirchen,Germany; 50n59; 7e24; M;
01-01-1996-08:40:00| Dawid Kwiatkowski; 1.1.1996; 08:40; +01; Gorzów Wielkopolski,Poland; 52n44; 15e15; M;
01-01-2001-01:30:00| Liam Flockhart; 1.1.2001; 01:30; -08; San Diego,California; 32n43; 117w09; M;
01-02-1467-00:20:00| King of Poland Sigismund I the Old; 2.1.1467; 00:20; +00:21:33; Kozienice,Poland; 51n35; 21e33; M;
01-02-1746-09:00:00| Duke of Rambouillet Louis Marie; 2.1.1746; 09:00; -00:03:41; Madrid,Spain; 40n24; 3w41; M;
01-02-1784-01:00:00| Duke of Saxe-Coburg and Gotha Ernst I; 2.1.1784; 01:00; +00:10:58; Coburg,Germany; 50n15; 10e58; M;

Desired output file 01-01.zbs:

Stefan Reinartz; 1.1.1989; 06:30; +01; Engelskirchen,Germany; 50n59; 7e24; M;
Dawid Kwiatkowski; 1.1.1996; 08:40; +01; Gorzów Wielkopolski,Poland; 52n44; 15e15; M;
Liam Flockhart; 1.1.2001; 01:30; -08; San Diego,California; 32n43; 117w09; M;

Desired output file 01-02.zbs:

King of Poland Sigismund I the Old; 2.1.1467; 00:20; +00:21:33; Kozienice,Poland; 51n35; 21e33; M;
Duke of Rambouillet Louis Marie; 2.1.1746; 09:00; -00:03:41; Madrid,Spain; 40n24; 3w41; M;
Duke of Saxe-Coburg and Gotha Ernst I; 2.1.1784; 01:00; +00:10:58; Coburg,Germany; 50n15; 10e58; M;

I used the beginning of each line to sort the entries by day of the year and to split the file accordingly. But I don't want to output the first 21 characters of each line, so I am trying to use a regex group to do this, like this:

re.search("^[0-9]{2}-[0-9]{2}-[0-9]{4}-[0-9]{2}:[0-9]{2}:[0-9]{2}| (.*)",line[0])
re.search("^.{21}(.*)",line[0])

But how do I use the group (.*) \\1 to output only that part? Is a regex even needed to do this?

Here is the whole code. I am a complete beginner in Python, so the code is probably quite wrong:

import re
with open("database.txt") as f: 
    pstring='' #previous line string beginning
    astring='' #actual line string beginning
    try:
        out = open(re.search("^[0-9]{2}-[0-9]{2}",line[0]) + ".zbs", "w")
        for line in f:
            astring = re.search("^[0-9]{2}-[0-9]{2}-",line[0])
            if not pstring = astring:
                out.write(line)
                pstring = re.search("^[0-9]{2}-[0-9]{2}-",line[0])
                if out: out.close()
                out = open(re.search("^[0-9]{2}-[0-9]{2}",line[0]) + ".zbs", "w")
            else: 
                pstring = re.search("^[0-9]{2}-[0-9]{2}-",line[0])
                out.write(line)
    finally:
        out.close()

Best regards.

Answer 1

Let's consider a single line in your file:

line = "01-01-1989-06:30:00| Stefan Reinartz; 1.1.1989; 06:30; +01; Engelskirchen,Germany; 50n59; 7e24; M;"

If you want to get rid of the first 21 characters of a line, you can simply use slicing, as follows:

>>> print(line[21:])
Stefan Reinartz; 1.1.1989; 06:30; +01; Engelskirchen,Germany; 50n59; 7e24; M;

(Have a look at this site for more details about retrieving substrings via slicing.)
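Applied to the task in the question, slicing alone may be enough. The following is a minimal sketch, not a definitive implementation: the function names `group_by_day` and `write_groups` are made up for illustration, and it assumes every entry starts with a prefix of exactly 21 characters whose first five characters are the `MM-DD` key, as in the sample data.

```python
from collections import defaultdict


def group_by_day(lines):
    """Map each 'MM-DD' key to its entries, with the 21-character prefix removed.

    Assumption: every entry starts with a fixed-width prefix such as
    '01-01-1989-06:30:00| ' (19 timestamp characters, '|', and a space).
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if len(line) <= 21:
            continue  # skip blank or truncated lines
        key = line[:5]                   # e.g. '01-01'
        groups[key].append(line[21:])    # everything after the prefix
    return dict(groups)


def write_groups(groups, suffix=".zbs"):
    """Write one file per key, e.g. '01-01.zbs' (hypothetical helper)."""
    for key, entries in groups.items():
        with open(key + suffix, "w", encoding="utf-8") as out:
            out.write("\n".join(entries) + "\n")
```

A possible usage would be to open `database.txt`, pass the file object to `group_by_day`, and hand the result to `write_groups`. Collecting the entries in a dictionary first avoids tracking the previous line's key by hand and does not depend on the input being sorted.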

Now, if you need to extract parts of such a line, you can indeed make use of regular expressions. To get the parts of the date, as you mentioned, you can use, e.g., a pattern with named groups as follows:

import re
p = r"[^\;]+; (?P<day>[0-9]+)\.(?P<month>[0-9]+)\.(?P<year>[0-9]+)"
m = re.match(p, line)

The matched groups may then be accessed like this:

>>> m.group("day")
'1'
>>> m.group("month")
'1'
>>> m.group("year")
'1989'

(You can, of course, get the date more easily by extracting it right from the beginning of a line, but this is just an example that demonstrates the use of named groups.)
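To answer the question about the `(.*)` group directly, here is a sketch of how the prefix pattern from the question could be used with groups. The names `PREFIX` and `split_entry` are illustrative only. Note that the `|` has to be escaped as `\|`: unescaped, it means alternation, so the original pattern would match the timestamp alone and leave the `(.*)` group empty.

```python
import re

# Assumed prefix layout: MM-DD-YYYY-HH:MM:SS, then '|', then one space.
PREFIX = re.compile(
    r"^(?P<key>[0-9]{2}-[0-9]{2})"           # month and day, used as the file key
    r"-[0-9]{4}-[0-9]{2}:[0-9]{2}:[0-9]{2}"  # year and time, not captured
    r"\| "                                   # literal pipe and the space after it
    r"(?P<rest>.*)"                          # the part of the line to output
)


def split_entry(line):
    """Return (key, rest) for a database line, or None if the prefix is missing."""
    m = PREFIX.match(line)
    if m is None:
        return None
    return m.group("key"), m.group("rest")
```

The same groups are also available by number, as `m.group(1)` and `m.group(2)`. Since `.` does not match a newline by default, `rest` excludes the line's trailing newline, so one has to be added back when writing the entry to a file.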

Answer 1 accepted 2017-12-03 17:08:21
