
How to get only a part of a line using a regex group in Python?

How can I get only part of a line using a regex group in Python? I have a database with one entry per line, and I want to split it into files according to the month and day at the beginning of each line, but I want to output each line without its first 21 characters. Here is a quick sample of the database:

01-01-1989-06:30:00| Stefan Reinartz; 1.1.1989; 06:30; +01; Engelskirchen,Germany; 50n59; 7e24; M;
01-01-1996-08:40:00| Dawid Kwiatkowski; 1.1.1996; 08:40; +01; Gorzów Wielkopolski,Poland; 52n44; 15e15; M;
01-01-2001-01:30:00| Liam Flockhart; 1.1.2001; 01:30; -08; San Diego,California; 32n43; 117w09; M;
01-02-1467-00:20:00| King of Poland Sigismund I the Old; 2.1.1467; 00:20; +00:21:33; Kozienice,Poland; 51n35; 21e33; M;
01-02-1746-09:00:00| Duke of Rambouillet Louis Marie; 2.1.1746; 09:00; -00:03:41; Madrid,Spain; 40n24; 3w41; M;
01-02-1784-01:00:00| Duke of Saxe-Coburg and Gotha Ernst I; 2.1.1784; 01:00; +00:10:58; Coburg,Germany; 50n15; 10e58; M;

Desired output, file 01-01.zbs:

Stefan Reinartz; 1.1.1989; 06:30; +01; Engelskirchen,Germany; 50n59; 7e24; M;
Dawid Kwiatkowski; 1.1.1996; 08:40; +01; Gorzów Wielkopolski,Poland; 52n44; 15e15; M;
Liam Flockhart; 1.1.2001; 01:30; -08; San Diego,California; 32n43; 117w09; M;

Output file 01-02.zbs:

King of Poland Sigismund I the Old; 2.1.1467; 00:20; +00:21:33; Kozienice,Poland; 51n35; 21e33; M;
Duke of Rambouillet Louis Marie; 2.1.1746; 09:00; -00:03:41; Madrid,Spain; 40n24; 3w41; M;
Duke of Saxe-Coburg and Gotha Ernst I; 2.1.1784; 01:00; +00:10:58; Coburg,Germany; 50n15; 10e58; M;

I use this prefix to sort the entries by day of the year and to split the file accordingly. However, I don't want to output the first 21 characters of each line, so I am trying to use a regex group, like this:

re.search(r"^[0-9]{2}-[0-9]{2}-[0-9]{4}-[0-9]{2}:[0-9]{2}:[0-9]{2}\| (.*)", line)
re.search(r"^.{21}(.*)", line)

But how do I use the group (.*), i.e. \1, to output only that part? Is a regex even needed for this?

Here is the whole code. I am a complete beginner in Python, so it is probably far from ideal:

import re

with open("database.txt") as f:
    pstring = None  # month-day prefix of the previous line
    out = None      # output file for the current month-day
    try:
        for line in f:
            m = re.match(r"[0-9]{2}-[0-9]{2}", line)
            if m is None:
                continue  # skip lines without a MM-DD prefix
            astring = m.group(0)  # month-day prefix of the actual line
            if astring != pstring:
                # A new day starts: close the old file and open MM-DD.zbs
                if out:
                    out.close()
                out = open(astring + ".zbs", "w")
                pstring = astring
            out.write(line)  # still writes the whole line, prefix included
    finally:
        if out:
            out.close()

Best regards.

Let's consider a single line in your file:

line = "01-01-1989-06:30:00| Stefan Reinartz; 1.1.1989; 06:30; +01; Engelskirchen,Germany; 50n59; 7e24; M;"

If you want to get rid of the first 21 characters of a line, you can simply use slicing:

>>> print(line[21:])
Stefan Reinartz; 1.1.1989; 06:30; +01; Engelskirchen,Germany; 50n59; 7e24; M;
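If you do want a regex group, as asked in the question, the full-prefix pattern works once the | is escaped (unescaped, it splits the pattern into two alternatives). m.group(1) then returns only the text captured by (.*). A minimal sketch using the same sample line; slicing remains the simpler choice as long as the prefix always has a fixed width of 21 characters:

```python
import re

line = "01-01-1989-06:30:00| Stefan Reinartz; 1.1.1989; 06:30; +01; Engelskirchen,Germany; 50n59; 7e24; M;"

# The "|" must be escaped; otherwise it acts as an alternation operator.
pattern = r"^[0-9]{2}-[0-9]{2}-[0-9]{4}-[0-9]{2}:[0-9]{2}:[0-9]{2}\| (.*)"

m = re.match(pattern, line)
if m:
    rest = m.group(1)  # only the text captured by (.*)
    print(rest)
```

One difference worth knowing: "." does not match a newline, so for lines read from a file group(1) drops the trailing "\n", whereas line[21:] keeps it.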

(Have a look at this site for more details about retrieving substrings via slicing.)

Now, if you need to extract parts of such a line, you can indeed use regular expressions. For example, to get the parts of the date you mentioned, you can use a pattern with named groups:

import re
# Skip everything up to the first "; ", then capture day, month and year.
p = r"[^;]+; (?P<day>[0-9]+)\.(?P<month>[0-9]+)\.(?P<year>[0-9]+)"
m = re.match(p, line)

The matched groups can then be accessed like this:

>>> m.group("day")
'1'
>>> m.group("month")
'1'
>>> m.group("year")
'1989'
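All named groups can also be collected at once with m.groupdict(). Converting the values to integers, as in this small sketch, is an optional extra step that helps if you want to compare or sort dates numerically rather than as strings:

```python
import re

line = "01-01-1989-06:30:00| Stefan Reinartz; 1.1.1989; 06:30; +01; Engelskirchen,Germany; 50n59; 7e24; M;"
p = r"[^;]+; (?P<day>[0-9]+)\.(?P<month>[0-9]+)\.(?P<year>[0-9]+)"
m = re.match(p, line)

parts = m.groupdict()  # {'day': '1', 'month': '1', 'year': '1989'}
# Optional: turn the captured strings into numbers
day, month, year = (int(parts[k]) for k in ("day", "month", "year"))
print(parts, day, month, year)
```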

(Of course, you could get the date more easily by extracting it right from the beginning of the line; this is just an example that demonstrates the use of named groups.)
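Putting the pieces together for the original task, here is a minimal sketch: a short named-group pattern reads the month and day from the start of each line, and line[21:] is what gets stored. It assumes every line starts with the fixed 21-character MM-DD-YYYY-hh:mm:ss| prefix shown in the sample. The function names split_by_day and write_groups are hypothetical, and UTF-8 is an assumed file encoding (the sample contains non-ASCII names such as Gorzów). Because lines are grouped in a dictionary, the input does not need to be sorted first.

```python
import re
from collections import defaultdict

# Month and day at the very start of a line, e.g. "01-02-1467-00:20:00| ..."
PREFIX = re.compile(r"(?P<month>[0-9]{2})-(?P<day>[0-9]{2})-")

def split_by_day(lines):
    """Group lines by their MM-DD prefix, dropping the first 21 characters."""
    groups = defaultdict(list)
    for line in lines:
        m = PREFIX.match(line)
        if m is None:
            continue  # skip lines that do not start with a date
        key = f"{m.group('month')}-{m.group('day')}"
        groups[key].append(line[21:])  # assumes the fixed 21-char prefix
    return groups

def write_groups(groups):
    """Write each group to MM-DD.zbs (encoding is an assumption)."""
    for key, rows in groups.items():
        with open(key + ".zbs", "w", encoding="utf-8") as out:
            out.writelines(rows)  # rows still end with their "\n"
```

Usage, assuming the input file is database.txt as in the question: open it with open("database.txt", encoding="utf-8"), pass the file object to split_by_day, and pass the result to write_groups. This keeps all entries in memory before writing, which is simple but could be swapped for writing each day as it is encountered if the file is very large.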
