How to get only a part of a line using a regex group in Python?
I have a database with one entry per line. I want to split it into files according to the month and day at the beginning of each line, but each output line should omit its first 21 characters.
Here is a quick sample of the database:
01-01-1989-06:30:00| Stefan Reinartz; 1.1.1989; 06:30; +01; Engelskirchen,Germany; 50n59; 7e24; M;
01-01-1996-08:40:00| Dawid Kwiatkowski; 1.1.1996; 08:40; +01; Gorzów Wielkopolski,Poland; 52n44; 15e15; M;
01-01-2001-01:30:00| Liam Flockhart; 1.1.2001; 01:30; -08; San Diego,California; 32n43; 117w09; M;
01-02-1467-00:20:00| King of Poland Sigismund I the Old; 2.1.1467; 00:20; +00:21:33; Kozienice,Poland; 51n35; 21e33; M;
01-02-1746-09:00:00| Duke of Rambouillet Louis Marie; 2.1.1746; 09:00; -00:03:41; Madrid,Spain; 40n24; 3w41; M;
01-02-1784-01:00:00| Duke of Saxe-Coburg and Gotha Ernst I; 2.1.1784; 01:00; +00:10:58; Coburg,Germany; 50n15; 10e58; M;
Desired output file 01-01.zbs:
Stefan Reinartz; 1.1.1989; 06:30; +01; Engelskirchen,Germany; 50n59; 7e24; M;
Dawid Kwiatkowski; 1.1.1996; 08:40; +01; Gorzów Wielkopolski,Poland; 52n44; 15e15; M;
Liam Flockhart; 1.1.2001; 01:30; -08; San Diego,California; 32n43; 117w09; M;
Desired output file 01-02.zbs:
King of Poland Sigismund I the Old; 2.1.1467; 00:20; +00:21:33; Kozienice,Poland; 51n35; 21e33; M;
Duke of Rambouillet Louis Marie; 2.1.1746; 09:00; -00:03:41; Madrid,Spain; 40n24; 3w41; M;
Duke of Saxe-Coburg and Gotha Ernst I; 2.1.1784; 01:00; +00:10:58; Coburg,Germany; 50n15; 10e58; M;
I use the beginning of each line to sort the entries by day of the year and to split the file accordingly. But I don't want to output the first 21 characters of each line, so I am trying to use a regex group for this, like so:
re.search("^[0-9]{2}-[0-9]{2}-[0-9]{4}-[0-9]{2}:[0-9]{2}:[0-9]{2}| (.*)",line[0])
re.search("^.{21}(.*)",line[0])
But how do I use the group (.*), i.e. \1, to output only that part? Is a regex even needed for this?
Here is the whole code. I am a complete beginner in Python, so the code is probably quite wrong:
import re
with open("database.txt") as f:
    pstring='' #previous line string beginning
    astring='' #actual line string beginning
    try:
        out = open(re.search("^[0-9]{2}-[0-9]{2}",line[0]) + ".zbs", "w")
        for line in f:
            astring = re.search("^[0-9]{2}-[0-9]{2}-",line[0])
            if not pstring = astring:
                out.write(line)
                pstring = re.search("^[0-9]{2}-[0-9]{2}-",line[0])
                if out: out.close()
                out = open(re.search("^[0-9]{2}-[0-9]{2}",line[0]) + ".zbs", "w")
            else:
                pstring = re.search("^[0-9]{2}-[0-9]{2}-",line[0])
                out.write(line)
    finally:
        out.close()
Best regards.
Let's consider a single line in your file:
line = "01-01-1989-06:30:00| Stefan Reinartz; 1.1.1989; 06:30; +01; Engelskirchen,Germany; 50n59; 7e24; M;"
If you want to get rid of the first 21 characters of a line, you can simply use slicing:
>>> print(line[21:])
Stefan Reinartz; 1.1.1989; 06:30; +01; Engelskirchen,Germany; 50n59; 7e24; M;
(Have a look at this site for more details about retrieving substrings via slicing.)
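Applied to the whole task, slicing alone may already be enough. The following is a minimal sketch, assuming every line starts with the fixed-width MM-DD-YYYY-HH:MM:SS| prefix shown in the question and that the database fits in memory. The names PREFIX_LEN and split_by_day, as well as the out_dir parameter, are hypothetical and only serve the illustration.

```python
import os
from collections import defaultdict

PREFIX_LEN = 21  # length of "01-01-1989-06:30:00| " in the sample data


def split_by_day(lines, out_dir="."):
    """Write each line, minus its prefix, to <out_dir>/<MM-DD>.zbs.

    Hypothetical helper: assumes the fixed-width prefix from the question.
    Returns the sorted list of "MM-DD" keys that were written.
    """
    groups = defaultdict(list)
    for line in lines:
        if len(line) <= PREFIX_LEN:
            continue  # skip blank or truncated lines
        key = line[:5]  # "MM-DD" at the very start of the line
        groups[key].append(line[PREFIX_LEN:])  # drop the first 21 characters

    for key, entries in groups.items():
        path = os.path.join(out_dir, key + ".zbs")
        with open(path, "w", encoding="utf-8") as out:
            out.writelines(entries)  # entries keep their trailing newlines
    return sorted(groups)
```

Since a file object iterates over its lines, it could be passed in directly, for instance by opening database.txt in a with statement and calling split_by_day(f). Grouping in memory first means the input does not have to be sorted by day.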
Now, if you need to extract parts of such a line, you can indeed make use of regular expressions. To get the parts of the date, as you mentioned, you can use, e.g., a pattern with named groups as follows:
import re
p = r"[^\;]+; (?P<day>[0-9]+)\.(?P<month>[0-9]+)\.(?P<year>[0-9]+)"
m = re.match(p, line)
The matched groups may then be accessed like this:
>>> m.group("day")
'1'
>>> m.group("month")
'1'
>>> m.group("year")
'1989'
(You can, of course, get the date more easily by extracting it right from the beginning of a line, but this is just an example that demonstrates the use of named groups.)
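For completeness, here is a sketch of that simpler route, which also captures the remainder of the line with a group, as the question attempted. It assumes the prefix always has the form MM-DD-YYYY-HH:MM:SS| followed by a space. Note that the literal | has to be escaped, because an unescaped | means alternation in a regular expression. The names PREFIX and parse_line are hypothetical.

```python
import re

# Month and day come first in the prefix (01-02-... pairs with 2.1.1467).
PREFIX = re.compile(
    r"^(?P<month>[0-9]{2})-(?P<day>[0-9]{2})-(?P<year>[0-9]{4})"
    r"-[0-9]{2}:[0-9]{2}:[0-9]{2}\| (?P<rest>.*)"
)


def parse_line(line):
    """Return (month, day, rest), or None if the line lacks the prefix."""
    m = PREFIX.match(line)
    if m is None:
        return None
    return m.group("month"), m.group("day"), m.group("rest")
```

Because . does not match a newline by default, the rest group excludes the trailing line break, so it would have to be added back when writing to a file.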