
Change part of string in a file using regex in Python

I have a file in which each line contains a timestamp as part of the line. The timestamps look like 1996-07-04 00:00:00.0, and I want to convert each one to 1996-07-04 00:00:00, dropping the fractional seconds on every line. I tried the re.sub() method in Python, but it replaces the match with the value I pass in and does not keep the original timestamp. I was using

re.sub(r"(\d\d\d\d-\d\d-\d\d\s+\d\d:\d\d:\d\d.\d)", "replace without millisec", cell)

The second argument is my problem.

You can use the following regex, which captures the part you need to keep, and then restore that part with a backreference in the replacement string:

\b(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\.\d+\b

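Replace with \1 (in Python, write it as the raw string r"\1"; in a normal string literal "\1" is an octal escape for the character chr(1), not a backreference).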
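To show how the pieces fit together, here is a sketch of the same pattern written with re.VERBOSE so each part can carry a comment, followed by the substitution using \1 and its equivalent long form \g<1>. The sample line is invented for illustration.

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part can be annotated.
# Whitespace in the pattern is ignored in verbose mode; \s+ still matches the space.
TIMESTAMP = re.compile(
    r"""
    \b
    (                           # group 1: the part to keep
        \d{4}-\d{2}-\d{2}       # date, e.g. 1996-07-04
        \s+                     # separator between date and time
        \d{2}:\d{2}:\d{2}       # time, e.g. 00:00:00
    )
    \.\d+                       # escaped dot plus fractional seconds: matched, then dropped
    \b
    """,
    re.VERBOSE,
)

line = "id=7, ts=1996-07-04 00:00:00.0, status=ok"

# \1 in the replacement refers back to whatever group 1 matched.
print(TIMESTAMP.sub(r"\1", line))     # id=7, ts=1996-07-04 00:00:00, status=ok
# \g<1> is the same backreference in an unambiguous long form.
print(TIMESTAMP.sub(r"\g<1>", line))  # id=7, ts=1996-07-04 00:00:00, status=ok
```

The two forms behave identically here; \g<1> only matters when a digit must follow the group number in the replacement, since r"\10" would be read as a reference to group 10.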


IDEONE code:

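import re
# Group 1 captures "YYYY-MM-DD HH:MM:SS"; the trailing ".<digits>" is matched but not captured
p = re.compile(r'\b(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\.\d+\b')
test_str = "1996-07-04 00:00:00.0"
# Replace each full match with group 1 only, which drops the fractional seconds
print(p.sub(r"\1", test_str))  # 1996-07-04 00:00:00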
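The question is about every line of a file, while the code above handles a single string. A minimal sketch of the file case follows, assuming UTF-8 text and that writing the result to a new file (rather than editing in place) is acceptable; the function names strip_fractional_seconds and rewrite_file and the demo file names are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator

# Same pattern as the answer: group 1 keeps "YYYY-MM-DD HH:MM:SS"; ".<digits>" is dropped.
TIMESTAMP = re.compile(r"\b(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\.\d+\b")


def strip_fractional_seconds(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line with the fractional seconds removed from every matching timestamp."""
    for line in lines:
        yield TIMESTAMP.sub(r"\1", line)


def rewrite_file(src: Path, dst: Path) -> None:
    """Stream src into dst line by line (UTF-8 assumed); src itself is left unchanged."""
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        fout.writelines(strip_fractional_seconds(fin))


if __name__ == "__main__":
    import tempfile

    # Demo on throwaway files; in real use, pass the paths of your own input and output.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.txt"
        dst = Path(tmp) / "output.txt"
        src.write_text("a,1996-07-04 00:00:00.0,x\nb,1996-07-05 12:30:45.123,y\n", encoding="utf-8")
        rewrite_file(src, dst)
        print(dst.read_text(encoding="utf-8"), end="")
```

Streaming line by line keeps memory use flat for large files. If the file must be changed in place, the standard library's fileinput module (with inplace=True), or writing to a temporary file and renaming it over the original, are common alternatives.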

Note that you do not have to repeat the same subpattern, as in \d\d\d\d; just use a limiting quantifier {n}, where n is the number of times the subpattern must repeat. You can also set both a minimum and a maximum, as in {1,4}, or only a minimum, as in {2,}.
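A small check of the quantifier forms just described; the full_matches helper and the sample strings are made up for this illustration.

```python
import re


def full_matches(pattern: str, samples: list[str]) -> list[str]:
    """Return the samples that the pattern matches in their entirety."""
    return [s for s in samples if re.fullmatch(pattern, s)]


samples = ["", "7", "07", "996", "1996", "19960"]

# {n}: exactly n repetitions, so \d{4} behaves like \d\d\d\d
print(full_matches(r"\d{4}", samples))     # ['1996']
print(full_matches(r"\d\d\d\d", samples))  # ['1996']
# {m,n}: between m and n repetitions, inclusive
print(full_matches(r"\d{1,4}", samples))   # ['7', '07', '996', '1996']
# {m,}: at least m repetitions, no upper bound
print(full_matches(r"\d{2,}", samples))    # ['07', '996', '1996', '19960']
```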
