Changing part of a string in a file using a regular expression in Python

Question

I have a file in which each line contains one timestamp as part of that line. The timestamp format is 1996-07-04 00:00:00.0 . I want to convert this to 1996-07-04 00:00:00 , without the fractional seconds, in each line. I tried using the re.sub() method in Python, but it replaces the match with the literal value I give it and does not retain the original timestamp. I was using

re.sub("(\d\d\d\d-\d\d-\d\d\s+\d\d:\d\d:\d\d.\d)", "replace without millisec", cell)

The 2nd parameter is my problem.

Answer 1

You can use the following regex, which captures the part you need to keep, and then use a backreference to put that part back in the replacement:

\b(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\.\d+\b

Replace with \1 (written as r"\1" in Python), which stands for the text matched by the first capturing group.


IDEONE code:

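import re

# Group 1 keeps the timestamp up to the seconds; \.\d+ matches the fraction to drop.
p = re.compile(r'\b(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\.\d+\b')
test_str = "1996-07-04 00:00:00.0"
# \1 in the replacement re-inserts group 1 (Python 3 print syntax).
print(re.sub(p, r"\1", test_str))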
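Since the question is about a file, the same substitution could be applied line by line. The following is only a minimal sketch: the names strip_fraction and convert_file, the file paths passed to it, and the UTF-8 encoding are assumptions for illustration and do not come from the original answer.

```python
import re

# Same pattern as in the answer: keep the timestamp up to the seconds,
# drop the dot and the fractional digits that follow it.
TIMESTAMP = re.compile(r'\b(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\.\d+\b')

def strip_fraction(line):
    """Return the line with fractional seconds removed from any timestamp."""
    return TIMESTAMP.sub(r'\1', line)

def convert_file(src_path, dst_path):
    """Hypothetical helper: copy src_path to dst_path, rewriting each line."""
    with open(src_path, encoding='utf-8') as src, \
         open(dst_path, 'w', encoding='utf-8') as dst:
        for line in src:
            dst.write(strip_fraction(line))

# Quick check on a single made-up line; the rest of the line is left untouched.
sample = 'id=7 shipped 1996-07-04 00:00:00.0 ok\n'
print(strip_fraction(sample), end='')
```

Writing to a second file rather than editing in place keeps the original data intact if the pattern turns out not to match what you expected.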

Note that you do not have to repeat the same subpattern, as in \d\d\d\d ; just use a limiting quantifier {n} , where n is the number of times the subpattern must repeat. You can also set minimum and maximum bounds, like {1,4} , or only a minimum, like {2,} .
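The quantifier forms mentioned above can be tried out with a small sketch like the one below. The helper name whole_match and the sample strings are made up for illustration; re.fullmatch is used so that each pattern has to cover the entire string.

```python
import re

def whole_match(pattern, text):
    """Hypothetical helper: True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# \d{4} is shorthand for \d\d\d\d: exactly four digits.
same = whole_match(r'\d\d\d\d', '1996') and whole_match(r'\d{4}', '1996')
too_short = whole_match(r'\d{4}', '199')

# {1,4}: at least one and at most four digits.
in_range = whole_match(r'\d{1,4}', '42')
too_long = whole_match(r'\d{1,4}', '12345')

# {2,}: two or more digits, with no upper bound.
at_least_two = whole_match(r'\d{2,}', '1234567')
only_one = whole_match(r'\d{2,}', '7')

print(same, too_short, in_range, too_long, at_least_two, only_one)
```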

Answer 1: 4, accepted, 2015-05-21 09:02:42
