How to modify a file to replace a string that matches this pattern
I have a JSON file like this:
{
    "title": "Pilot",
    "image": [
        {
            "resource": "http://images2.nokk.nocookie.net/__cb20110227141960/notr/images/8/8b/pilot.jpg",
            "description": "not yet implemented"
        }
    ],
    "content": "<p>The pilot ...</p>"
},
{
    "title": "Special Christmas (Part 1)",
    "image": [
        {
            "resource": "http://images1.nat.nocookie.net/__cb20090519172121/obli/images/e/ed/SpecialChristmas.jpg",
            "description": "not yet implemented"
        }
    ],
    "content": "<p>Last comment...</p>"
}
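I need to replace the content of every resource value in the file, so if a string has this format:
"http://images1.nat.nocookie.net/__cb20090519172121/obli/images/e/ed/SpecialChristmas.jpg"
the result should be:
"../img/SpecialChristmas.jpg"
Can someone tell me how to match that pattern in order to modify the file?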
I tried following a suggestion like this one:
https://stackoverflow.com/a/4128192/521728
but I don't know how to adapt it to my case.
Thanks in advance!
If they are all going to be "../img" images, I believe you can do this:
resourceVal = "http://images1.nat.nocookie.net/__cb20090519172121/obli/images/e/ed/SpecialChristmas.jpg"
lastSlash = resourceVal.rfind('/')
result = "../img" + resourceVal[lastSlash:]
If there are other kinds of resources, this may get a bit more complicated. Let me know and I will try to edit this answer to help.
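As a rough sketch of how the slicing above might be applied to every entry: this assumes the file has been wrapped in [ ] so that it parses as a JSON array (the sample as posted is two objects separated by a comma, which json.loads would reject). The helper names localize and rewrite_resources are hypothetical, not part of the original answer.

```python
import json

def localize(url, prefix="../img"):
    """Keep only the file name of url and place it under prefix."""
    last_slash = url.rfind("/")
    # rfind returns -1 when there is no slash, so the slice then keeps the whole string
    name = url[last_slash + 1:]
    return prefix + "/" + name

def rewrite_resources(entries):
    """Rewrite every image "resource" value in a list of entry dicts, in place."""
    for entry in entries:
        for image in entry.get("image", []):
            image["resource"] = localize(image["resource"])
    return entries

# Assumption: the input is a JSON array of entries shaped like the question's sample.
sample = json.loads("""[
  {"title": "Pilot",
   "image": [{"resource": "http://images2.nokk.nocookie.net/__cb20110227141960/notr/images/8/8b/pilot.jpg",
              "description": "not yet implemented"}],
   "content": "<p>The pilot ...</p>"}
]""")
rewrite_resources(sample)
print(sample[0]["image"][0]["resource"])  # ../img/pilot.jpg
```

Working on the parsed data rather than on raw text means only the "resource" fields are touched, at the cost of requiring the file to be valid JSON first.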
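Here is my answer. It is not very concise, but you can change the regular expression used in the re.search(r"\.jpg", line) call to any pattern you need.
import re

# Copy test.json to new.json, rewriting every line that mentions a .jpg file.
with open("test.json") as src, open("new.json", "wt") as out:
    for line in src:
        match = re.search(r"\.jpg", line)  # escape the dot so it matches a literal "."
        if match:
            # The last "/"-separated piece is the file name plus the closing quote, comma and newline.
            sp_str = line.split("/")
            new_line = '\t"resource": "../img/' + sp_str[-1]
            out.write(new_line)
        else:
            out.write(line)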
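To illustrate the point about adjusting the regular expression, here is a minimal sketch of the same line-by-line idea with a stricter pattern. The pattern IMAGE_LINE, the list of extensions, and the helper name rewrite_line are assumptions made for this example; it also keeps each line's original indentation instead of hard-coding a tab.

```python
import re

# Hypothetical pattern: a "resource" line whose value ends in a common image extension.
IMAGE_LINE = re.compile(r'"resource"\s*:\s*".*\.(?:jpg|jpeg|png|gif)"', re.IGNORECASE)

def rewrite_line(line):
    """Return line with its resource URL shortened to ../img/<file name>."""
    if not IMAGE_LINE.search(line):
        return line
    indent = line[:len(line) - len(line.lstrip())]  # keep the original indentation
    tail = line.split("/")[-1]  # file name plus closing quote, comma and newline
    return indent + '"resource": "../img/' + tail

lines = [
    '    "resource": "http://images2.nokk.nocookie.net/__cb20110227141960/notr/images/8/8b/pilot.jpg",\n',
    '    "description": "not yet implemented"\n',
]
for line in lines:
    print(rewrite_line(line), end="")
```

Anchoring the pattern on the "resource" key avoids rewriting other lines that merely happen to contain ".jpg", such as HTML inside "content".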
I would use a regular expression with groups:
from io import StringIO  # on Python 2 this was: from StringIO import StringIO
import re
reader = StringIO("""{
"title": "Pilot",
"image": [
{
"resource": "http://images2.nokk.nocookie.net/__cb20110227141960/notr/images/8/8b/pilot.jpg",
"description": "not yet implemented"
}
],
"content": "<p>The pilot ...</p>"
},
{
"title": "Special Christmas (Part 1)",
"image": [
{
"resource": "http://images1.nat.nocookie.net/__cb20090519172121/obli/images/e/ed/SpecialChristmas.jpg",
"description": "not yet implemented"
}
],
"content": "<p>Last comment...</p>"
}""")
# to open a file just use reader = open(filename)
text = reader.read()
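pattern = r'"resource": ".+/(.+)\.jpg"'  # escape the dot so it only matches a literal "."
replacement = r'"resource": "../img/\g<1>.jpg"'  # raw string, so \g is not treated as a string escape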
text = re.sub(pattern, replacement, text)
print(text)
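Explanation of the pattern: it looks for any text that starts with "resource": ", followed by one or more characters up to a forward slash, followed by one or more characters before .jpg". The parentheses () mean that I want whatever is matched inside them captured as a group. Since there is only one set of parentheses, I can refer to it in the replacement as \g<1>. (Note that \g<0> refers to the whole matched string: "resource": etc.)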
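The difference between group 0 and group 1 can be checked on a single line. This is a small sketch using the second URL from the question and the pattern with the dot escaped; the variable names are chosen only for this example.

```python
import re

line = '"resource": "http://images1.nat.nocookie.net/__cb20090519172121/obli/images/e/ed/SpecialChristmas.jpg",'
pattern = r'"resource": ".+/(.+)\.jpg"'

match = re.search(pattern, line)
whole = match.group(0)  # everything the pattern matched, from "resource" to the closing quote
name = match.group(1)   # only the part captured by the parentheses
print(name)  # SpecialChristmas

# \g<1> in the replacement inserts the captured file name.
rewritten = re.sub(pattern, r'"resource": "../img/\g<1>.jpg"', line)
print(rewritten)  # "resource": "../img/SpecialChristmas.jpg",

# \g<0> inserts the entire match, so this only wraps the matched text in brackets.
wrapped = re.sub(pattern, r'[\g<0>]', line)
print(wrapped)
```

Because .+ is greedy, the first .+/ consumes everything up to the last slash, which is why the group ends up holding just the file name.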