Python re.sub: replace HTML attributes
I'm trying to resize images in HTML code. This is one example:
My goal is to replace height="108" and width="150" with a height and width of 400. I've tried the following lines, but they don't seem to work:
re.sub(r'width="[0-9]{2,4}"','width="400"',x)
re.sub(r'height="[0-9]{2,4}"','height="400"',x)
Does anyone have a solution for this? PS: I'm not that good at regex... :)
The reason it does not work is that strings are immutable and you do not use the result: re.sub returns a new string instead of changing x in place. You can "solve" the issue with:
x = re.sub(r'width="[0-9]{2,4}"','width="400"',x)
x = re.sub(r'height="[0-9]{2,4}"','height="400"',x)
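Since the two calls differ only in the attribute name, they can also be folded into a single pass. This is a minimal sketch that reuses the same value pattern; the sample string is made up for illustration:

```python
import re

# Hypothetical input, similar to the one in the question
x = '<img src="a.png" width="150" height="108">'

# The group captures which attribute matched and \1 writes it back,
# so only the numeric value is replaced. The result must be assigned back to x.
x = re.sub(r'(width|height)="[0-9]{2,4}"', r'\1="400"', x)
print(x)  # <img src="a.png" width="400" height="400">
```

It inherits every weakness of the original pattern, including the one described next.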
That being said, it is a very bad idea to process HTML/XML with regexes. Say you have a tag <foo altwidth="1234">. The pattern above will change it to <foo altwidth="400">; do you want that? Probably not.
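A small demonstration of that false positive, using a made-up input string. The second pattern adds a lookbehind requiring whitespace before the attribute name; that tweak is an assumption added here, not part of the answer, and it only patches this one case (it still ignores single-quoted or unquoted values, attributes inside comments or scripts, and so on):

```python
import re

# Hypothetical input: one tag with "altwidth", one real image
html = '<foo altwidth="1234"> <img width="150" height="108">'

# Pattern from the question: also matches the tail of "altwidth"
naive = re.sub(r'width="[0-9]{2,4}"', 'width="400"', html)
print(naive)    # <foo altwidth="400"> <img width="400" height="108">

# Requiring whitespace right before "width" skips "altwidth",
# but this is still not a real HTML parser
guarded = re.sub(r'(?<=\s)width="[0-9]{2,4}"', 'width="400"', html)
print(guarded)  # <foo altwidth="1234"> <img width="400" height="108">
```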
You can, for instance, use BeautifulSoup:
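from bs4 import BeautifulSoup

soup = BeautifulSoup(x, 'lxml')
# Set the value on every tag that already has the attribute (values are strings)
for tag in soup.find_all(attrs={"width": True}):
    tag['width'] = '400'
for tag in soup.find_all(attrs={"height": True}):
    tag['height'] = '400'
x = str(soup)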
Here we set width="400" on every tag that has a width attribute, and height="400" on every tag that has a height attribute. You can make it more selective by, for instance, only accepting <img> tags, like:
soup = BeautifulSoup(x, 'lxml')
# Same idea, but only <img> tags are resized
for tag in soup.find_all('img', attrs={"width": True}):
    tag['width'] = '400'
for tag in soup.find_all('img', attrs={"height": True}):
    tag['height'] = '400'
x = str(soup)
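A self-contained run of the <img>-only version, as a sketch. It assumes bs4 is installed and swaps in Python's built-in 'html.parser' so lxml is not required; the input string is hypothetical. bs4 stores attribute values as strings, hence '400' rather than 400:

```python
from bs4 import BeautifulSoup

# Hypothetical input: a non-image tag with a width, plus one image
html = '<table width="80"></table><img src="a.png" width="150" height="108">'

soup = BeautifulSoup(html, 'html.parser')
# Only <img> tags are touched; the table keeps its width
for tag in soup.find_all('img', attrs={"width": True}):
    tag['width'] = '400'
for tag in soup.find_all('img', attrs={"height": True}):
    tag['height'] = '400'

img = soup.find('img')
table = soup.find('table')
print(img['width'], img['height'], table['width'])  # 400 400 80
```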
Seems to be working fine:
>>> x = '<foo width="150" height="108">'
>>> import re
>>> y = re.sub(r'width="[0-9]{2,4}"','width="400"',x)
>>> y
'<foo width="400" height="108">'
Note that re.sub does not mutate x:
>>> x
'<foo width="150" height="108">'
>>> y
'<foo width="400" height="108">'
Perhaps you want to do this instead:
x = re.sub(r'width="[0-9]{2,4}"','width="400"',x)
x = re.sub(r'height="[0-9]{2,4}"','height="400"',x)