I want to replace a whole block of markup using bs4. I have a source HTML document and a target HTML document:
t_soup = BeautifulSoup(target_html, 'html.parser')
s_soup = BeautifulSoup(source_html, 'html.parser')
This is the markup in the target:
//Block of code number 0
<div class="td-module-thumb">
//Some html code
</div>
//Block of code number 1
<div class="td-module-thumb">
<a href="this_is_an_url.html" rel="bookmark" class="image-wrap">
<img width="356" height="220" class="entry-thumb" src="../my_image-356x220.jpg" >
</a>
</div>
I want to replace the contents of block[1] in the target with the contents of block[1] in the source, which is:
//Block of code number 0
<div class="td-module-thumb">
//Some html code
</div>
//Block of code number 1
<div class="td-module-thumb">
<a href="another_url.html" rel="bookmark_2" class="new-class">
<img width="356" height="220" class="exit" src="../other_image_here.jpg" >
</a>
</div>
Both blocks use the same <div class="td-module-thumb"> wrapper.
My code to make the replacements is the following:
left_column_selector = 'div.td-module-thumb'
left_column = s_soup.select(left_column_selector)[1]
NOTE:
>>> type(s_soup.select(left_column_selector)[1])
<class 'bs4.element.Tag'>
These are the different versions I have tried for the last line of code, the one that actually makes the replacement:
# Attempt 1
t_soup.select(left_column_selector)[1].replace_with(str(left_column))
# Attempt 2
t_soup.select(left_column_selector)[1].string.replace_with(left_column)
# Attempt 3
t_soup.select(left_column_selector)[1].string.replace_with(left_column.string)
# Attempt 4
t_soup.select(left_column_selector)[1].replace_with(left_column.string)
Everything works fine except the last line of code. As a result, the block in the target is not replaced with the block from the source.
I would do it wholesale, as it were - delete target, insert source:
selector = 'div.td-module-thumb'
to_graft = s_soup.select(selector)[0]
for div in t_soup.select(selector):
    div.decompose()
t_soup.select_one('doc').insert(1, to_graft)
Edit:
Let's say your files look like this:
target = """<root> I am the target
<div class="td-module-thumb">
don't touch me!
</div>
<div class="td-module-thumb">
replace me!
<a href="this_is_an_url.html" rel="bookmark" class="image-wrap">
<img width="356" height="220" class="entry-thumb" src="../my_image-356x220.jpg" >
</a>
</div>
</root>
"""
source = """<root><div class="td-module-thumb">
I'm the irrelevant part of the source
</div>
<div class="td-module-thumb">
move me to target!
<a href="another_url.html" rel="bookmark_2" class="new-class">
<img width="356" height="220" class="exit" src="../other_image_here.jpg" >
</a>
</div>
</root>
"""
Then this should do it:
from bs4 import BeautifulSoup as bs
t_soup = bs(target,'lxml')
s_soup = bs(source,'lxml')
selector = 'div.td-module-thumb'
to_graft = s_soup.select(selector)[1]
to_remove = t_soup.select(selector)
to_remove[1].decompose()
t_soup.select_one('root').insert(2, to_graft)
print(t_soup.select_one('root'))
Output:
<root> I am the target
<div class="td-module-thumb">
don't touch me!
</div><div class="td-module-thumb">
move me to target!
<a class="new-class" href="another_url.html" rel="bookmark_2">
<img class="exit" height="220" src="../other_image_here.jpg" width="356"/>
</a>
</div>
</root>
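As an alternative to decomposing and re-inserting at a hard-coded index, the swap can probably be done in one step with `replace_with`, as long as it receives the `Tag` itself rather than `str(tag)` or `tag.string`. The following is only a minimal sketch: the shortened `target_html` and `source_html` strings and the `new_block` name are made up for illustration, and `html.parser` is used as in the question.

```python
import copy
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-ins for the real target and source documents.
target_html = """<root>
<div class="td-module-thumb">don't touch me!</div>
<div class="td-module-thumb"><a href="this_is_an_url.html" class="image-wrap">old</a></div>
</root>"""
source_html = """<root>
<div class="td-module-thumb">irrelevant</div>
<div class="td-module-thumb"><a href="another_url.html" class="new-class">new</a></div>
</root>"""

t_soup = BeautifulSoup(target_html, 'html.parser')
s_soup = BeautifulSoup(source_html, 'html.parser')
selector = 'div.td-module-thumb'

# Copy the source tag so s_soup stays intact; without the copy,
# replace_with would move the tag out of the source tree.
new_block = copy.copy(s_soup.select(selector)[1])

# Pass the Tag itself, not str(tag): a plain string would be inserted
# as text and its angle brackets escaped on output.
t_soup.select(selector)[1].replace_with(new_block)

print(t_soup)
```

Because the new block takes the old block's position in the tree, no index has to be worked out by hand.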
You can do it by running JavaScript. Import PyV8 and set the innerHTML of the parent container. The script below evaluates to true if the replacement worked.
import PyV8
# PyV8 is only a JavaScript engine with no DOM: `document` and `newhtml` must be exposed to the context by the caller.
ctx = PyV8.JSContext()
ctx.enter()
js = "var el = document.getElementById('parentContainer'); el.innerHTML = newhtml; el.innerHTML == newhtml;"
print(ctx.eval(js))