How to reorganize HTML tags in one line in Python

I am trying to reorganize HTML tags so that each tag appears on a new line and the file is human-readable.

I have an input file like this:

<?xml version="1.0" encoding="UTF-8"?>
<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en" xmlns:tts="http://www.w3.org/ns/ttml#parameter"><head><styling><style id="b1"/></styling></head><body><div xml:lang="en" style="b1"><p begin="" end="0.25">BLACK</p><p begin="0.25" end="0.5">BLACK CAUCUS</p><p begin="0.5" end="0.75">BLACK CAUCUS WHO</p><p begin="0.75" end="2">BLACK CAUCUS WHO ALSO</p><p begin="2" end="2.25">BLACK CAUCUS WHO ALSO REACTED</p><p begin="2.25" end="2.5">BLACK CAUCUS WHO ALSO REACTED TO</p><p begin="2.5" end="2.75">BLACK CAUCUS WHO ALSO REACTED TO<br/>THE</p><p     begin="2.75" end="3">BLACK CAUCUS WHO ALSO REACTED TO<br/>THE JUSTICE</p><p begin="3" end="3.5">BLACK CAUCUS WHO ALSO REACTED TO<br/>THE JUSTICE DEPARTMENT</p><p begin="3.5" end="4">BLACK CAUCUS WHO ALSO REACTED TO<br/>THE JUSTICE DEPARTMENT<br/>INVESTIGATION</p><p begin="4" end="4.25">THE JUSTICE DEPARTMENT<br/>INVESTIGATION THAT</p><p begin="4.25" end="4.5">THE JUSTICE DEPARTMENT<br/>INVESTIGATION THAT FOUND</p><p begin="4.5" end="4.75">THE JUSTICE DEPARTMENT<br/>INVESTIGATION THAT FOUND THERE</p><p begin="4.75" end="5">THE JUSTICE DEPARTMENT<br/>INVESTIGATION THAT FOUND THERE<br/>IS</p><p begin="5" end="5.333">THE JUSTICE DEPARTMENT<br/>INVESTIGATION THAT FOUND THERE<br/>IS RACIAL</p><p begin="5.333" end="5.667">INVESTIGATION THAT FOUND THERE<br/>IS RACIAL BIAS</p><p begin="5.667" end="6">INVESTIGATION THAT FOUND THERE<br/>IS RACIAL BIAS WITHIN</p><p begin="6" end="7">INVESTIGATION THAT FOUND THERE<br/>IS RACIAL BIAS WITHIN THE</p><p begin="7" end="7.5">INVESTIGATION THAT FOUND THERE<br/>IS RACIAL BIAS WITHIN THE<br/>FERGUSON,</p><p begin="7.5" end="8">IS RACIAL BIAS WITHIN THE<br/>FERGUSON, MISSOURI,</p><p begin="8" end="8.25">IS RACIAL BIAS WITHIN THE<br/>FERGUSON, MISSOURI, POLICE</p><p begin="8.25" end="8.5">IS RACIAL BIAS WITHIN THE<br/>FERGUSON, MISSOURI, POLICE<br/>DEPARTMENT.</p><p begin="8.5" end="8.75">FERGUSON, MISSOURI, POLICE<br/>DEPARTMENT.<br/>WE</p><p begin="8.75" end="9">FERGUSON, MISSOURI, 
POLICE<br/>DEPARTMENT.<br/>WE BEGIN</p><p begin="9" end="9.5"    >FERGUSON, MISSOURI, POLICE<br/>DEPARTMENT.<br/>WE BEGIN WITH</p><p begin="9.5" end="10">FERGUSON, MISSOURI, POLICE<br/>DEPARTMENT.<br/> WE BEGIN WITH REMARKS</p><p begin="10" end="10.5">DEPARTMENT.<br/>WE BEGIN WITH REMARKS FROM</p><p begin="10.5" end="11">DEPARTMENT.<br/>WE BEGIN WITH REMARKS FROM<br/>C.B.C.</p><p begin="11" end="11.333">WE BEGIN WITH REMARKS FROM<br/>C.B.C.<br/>CHAIR</p><p begin="11.333" end="11.667">WE BEGIN WITH REMARKS FROM<br/>C.B.C.<br/>CHAIR G.K.</p><p begin="11.667" end="12">C.B.C.<br/>CHAIR G.K.<br/>BUTTERFIELD</p><p begin="12" end="12.333">C.B.C.<br/>CHAIR G.K.<br/>BUTTERFIELD OF</p><p begin="12.333" end="12.667">C.B.C.<br/>CHAIR G.K.<br/>BUTTERFIELD OF NORTH</p><p begin="12.667" end="13">C.B.C.<br/>CHAIR G.K.<br/>BUTTERFIELD OF NORTH CAROLINA.</p><p begin="13" end="13.25">CHAIR G.K.<br/>BUTTERFIELD OF NORTH CAROLINA.<br/>THIS</p><p begin="13.25" end="13.5">CHAIR G.K.<br/>BUTTERFIELD OF NORTH CAROLINA.<br/>THIS IS</p>

The expected output is below:

<?xml version="1.0" encoding="UTF-8"?>
<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en" xmlns:tts="http://www.w3.org/ns/ttml#parameter">
<head>
<styling><style id="b1"/></styling>
</head>
<body>
...
<p begin="5" end="5.333">THE JUSTICE DEPARTMENT<br/>INVESTIGATION THAT FOUND THERE<br/>IS RACIAL</p>
<p begin="5.333" end="5.667">INVESTIGATION THAT FOUND THERE<br/>IS RACIAL BIAS</p>
<p begin="5.667" end="6">INVESTIGATION THAT FOUND THERE<br/>IS RACIAL BIAS WITHIN</p>
<p begin="6" end="7">INVESTIGATION THAT FOUND THERE<br/>IS RACIAL BIAS WITHIN THE</p>
...
and so on

How do I accomplish this task?

You can prettify the HTML using BeautifulSoup:

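from bs4 import BeautifulSoup

# data is the markup as a string, e.g. the contents of the input file
soup = BeautifulSoup(data, "html.parser")  # name the parser explicitly to avoid a warning
pretty_html = soup.prettify()
print(pretty_html)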

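Note that prettify() will, as far as I know, also put the text and inline tags such as <br/> on separate lines, which is not quite the layout asked for in the question. If the exact layout matters, a rough alternative is a regex that inserts a line break only before selected tags. This is a minimal sketch, not a general HTML formatter: the function name split_tags and the list of tags are my own choices based on the sample, and it assumes the tags sit directly next to each other as they do in the input above.

```python
import re

# Break before opening and closing tt/head/body/div tags, and before opening
# <styling> and <p> tags. <br/>, <style/>, </styling> and </p> stay inline,
# as in the expected output. The tag list is an assumption based on the sample.
_BREAK_BEFORE = re.compile(r">\s*(?=<(?:/?(?:tt|head|body|div)|styling|p)[\s>])")


def split_tags(markup: str) -> str:
    """Put each structural tag and each <p> element on its own line."""
    return _BREAK_BEFORE.sub(">\n", markup)


# Shortened sample in the same shape as the input from the question.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<tt xmlns="http://www.w3.org/ns/ttml"><head><styling><style id="b1"/>'
    '</styling></head><body><div xml:lang="en" style="b1">'
    '<p begin="5" end="5.333">THE JUSTICE DEPARTMENT<br/>INVESTIGATION</p>'
    '<p begin="5.333" end="5.667">IS RACIAL BIAS</p></div></body></tt>'
)

print(split_tags(sample))
```

Because this only looks at the characters around each tag, it will not cope with things like comments or CDATA sections that contain tag-like text. For anything beyond a simple file like the one above, a real parser is the safer choice.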
