Remove errant <br> tags from textarea

I'm using the textarea rich text editor in SharePoint 2013, and it has an annoying habit of adding extra break tags to the behind-the-scenes HTML markup at the end of tags, like this:

<h1>Some heading<br></h1>
<p>Intro paragraph with maybe an actual.<br>That is supposed to be here.</p>
<ul>
   <li>Item 1</li>
   <li>Item 2</li>
   <br>
   <br>
</ul>

In this example the br in the middle of the paragraph is one that the user inserted, but the ones right at the end of the h1 and ul tags are not desirable and I would like to remove them. I can't think of a case where a <br> right before another closing tag is valid, so that is my plan.

I would like to find all br tags immediately before any other closing tag and remove them.

We could use vanilla JavaScript, but jQuery is already on the page for other things.

I found this thread, which provides a regex solution to remove a br right before a closing h2. It is PHP and provides more of an algorithm than an implementation. There is a second solution there to "use a DOM parser," but I am not familiar with that.

Additionally, some of the added tags are <br> and some are <br />, and there may or may not be line returns and spaces.

Is there a method for finding all <br> or <br /> tags immediately before (ignoring any line returns or white space) any other valid closing tag?

Using jQuery to cover the cases shown. You can add to it as you find other cases that are not covered.

// get html string from main editor and put in temporary div
const $html = $('<div>').append($('#editor').html())
let ctr = 0; // counter for demo/debugging only
// hunt for unwanted culprits
$html.find('br').each(function() {
  const $br = $(this);
  // remove at end of parent OR more than one together OR is in a UL as child
  if (!this.nextSibling || $br.next().is('br') || $br.parent().is('ul')) {
    ctr++
    this.remove();
  }
})
console.log('removed =', ctr)
console.log($html.html())

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div id="editor">
  <h1>Some heading<br></h1>
  <p>Intro paragraph with maybe an actual.<br>That is supposed to be here.</p>
  <ul>
    <li>Item 1</li>
    <li>Item 2</li>
    <br>
    <br>
  </ul>
</div>
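
The jQuery snippet above checks `this.nextSibling` directly, so a break followed only by a line return or spaces before the closing tag would be kept, and a break followed by another break is removed even in the middle of a paragraph. As a rough alternative, here is a minimal vanilla JavaScript sketch of the stricter rule from the question: a break counts as trailing only if nothing but whitespace text or further breaks follows it inside its parent. The names `isTrailingBreak` and `removeTrailingBreaks` are hypothetical, and `root` is assumed to be a detached container element holding the editor markup.

```javascript
// Sketch only: helper names are made up for illustration.
// nodeType 3 is a text node, nodeType 1 is an element node.
function isTrailingBreak(br) {
  let node = br.nextSibling;
  while (node) {
    const isBlankText = node.nodeType === 3 && node.nodeValue.trim() === '';
    const isBreak = node.nodeType === 1 && node.nodeName === 'BR';
    // Any real content after the break means it is not trailing.
    if (!isBlankText && !isBreak) return false;
    node = node.nextSibling;
  }
  // Reached the end of the parent: only whitespace or breaks followed.
  return true;
}

// Assumes `root` is a DOM element (browser only); returns how many were removed.
function removeTrailingBreaks(root) {
  const trailing = Array.from(root.querySelectorAll('br')).filter(isTrailingBreak);
  trailing.forEach(function (br) { br.remove(); });
  return trailing.length;
}
```

In the page it might be used as `const tmp = document.createElement('div'); tmp.innerHTML = $('#editor').html(); removeTrailingBreaks(tmp);`. Note that a break at the very end of `root` itself is also treated as trailing, which may or may not be what you want.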

If the HTML is in a string, a simple regex replacement can remove what you want:

htmlSourceCodeVar = htmlSourceCodeVar.replace(/<br(?: \/)?>(<\/)/ig, '$1');

The regex matches every <br, optionally followed by a space and /, followed by ></; it then replaces the match with the beginning of that closing tag, thus removing the break. You can also do it without backreferences in this case, since the start of a closing tag is constant and known:

htmlSourceCodeVar = htmlSourceCodeVar.replace(/<br(?: \/)?><\//ig, '</');
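
The two replacements above match `<br>` and `<br />` only when the closing tag follows with nothing in between, so they would miss the breaks in the `ul` of the question, which sit on their own lines. A possible variation, offered as a sketch rather than a tested fix for SharePoint's output, allows optional whitespace, also accepts `<br/>`, and removes a run of several breaks at once. The function name `stripTrailingBreaks` is made up for this example.

```javascript
// Sketch only: stripTrailingBreaks is a hypothetical helper name.
// Removes one or more <br>, <br/> or <br /> tags (plus the whitespace after
// each one) when the next thing in the string is a closing tag.
function stripTrailingBreaks(html) {
  return html.replace(/(?:<br\s*\/?>\s*)+(?=<\/)/gi, '');
}

const sample = [
  '<h1>Some heading<br></h1>',
  '<p>Intro paragraph with maybe an actual.<br>That is supposed to be here.</p>',
  '<ul>',
  '   <li>Item 1</li>',
  '   <li>Item 2</li>',
  '   <br>',
  '   <br />',
  '</ul>'
].join('\n');

const cleaned = stripTrailingBreaks(sample);
console.log(cleaned);
```

The lookahead `(?=<\/)` leaves the closing tag untouched, so no backreference is needed. Breaks with attributes (for example `<br class="x">`) are not matched, and like any regex over HTML this treats the markup as plain text, so it would also touch matching text inside comments or `pre` blocks.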


 