
Removing BBCode URL tag from string

I'm trying to build a robust system that lets users paste any mixture of BBCode and HTML into an input, which I will then sanitize and strip as I see fit.

The content is copied from forums, and the issue is that they all seem to use different markup. Some output more than one br tag, some use a self-closing br tag. Others use [URL=], and others just use [URL]URL[/URL], etc.

So far, I use HTML Purifier to strip everything except img tags.

HTML Purifier doesn't (as far as I can see) remove BBCode. So, given a string like this:

[URL=http://awebsite.com]My Link [IMG]imagelink.png[/IMG][/URL]

How can I remove the URL tags and leave just the IMG tags?

I want to remove every variant of the URL tag, including the URL it points to and the link text, which may prove difficult.

So far I have got quite far by converting [IMG] tags etc. using regex, which works, but I feel there are too many variants to hardcode this.

Any suggestions on a more efficient (or simply workable) way to remove the URL tags?

Option 1

If you just want to remove tags such as [URL=http://awebsite.com] and [/URL], leaving the content inside, the regex is simple:

Search: \[/?URL[^\]]*\]

Replace: Empty string

In JavaScript

replaced = string.replace(/\[\/?URL[^\]]*\]/g, "");

In PHP

$replaced = preg_replace('%\[/?URL[^\]]*\]%', '', $str);
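To see what Option 1 does to the string from the question, here is a minimal JavaScript sketch. The function name `stripUrlTags` is hypothetical, and the added `i` flag is an assumption on my part so that lowercase `[url]` tags, which some forums emit, are matched as well.

```javascript
// Hypothetical helper: removes [URL], [URL=...] and [/URL] markers
// and keeps whatever sits between them.
// The "i" flag is an addition to the answer's regex so that [url] also matches.
function stripUrlTags(input) {
  return input.replace(/\[\/?URL[^\]]*\]/gi, "");
}

const sampleOption1 =
  "[URL=http://awebsite.com]My Link [IMG]imagelink.png[/IMG][/URL]";

console.log(stripUrlTags(sampleOption1));
// prints: My Link [IMG]imagelink.png[/IMG]
```

Note that the link text "My Link" survives here; only the tag markers themselves are removed.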

Option 2: Also removing content such as My Link

Here, we also remove the text that follows [URL...] up to the next tag.

Search: \[URL[^\]]*\][^\[\]]*|\[/URL[^\]]*\]

Replace: Empty string

JavaScript:

replaced = string.replace(/\[URL[^\]]*\][^\[\]]*|\[\/URL[^\]]*\]/g, "");

PHP:

$replaced = preg_replace('%\[URL[^\]]*\][^\[\]]*|\[/URL[^\]]*\]%', '', $str);
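As a quick check of Option 2 against the string from the question, here is a small JavaScript sketch. The name `stripUrlTagsAndText` is hypothetical, and the `i` flag is again my own addition to cover lowercase tags.

```javascript
// Hypothetical helper: removes each opening [URL...] tag together with the
// plain text that directly follows it (up to the next "[" or "]"),
// and removes every closing [/URL] tag.
function stripUrlTagsAndText(input) {
  return input.replace(/\[URL[^\]]*\][^\[\]]*|\[\/URL[^\]]*\]/gi, "");
}

const sampleOption2 =
  "[URL=http://awebsite.com]My Link [IMG]imagelink.png[/IMG][/URL]";

console.log(stripUrlTagsAndText(sampleOption2));
// prints: [IMG]imagelink.png[/IMG]
```

One limitation worth knowing: only the text directly after the opening tag is removed. Text that sits between a nested tag and the closing [/URL], for example a caption after the image, is left in place.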

Alternatively, you could extract only the IMG tags using a regex:

$pattern ="#\[IMG\](https?://[-\w\.]+(:\d+)?/[\w/_\.]*(\?\S+?)?)?\[\/IMG\]#";
$str = "[URL=http://awebsite.com]My Link [IMG]http://google.com/imagelink.png[/IMG][/URL]";
preg_match($pattern, $str, $matches);
print_r($matches);

Result:

Array
(
    [0] => [IMG]http://google.com/imagelink.png[/IMG]
    [1] => http://google.com/imagelink.png
)
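The PHP snippet above uses preg_match, which returns only the first match, and its pattern expects an absolute http(s) URL, so the relative imagelink.png from the question would not be captured. A post may also contain several images. The following JavaScript sketch collects every IMG tag instead. The function name `extractImageUrls` and the looser pattern (any content between the tags, case-insensitive) are assumptions of mine, not part of the answer.

```javascript
// Hypothetical helper: returns the content of every [IMG]...[/IMG] pair.
// The pattern is deliberately loose: it accepts anything between the tags,
// so the result should still be validated before use.
function extractImageUrls(input) {
  const pattern = /\[IMG\](.*?)\[\/IMG\]/gi;
  // matchAll needs the "g" flag and yields one match object per tag pair.
  return Array.from(input.matchAll(pattern), (match) => match[1].trim());
}

const sampleImages =
  "[URL=http://awebsite.com]My Link [IMG]imagelink.png[/IMG][/URL] " +
  "[img]http://google.com/imagelink.png[/img]";

console.log(extractImageUrls(sampleImages));
// prints: [ 'imagelink.png', 'http://google.com/imagelink.png' ]
```

Because this approach keeps only what it recognises, everything else in the post (URL tags, link text, other BBCode) is discarded without having to enumerate every variant.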
