I'm trying to develop a word-counting application that supports .pdf, .docx, .doc, .txt, and other document formats. So far I can read .doc files with PHP and load the plain text into a variable.
I'm using the following code to collapse extra whitespace in the string:
$str = trim(preg_replace('/\s+/', ' ', $str));
My issue is: Word documents with hyperlinks are parsed as: Some dummy text here.. HYPERLINK "http://domain.com/directory/page" other dummy text is here..
So I want to remove the HYPERLINK "http://domain.com/directory/page" part, or replace it with a space or something similar.
Since I'm not a regular expression expert, I'm looking for help solving this problem. Thanks!
HYPERLINK "http://domain.com/directory/page" will be matched by:
HYPERLINK "[^"]*"
That is: the literal word HYPERLINK (case-sensitive), a space, a double quote, any run of characters other than a double quote, then the closing double quote.
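To show how that pattern might fit into the code from the question, here is a minimal sketch. The function name strip_hyperlink_fields is made up for illustration, and it assumes the field always appears as HYPERLINK followed by one space and a double-quoted URL, as in the sample text. Each match is replaced with a space first, and the whitespace cleanup from the question then collapses the leftover gaps.

```php
<?php
// Hypothetical helper (not from the original post): removes Word
// HYPERLINK "..." field codes, then normalizes whitespace.
function strip_hyperlink_fields(string $text): string
{
    // Replace every HYPERLINK "..." occurrence with a single space.
    // [^"]* matches any run of characters that are not a double quote.
    $text = preg_replace('/HYPERLINK "[^"]*"/', ' ', $text);

    // Collapse runs of whitespace and trim the ends, as in the question.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Sample input taken from the question.
$str = 'Some dummy text here.. HYPERLINK "http://domain.com/directory/page" other dummy text is here..';
echo strip_hyperlink_fields($str), PHP_EOL;
```

If the extracted text can contain more than one space between HYPERLINK and the opening quote, changing the pattern to '/HYPERLINK\s+"[^"]*"/' would probably be a safer choice.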