
Extract all links from a string with Google Apps Script

I have a string variable with links inside (among other text), and I want to extract all links that match a certain pattern (for example, links containing the word 'case'). Is this possible?

The string is something like:

var string = 'here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more';

As a workaround, I used the approach described here: extract links from document. I create a document with the string as its content and then extract the links, but I would like to do it directly.

Regards,

EDIT (to Ruben):

If I use:

var string = 'http://mangafox.me/manga/tales_of_demons_and_gods/c105/1.html here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more ';

I get only the first link, twice (see screenshot here).

And if I use:

var string = 'here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more http://mangafox.me/manga/tales_of_demons_and_gods/c105/1.html ';

The same happens again (see screenshot here).

Google Apps Script

function test2(){
  var re = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'"".,<>?«»“”‘’]))/i;
  var string = 'here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more';
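  // Without the g flag, exec() always returns the first match, so call it once.
  var match = re.exec(string);
  // match[0] is the full match; match[1] is the outer capture group (the same URL here).
  for (var i = 0; match && i < match.length; i++) {
    if (match[i]) Logger.log(match[i]);
  }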
}

JavaScript

var re = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'"".,<>?«»“”‘’]))/i;
var string = 'here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more';
// Without the g flag, exec() always returns the first match, so call it once.
var match = re.exec(string);
for (var i = 0; match && i < match.length; i++) {
  if (match[i]) console.log(match[i]);
}

Reference

RegularExpression to Extract Url For Javascript
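
For anyone wondering why the loop above prints the same link twice: the array returned by exec() holds the full match at index 0 and the capture groups after it, and without the g flag every call starts again from the beginning of the string. Here is a minimal sketch of that behaviour. The pattern and the example.* URLs are simplified placeholders chosen for illustration, not the regex from this answer.

```javascript
// Simplified pattern for illustration only (an assumption, not the answer's regex).
// It has no g flag and exactly one capture group wrapping the whole URL.
var simpleRe = /\b(https?:\/\/[^\s]+)/i;
var text = 'first http://a.example/one then https://b.example/two';

var m1 = simpleRe.exec(text);
var m2 = simpleRe.exec(text);

console.log(m1[0]); // full match
console.log(m1[1]); // capture group 1, identical to the full match here
console.log(m2[0]); // still the first URL: without g, exec() does not advance
```

So the two identical lines in the log are most likely one match reported at two indexes, not two matches.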

If you're only getting the first match, then I think you need the 'g' flag on the regular expression to capture all matches; each call to exec() will then return the next match. I'm using:

const re = /(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#\/%=~_|$?!:,.])*(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&@#\/%=~_|$])/igm;

let reResults; // holds the current match
while ((reResults = re.exec(s)) !== null) { // s is the input string; each call finds the next match
  Logger.log(reResults[0]); // full text of the current match
}
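
To get back to the original question (only links containing a certain word), the loop above could be wrapped in a small helper along these lines. This is just a sketch: `extractLinks`, `keyword`, and `sample` are names made up for the example, and the filter is a plain substring check on each matched URL. It uses console.log so it runs in any JavaScript engine; in Apps Script you would probably swap in Logger.log.

```javascript
// Hypothetical helper: returns every URL in `text` that contains `keyword`.
// If `keyword` is omitted, all URLs are returned.
function extractLinks(text, keyword) {
  // Same pattern as in the answer; the g flag is what lets exec() advance.
  var re = /(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#\/%=~_|$?!:,.])*(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&@#\/%=~_|$])/ig;
  var links = [];
  var match;
  while ((match = re.exec(text)) !== null) {
    // Keep the link if no keyword was given or the link contains it.
    if (!keyword || match[0].indexOf(keyword) !== -1) {
      links.push(match[0]);
    }
  }
  return links;
}

// Sample text built from the two URLs used in the question.
var sample = 'see http://mangafox.me/manga/tales_of_demons_and_gods/c105/1.html and ' +
  'https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 too';

var all = extractLinks(sample);                       // both links
var filtered = extractLinks(sample, 'stackoverflow'); // only the Stack Overflow link
console.log(all.length, filtered.length);
```

Creating the regex inside the function means its lastIndex starts at 0 on every call, so repeated calls do not interfere with each other.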
