How to get only the base URLs from a text file in Python?

Question

I have a text file named weburl that contains many URLs. I want to extract only the base URL from each line using a regex. The file looks like this:

 wikimapia.org/1649944/Bahawalpur-Railway-Station
 panoramio.com/photo/84118355
 wikimapia.org/1649944/Bahawalpur-Railway-Station
 nativepakistan.com/photos-of-bahawalpur
 defence.pk/threads/pictures-of-pakistan-railways.303027
 nativepakistan.com/photos-of-bahawalpur
 panoramio.com/photo/51311162
 https://hiveminer.com/User/Pakistan Rail Buff

I need this:

 wikimapia.org
 panoramio.com
 wikimapia.org
 nativepakistan.com
 defence.pk
 nativepakistan.com
 panoramio.com
 https://hiveminer.com

How can I do this with a regex?

Answer 1

One solution could be:

^(?:\w+://)?.*?(?::\d+)?(?=/|$)

It matches the beginning of the line ( ^ ), followed by an optional protocol specification such as https:// ( (?:\w+://)? ). Then it lazily matches any number of any characters ( .*? ) up to an optional port specification such as :80 ( (?::\d+)? ). Finally, it checks that the match is followed by a / or the end of the line $ (the positive lookahead (?=/|$) ).
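As a minimal sketch of how this pattern could be applied in Python, the function below runs it against each line. The names base_urls and read_base_urls and the one-URL-per-line file layout are assumptions for illustration, not part of the original answer. Each line is stripped first, because a leading space would stop the optional protocol part from matching at the start of the line.

```python
import re

# Pattern from the answer: optional scheme, lazy host, optional port,
# stopping just before the first "/" or the end of the line.
BASE_URL = re.compile(r"^(?:\w+://)?.*?(?::\d+)?(?=/|$)")


def base_urls(lines):
    """Return the base URL of every non-empty line, in input order."""
    results = []
    for line in lines:
        line = line.strip()  # drop the newline and any leading spaces
        if not line:
            continue
        match = BASE_URL.match(line)
        if match:
            results.append(match.group(0))
    return results


def read_base_urls(path):
    """Hypothetical helper: read a text file with one URL per line."""
    with open(path, encoding="utf-8") as handle:
        return base_urls(handle)


sample = [
    "wikimapia.org/1649944/Bahawalpur-Railway-Station",
    "panoramio.com/photo/84118355",
    "https://hiveminer.com/User/Pakistan Rail Buff",
]
print(base_urls(sample))
# ['wikimapia.org', 'panoramio.com', 'https://hiveminer.com']
```

For the file in the question, something like read_base_urls("weburl") should return the list in the same order as the input lines, duplicates included.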

You can check it out at regex101.

Note that if you don't want to match the port part, you can move it into the positive lookahead, i.e. ^(?:\w+://)?.*?(?=(?::\d+)?(?:/|$))
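The difference between the two patterns only shows up on a URL that carries a port. The sketch below compares them on a made-up address (example.com with port 8080 is an assumption for illustration and does not appear in the question's data).

```python
import re

# Port is part of the match.
WITH_PORT = re.compile(r"^(?:\w+://)?.*?(?::\d+)?(?=/|$)")
# Port is only checked inside the lookahead, so it is left out of the match.
WITHOUT_PORT = re.compile(r"^(?:\w+://)?.*?(?=(?::\d+)?(?:/|$))")

url = "http://example.com:8080/index.html"  # hypothetical URL

print(WITH_PORT.match(url).group(0))     # http://example.com:8080
print(WITHOUT_PORT.match(url).group(0))  # http://example.com
```

On the URLs in the question, none of which has a port, both patterns give the same result.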

Answer 1
0 2017-04-12 12:18:59
