
How can I extract website names from irregularly named strings?

I have a column that looks like:

this contains no website
this is a web site.io
another websi te.co

I want to create a column that looks like:

NULL
site
te

So if there is no period, it should return NULL; if there is a period, it should return the text between the period and the preceding space.

You can use a positive lookahead like this:

\S+(?=\.\S+)

The first \S+ is what you want, and the lookahead (?=\.\S+) matches the suffix (e.g. .com, .org, .net) without including it in the result.
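The question does not say which language or database the column lives in, so here is a minimal sketch of the lookahead pattern in Python's `re` module, purely as an illustration. The helper name `extract_site` and the list `rows` are hypothetical; `None` stands in for NULL.

```python
import re

# Lookahead pattern from the answer: a run of non-space characters
# that is immediately followed by a period and more non-space characters.
SITE_NAME = re.compile(r"\S+(?=\.\S+)")

def extract_site(text):
    """Return the text before the suffix, or None when nothing matches."""
    match = SITE_NAME.search(text)
    return match.group(0) if match else None

# Sample rows taken from the question.
rows = [
    "this contains no website",
    "this is a web site.io",
    "another websi te.co",
]
names = [extract_site(row) for row in rows]
print(names)  # [None, 'site', 'te']
```

Because \S+ is greedy, a string with several periods such as a.b.c would yield a.b, not a, so the pattern works best when each value has a single suffix.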

You could go for:

\b(\w+)\.(?:io|co)\b

See a demo on regex101.com.
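As a rough sketch of how this second pattern behaves, assuming Python's `re` module (the original thread does not name a language), the capture group holds the name and the suffix must be exactly .io or .co. The helper name `extract_known_tld` is hypothetical.

```python
import re

# Pattern from the answer: capture the word before ".io" or ".co".
# The suffix list is hardcoded, so other suffixes are not matched.
KNOWN_TLD = re.compile(r"\b(\w+)\.(?:io|co)\b")

def extract_known_tld(text):
    """Return the captured name, or None when no .io/.co suffix is found."""
    match = KNOWN_TLD.search(text)
    return match.group(1) if match else None

print(extract_known_tld("this is a web site.io"))     # site
print(extract_known_tld("another websi te.co"))       # te
print(extract_known_tld("this contains no website"))  # None
print(extract_known_tld("see example.com"))           # None
```

The trailing \b is what stops .co from matching inside .com; to support more suffixes, the alternation (?:io|co) would need to be extended.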
