How to extract words between certain characters

Here's my dataset:

      domainname
0     address=/000007.ru/0.0.0.0
1     address=/000007.ru/::
2     address=/000free.us/0.0.0.0
3     address=/000free.us/::

I want to extract the word between / and /, so the desired output is:

      domainname                        website
0     address=/000007.ru/0.0.0.0        000007.ru
1     address=/000007.ru/::             000007.ru
2     address=/000free.us/0.0.0.0       000free.us
3     address=/000free.us/::            000free.us

Here's what I tried:

import re

adsdata_vector = df["domainname"]
ads = []
for i in range(len(adsdata_vector)):
    ads.append(re.split(r"[/]+", adsdata_vector[i]))
ads[0:4]

Here's what I get:

[['address=', '000007.ru', '0.0.0.0'],
['address=', '000007.ru', '::'],
['address=', '000free.us', '0.0.0.0'],
['address=', '000free.us', '::']]

I only want the second element (the domain). Any suggestions?

If the address is always in the form address=/000007.ru/0.0.0.0 and you want to extract the second field every time, why not use:

website = address.split('/')[1]
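A minimal sketch of that approach applied to every row, assuming the values are plain Python strings (the list below simply copies the sample rows from the question):

```python
# Sample values copied from the question's domainname column
addresses = [
    "address=/000007.ru/0.0.0.0",
    "address=/000007.ru/::",
    "address=/000free.us/0.0.0.0",
    "address=/000free.us/::",
]

# split('/') yields ['address=', '<domain>', '<ip>']; index 1 is the domain
websites = [address.split('/')[1] for address in addresses]
print(websites)  # ['000007.ru', '000007.ru', '000free.us', '000free.us']
```

This relies on every value having the same address=/domain/ip shape; a value with no slash would raise an IndexError.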

You can use Series.str.extract:

df['website'] = df.domainname.str.extract(r'/(.+)/')

      domainname                        website
0     address=/000007.ru/0.0.0.0        000007.ru
1     address=/000007.ru/::             000007.ru
2     address=/000free.us/0.0.0.0       000free.us
3     address=/000free.us/::            000free.us

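The regex r'/(.+)/' captures one or more characters between two slashes. Because .+ is greedy, the match runs up to the last / in the string; that is fine here because each value contains exactly two slashes.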
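To see why greedy versus non-greedy matters, here is a small check with Python's re module. The value with a third slash is hypothetical and not part of the question's data:

```python
import re

sample = "address=/000007.ru/0.0.0.0"        # row from the question
extra = "address=/example.com/sub/0.0.0.0"   # hypothetical value with three slashes

# Greedy: .+ runs to the LAST slash in the string
greedy_sample = re.search(r"/(.+)/", sample).group(1)
greedy_extra = re.search(r"/(.+)/", extra).group(1)

# Non-greedy: .*? stops at the FIRST slash after the opening one
lazy_extra = re.search(r"/(.*?)/", extra).group(1)

print(greedy_sample)  # 000007.ru
print(greedy_extra)   # example.com/sub
print(lazy_extra)     # example.com
```

On the question's data both patterns give the same result; they only diverge when a value has more than two slashes.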

If you want to extract only the first match, use Series.str.extract with the non-greedy pattern /(.*?)/:

df['website'] = df['domainname'].str.extract('/(.*?)/')
print(df)
                    domainname     website
0   address=/000007.ru/0.0.0.0   000007.ru
1        address=/000007.ru/::   000007.ru
2  address=/000free.us/0.0.0.0  000free.us
3       address=/000free.us/::  000free.us
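One detail worth knowing: with a single capture group, str.extract returns a one-column DataFrame by default, while expand=False returns a Series, and rows where the pattern does not match get NaN. A sketch assuming a hypothetical extra row with no slashes:

```python
import pandas as pd

df = pd.DataFrame({
    "domainname": [
        "address=/000007.ru/0.0.0.0",
        "address=/000007.ru/::",
        "address=/000free.us/0.0.0.0",
        "address=/000free.us/::",
        "server=8.8.8.8",  # hypothetical row with no slashes
    ]
})

# expand=False: a single capture group comes back as a Series, not a DataFrame
df["website"] = df["domainname"].str.extract(r"/(.*?)/", expand=False)

# The last row has no /.../ section, so its website is NaN
print(df)
```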

Or, if you need all matched values, use Series.str.findall with Series.str.join:

df['website'] = df['domainname'].str.findall('/(.*?)/').str.join(', ')
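A sketch of the findall/join route. The third row is hypothetical (two domains on one line). Because findall matches don't overlap, the slash that closes the first domain cannot also open the next one, so only the first domain is returned for that row:

```python
import pandas as pd

df = pd.DataFrame({
    "domainname": [
        "address=/000007.ru/0.0.0.0",
        "address=/000free.us/::",
        "address=/a.example/b.example/0.0.0.0",  # hypothetical multi-domain row
    ]
})

# findall gives one list of captured strings per row
matches = df["domainname"].str.findall(r"/(.*?)/")
print(matches.tolist())  # [['000007.ru'], ['000free.us'], ['a.example']]

# join turns each list into a single comma-separated string
df["website"] = matches.str.join(", ")
```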

If you need only the second value after splitting by /, use Series.str.split with indexing:

df['website'] = df['domainname'].str.split('/').str[1]
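A sketch showing that .str[1] gives the same result as splitting with expand=True and taking column 1 (the sample rows are copied from the question):

```python
import pandas as pd

df = pd.DataFrame({
    "domainname": [
        "address=/000007.ru/0.0.0.0",
        "address=/000007.ru/::",
        "address=/000free.us/0.0.0.0",
        "address=/000free.us/::",
    ]
})

# expand=True spreads the pieces into columns 0, 1, 2
parts = df["domainname"].str.split("/", expand=True)

# Column 1 holds the text between the two slashes
df["website"] = parts[1]
print(df)
```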
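
Or, working from the ads list of split results in the question, take the second element of each inner list:

def f(y):
    # y is a list of split results; x[1] is the part between the two slashes
    return [x[1] for x in y]

f(ads)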
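A sketch of how a helper like f can be used with the ads list built in the question, reproduced here with re.split on the sample rows (calling f(ads) is an assumption about the intended usage):

```python
import re

domainnames = [
    "address=/000007.ru/0.0.0.0",
    "address=/000007.ru/::",
    "address=/000free.us/0.0.0.0",
    "address=/000free.us/::",
]

# Reproduce the question's ads list: one split result per row
ads = [re.split(r"[/]+", value) for value in domainnames]

def f(y):
    # Keep only the second element (the domain) of each split result
    return [x[1] for x in y]

print(f(ads))  # ['000007.ru', '000007.ru', '000free.us', '000free.us']
```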
