Here's my dataset:
domainname
0 address=/000007.ru/0.0.0.0
1 address=/000007.ru/::
2 address=/000free.us/0.0.0.0
3 address=/000free.us/::
I want to extract the word between the two / characters, so the desired output is:
domainname website
0 address=/000007.ru/0.0.0.0 000007.ru
1 address=/000007.ru/:: 000007.ru
2 address=/000free.us/0.0.0.0 000free.us
3 address=/000free.us/:: 000free.us
Here's what I tried:
import re

adsdata_vector = df["domainname"]
ads = []
for i in range(len(adsdata_vector)):
    # split each value on runs of '/'
    ads.append(re.split(r"[/]+", adsdata_vector[i]))
ads[0:4]
Here's what I get:
[['address=', '000007.ru', '0.0.0.0'],
['address=', '000007.ru', '::'],
['address=', '000free.us', '0.0.0.0'],
['address=', '000free.us', '::']]
I only want the second element (the domain). Any suggestions?
If the address always looks like address=/000007.ru/0.0.0.0
and you always want to extract the second field, why not just use:
website = address.split('/')[1]
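A minimal sketch of that idea applied to every value, assuming each one has the address=/<domain>/<ip> shape shown in the question; the helper name extract_site is hypothetical:

```python
def extract_site(address: str) -> str:
    # "address=/000007.ru/0.0.0.0".split('/') -> ['address=', '000007.ru', '0.0.0.0']
    # index 1 is the text between the first and second '/'
    return address.split('/')[1]

rows = [
    "address=/000007.ru/0.0.0.0",
    "address=/000007.ru/::",
    "address=/000free.us/0.0.0.0",
    "address=/000free.us/::",
]
websites = [extract_site(r) for r in rows]
print(websites)  # ['000007.ru', '000007.ru', '000free.us', '000free.us']
```

On the DataFrame itself, the same helper could be used as df['website'] = df['domainname'].apply(extract_site). It raises IndexError on a value with no '/' at all, so it only suits data that is known to be well-formed.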
You can use Series.str.extract:
df['website'] = df.domainname.str.extract(r'/(.+)/')
domainname website
0 address=/000007.ru/0.0.0.0 000007.ru
1 address=/000007.ru/:: 000007.ru
2 address=/000free.us/0.0.0.0 000free.us
3 address=/000free.us/:: 000free.us
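A self-contained version of this answer that rebuilds the sample frame from the question. Passing expand=False (a documented str.extract parameter) makes a single capture group come back as a Series instead of a one-column DataFrame, which is the more natural thing to assign to a single column:

```python
import pandas as pd

df = pd.DataFrame({"domainname": [
    "address=/000007.ru/0.0.0.0",
    "address=/000007.ru/::",
    "address=/000free.us/0.0.0.0",
    "address=/000free.us/::",
]})

# one capture group + expand=False -> a Series aligned with df's index
df["website"] = df["domainname"].str.extract(r"/(.+)/", expand=False)
print(df)
```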
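The regex r'/(.+)/' captures one or more characters between a / and a later /. Because .+ is greedy, the match runs from the first / to the last / in the string; with exactly two slashes, as in this data, that is just the domain.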
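To see where greedy matching matters, compare it with the non-greedy '/(.*?)/' on the sample value and on a hypothetical value containing a third slash (that second string is invented for illustration, not taken from the question):

```python
import re

sample = "address=/000007.ru/0.0.0.0"
tricky = "address=/example.com/extra/0.0.0.0"  # hypothetical value with three slashes

greedy = r"/(.+)/"   # .+ grabs as much as possible, up to the LAST '/'
lazy = r"/(.*?)/"    # .*? stops at the FIRST '/' after the opening one

print(re.search(greedy, sample).group(1))  # 000007.ru
print(re.search(lazy, sample).group(1))    # 000007.ru
print(re.search(greedy, tricky).group(1))  # example.com/extra
print(re.search(lazy, tricky).group(1))    # example.com
```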
If you want to extract only the first match, use Series.str.extract with the non-greedy pattern /(.*?)/:
df['website'] = df['domainname'].str.extract('/(.*?)/')
print(df)
domainname website
0 address=/000007.ru/0.0.0.0 000007.ru
1 address=/000007.ru/:: 000007.ru
2 address=/000free.us/0.0.0.0 000free.us
3 address=/000free.us/:: 000free.us
Or, if you need all matches, use Series.str.findall with Series.str.join:
df['website'] = df['domainname'].str.findall('/(.*?)/').str.join(', ')
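A runnable sketch of the findall variant. Matches never overlap: once '/000007.ru/' is matched, both of its slashes are consumed, so each row of the question's data yields a single item. The third row below is hypothetical and shows a side effect of that: 'b.com' is skipped because the slash in front of it was already used by the first match.

```python
import pandas as pd

df = pd.DataFrame({"domainname": [
    "address=/000007.ru/0.0.0.0",
    "address=/000free.us/::",
    "address=/a.com/b.com/c.com/",  # hypothetical row with four slashes
]})

# findall returns a list of every non-overlapping match per row;
# .str.join(', ') turns each list into one comma-separated string
df["website"] = df["domainname"].str.findall(r"/(.*?)/").str.join(", ")
print(df["website"].tolist())
# ['000007.ru', '000free.us', 'a.com, c.com']
```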
If you only need the second value after splitting by /, use Series.str.split with indexing:
df['website'] = df['domainname'].str.split('/').str[1]
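A self-contained sketch of the split-and-index approach. The malformed third row is hypothetical; it shows that .str[1] (equivalent to .str.get(1)) returns NaN when a row has fewer than two pieces instead of raising an error:

```python
import pandas as pd

df = pd.DataFrame({"domainname": [
    "address=/000007.ru/0.0.0.0",
    "address=/000free.us/::",
    "no-slashes-here",  # hypothetical malformed row
]})

# split each value on '/', then take element 1 of each resulting list
df["website"] = df["domainname"].str.split("/").str[1]
print(df)
```

Compared with the regex answers, this avoids any pattern at all, at the cost of relying on the domain always being the second '/'-separated field.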
def f(y):
    # take the second element (the domain) of each split list, e.g. f(ads)
    return [x[1] for x in y]
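This helper is meant to be applied to the list of split results built in the question. A self-contained usage sketch that rebuilds that list with the same re.split call over the sample values, without pandas:

```python
import re

def f(y):
    # take the second element (index 1) of each split list
    return [x[1] for x in y]

values = [
    "address=/000007.ru/0.0.0.0",
    "address=/000007.ru/::",
    "address=/000free.us/0.0.0.0",
    "address=/000free.us/::",
]
ads = [re.split(r"[/]+", v) for v in values]  # same split as in the question
print(f(ads))  # ['000007.ru', '000007.ru', '000free.us', '000free.us']
```

To put the result back into the frame, df['website'] = f(ads) should work as long as ads holds one entry per row, in row order.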
The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please credit the site URL or the original address.