Remove special characters in Python 3.7

Question

I've been testing ripping URLs with Python, and I get the starting URLs from a text file:

itdUrlforrip.txt content: http://itdmusic.in/category/new-releases/page/4

The complete code:

#!/usr/bin/python
import requests
import re
import regex
from pyquery import PyQuery

#get each
link1 = open('/Users/R/Downloads/itdUrlforrip.txt','r').read()
list1 = link1.split('\n')
list2 = []
for eachlink1 in list1:
    linkSub1 = requests.get(eachlink1).text
    splitContent = linkSub1.split("Facebook")
    splitContent1 = splitContent[0]
    list2.append(splitContent1)

list2GLStr = ("\n".join(list2))
urlAll = regex.findall('itdmusic\.in\/\d\d\/.+\.html', list2GLStr)
allUrlrmDup1 = list(dict.fromkeys(urlAll))

#get list of url from input
allUrlrmDup1Ah = regex.sub('itdmusic', 'http://itdmusic', str(allUrlrmDup1))
allUrlrmDup1Ah2 = regex.sub('\'', '', str(allUrlrmDup1Ah))
allUrlrmDup1Ah3 = regex.sub('\[', '', str(allUrlrmDup1Ah2))
allUrlrmDup1Ah4 = regex.sub('\]', '', str(allUrlrmDup1Ah3))
allUrlrmDup1AhGL = ("\n".join(list(allUrlrmDup1Ah4.split(', '))))
allUrlrmDup1AhList = allUrlrmDup1AhGL.split('\n')

list3 = []
list4 = []
for eachlink2 in allUrlrmDup1AhList:
    linkSub2 = requests.get(eachlink2).text
    urlGdr = regex.findall('drive\.google\.com\/.{41}', linkSub2)
    urlOth = regex.findall('https\:\/\/www\d\d\d\.zippyshare\.com\/v.{19}|https\:\/\/www\d\d\.zippyshare\.com\/v.{19}|https\:\/\/www\d\.zippyshare\.com\/v.{19}|https?:\/\/douploads\.com\/.{12}|https?:\/\/www\.mirrored\.to\/.{14}|https?:\/\/mir\.cr\/.{8}|https?:\/\/hexupload\.net\/.{12}|https?:\/\/intoupload\.net\/.{12}|https?:\/\/www\.dropbox\.com\/s\/.{15}|https?:\/\/dbree\.org\/v\/.{6}|https?:\/\/dropapk\.to\/.{12}|https?:\/\/www\.sendspace\.com\/file\/.{6}|https?:\/\/gestyy\.com\/.{6}|https?:\/\/ouo\.io\/\w{6}|https?:\/\/mega\.nz.{55}|https?:\/\/bit\.ly.{8}', linkSub2)
    urlska = regex.findall('https?\:\/\/itdmusic\.in\/skipads\/.+\/\'', linkSub2)
    urlskaStr = str(urlska)
    urlska2 = regex.sub('\/\'', '', urlskaStr)
    list3.append(urlGdr)
    list3.append(urlOth)
    list4.append(urlska2)

Then I run:

print(list4)

and the result is:

'[]', '[]', '[]', '[]', '[]', '[]', '[]', '[]', '["http://itdmusic.in/skipads/2020/03/12/luke-bryan-one-margarita-pre-single"]', '["http://itdmusic.in/skipads/2020/03/12/kota-banks-italiana-single"]'

for 32s.

So is there a way to get rid of the '[]' entries and just get the URLs? I've tried a bunch of things and still can't figure it out using regex and re. I'm also a little confused by the "for xxx in xxx" loops.
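A minimal reproduction of where the '[]' strings come from, assuming two hypothetical page snippets in place of the HTML fetched with requests (the first has no skipads link, the second has one). It uses the standard-library re module instead of regex, with the same pattern as the question minus the unnecessary escapes:

```python
import re

# Hypothetical page snippets standing in for requests.get(...).text.
pages = [
    "<p>no download link on this page</p>",
    "<a href='http://itdmusic.in/skipads/2020/03/12/kota-banks-italiana-single/'>Download</a>",
]

list4 = []
for page in pages:
    # findall always returns a list; it is empty when nothing matches.
    urlska = re.findall(r"https?://itdmusic\.in/skipads/.+/'", page)
    # str() turns the list into its printed form, e.g. "[]" when empty.
    urlska2 = re.sub(r"/'", "", str(urlska))
    # What gets appended is a string, not a URL.
    list4.append(urlska2)

print(list4)
```

Each page with no match contributes the literal string '[]', and each match is wrapped in the printed brackets and quotes of its list, which is the pattern in the output above.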

Answer 1

The thing is, regex.findall() returns a list, and you are appending that list (converted to a string) to another list; that is why you are getting the '[]' entries.
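To see the difference between append and extend, here is a small sketch with hard-coded lists standing in for two findall results (hypothetical data, not the real pages). append nests each list as one element, extend adds the list's elements, and extend on a str adds it one character at a time:

```python
# Two hypothetical findall results: one page without a match, one with a match.
url = "http://itdmusic.in/skipads/2020/03/12/luke-bryan-one-margarita-pre-single"
results = [[], [url]]

appended = []
extended = []
for found in results:
    appended.append(found)  # adds the whole list as a single element
    extended.extend(found)  # adds each element; an empty list adds nothing

print(appended)
print(extended)

chars = []
chars.extend("[]")  # a str is iterable, so this adds '[' and ']' separately
print(chars)
```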

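You should use list4.extend() with the list that findall returns, instead of list4.append(urlska2). Note that urlska2 is a string, so list4.extend(urlska2) would add it one character at a time; extend with the matches themselves, minus the trailing /', for example list4.extend(u[:-2] for u in urlska).

That gives you just the URLs, with no '[]' entries.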
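A minimal sketch of the question's second loop with that fix applied, assuming the skipads links appear as href='.../' as the question's pattern implies. The pages list is a hypothetical stand-in for requests.get(eachlink2).text, and the sketch uses the standard-library re module. A capture group makes findall return only the URL without the trailing /', so the str() and sub() clean-up is not needed:

```python
import re

# Hypothetical stand-ins for the pages fetched with requests.get(...).text.
pages = [
    "<p>no download link on this page</p>",
    "<a href='http://itdmusic.in/skipads/2020/03/12/luke-bryan-one-margarita-pre-single/'>Download</a>",
    "<a href='http://itdmusic.in/skipads/2020/03/12/kota-banks-italiana-single/'>Download</a>",
]

list4 = []
for linkSub2 in pages:
    # With one group in the pattern, findall returns only the group text,
    # i.e. everything before the first "/'" that follows "skipads/".
    urlska = re.findall(r"(https?://itdmusic\.in/skipads/.+?)/'", linkSub2)
    # extend adds each URL string; pages without a match add nothing.
    list4.extend(urlska)

print(list4)
```

Each pass of the for loop handles one page and adds zero or more URLs to list4. The non-greedy .+? is a change from the question's greedy .+, so that two links on the same line are not merged into one match. If a page can contain the same link twice, the question's own list(dict.fromkeys(list4)) trick removes the duplicates afterwards.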

Answer 1 (score: 0), 2020-03-13 05:17:56