Removal of items from list isn't working
I'm working on a pretty cool project, but I need help. I'm collecting proxies from sslproxies.org, but sorting the proxies scraped from the table into a list without the extra info is proving pretty hard. So far my code isn't working. Hope you guys can help. What I want to do is keep two items from the list, delete the six that follow, and repeat.
import requests
from bs4 import BeautifulSoup

f = open("proxies.txt", 'w+')
def getProxy():
    url = "https://www.sslproxies.org"
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    global tlist
    tlist = []
    for tr in soup.find_all('tr'):
        for td in tr.find_all('td'):
            tlist.append(td)
    clist = tlist
    count = 0
    for word in clist:
        count += 1
        if count > 2:
            clist.remove(word)
            count += 1
            if count >= 6:
                count = 0
        else:
            continue
    f.write(str(clist))
Here is a generator that yields two items, then skips six, then yields two more, and so on:
def skip_six(l):
    for i, x in enumerate(l):
        if i%8 <= 1:
            yield x
You can use this to build the list like this:
clist = list(skip_six(tlist))
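As a quick sanity check, here is a minimal sketch that runs the generator on made-up data. The cell values are placeholders, not real proxies, and the eight-cells-per-row layout is an assumption based on the "two, then six" pattern in the question. The last few lines also show one likely reason the original loop misbehaves: it removes items from the same list it is iterating over (`clist = tlist` does not make a copy), so elements get skipped.

```python
def skip_six(items):
    """Yield two items, skip the next six, and repeat."""
    for i, x in enumerate(items):
        if i % 8 <= 1:
            yield x

# Hypothetical flattened table: 8 cells per row (placeholder values).
cells = [
    "10.0.0.1", "8080", "US", "United States", "anonymous", "no", "yes", "1 min ago",
    "10.0.0.2", "3128", "DE", "Germany", "elite proxy", "no", "yes", "2 mins ago",
]

clist = list(skip_six(cells))
print(clist)  # ['10.0.0.1', '8080', '10.0.0.2', '3128']

# Pitfall from the question: removing while iterating shifts the remaining
# items left, so the loop silently skips every other element.
nums = [1, 2, 3, 4, 5, 6]
for n in nums:
    nums.remove(n)
print(nums)  # [2, 4, 6]
```

Building a new list, as the generator does, avoids the problem because the input list is never modified.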
I believe you want to select the first two columns. In that case you may want to try something like this with pandas read_html. Note that I cannot access the website you mentioned, so I haven't tested this code.
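import pandas as pd

# read_html returns a list of DataFrames, one per <table> found on the page
tables = pd.read_html(io='https://www.sslproxies.org')
df = tables[0]  # assumption: the proxy table is the first table on the page
print(df)
print(df[['IP Address', 'Port']])  # select the columns that you are interested in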
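If pandas is not available, the same "first two columns" idea can be sketched with the standard library's `html.parser` instead. This is a different technique from `read_html`, not the answer's own method. The HTML snippet is a made-up stand-in for the real page, the class name `FirstTwoColumns` is hypothetical, and joining the cells as `ip:port` is an assumption about the desired output.

```python
from html.parser import HTMLParser

class FirstTwoColumns(HTMLParser):
    """Collect the text of the first two <td> cells of every table row."""

    def __init__(self):
        super().__init__()
        self.rows = []       # one [ip, port] pair per data row
        self._row = None     # cells of the row currently being parsed
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_td = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_td:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._row is not None:
            # Header rows use <th>, so they collect no cells and are skipped.
            if len(self._row) >= 2:
                self.rows.append([c.strip() for c in self._row[:2]])
            self._row = None
            self._in_td = False

# Placeholder markup standing in for the real page.
sample_html = """
<table>
  <tr><th>IP Address</th><th>Port</th><th>Code</th></tr>
  <tr><td>10.0.0.1</td><td>8080</td><td>US</td></tr>
  <tr><td>10.0.0.2</td><td>3128</td><td>DE</td></tr>
</table>
"""

parser = FirstTwoColumns()
parser.feed(sample_html)
proxies = [f"{ip}:{port}" for ip, port in parser.rows]
print(proxies)  # ['10.0.0.1:8080', '10.0.0.2:3128']
```

Working row by row keeps each address paired with its port, which a single flat list of cells does not guarantee if a row happens to have a different number of cells.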