How to remove a specific part of a link?
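So basically I made a script that downloads a bunch of maps from TrackmaniaExchange based on search results. However, to download the map files I need the actual download links, which the search results don't give.
I already know how to download a map. The link is https://trackmania.exchange/maps/download/(map id). However, the href in the search results is /maps/(map id)/(map name).
What I want to do is use selenium to go to the site, get the map's href, use re.sub to edit the link so that it points to /maps/download/(map id)/, and use re.sub again to strip the end of the link so there is no map name at the end. However, I have no idea how to go about it. This is what I have in my script so far: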
import requests
import os.path
import os
import selenium.webdriver as webdriver
from selenium.webdriver.firefox.options import Options
import time
import re

def Search():
    link="https://trackmania.exchange/mapsearch2?limit=100" #Trackmania Exchange link, will scrape all 100 results
    checkedlink = re.sub("\s", "+", link) #Replaces spaces with + for track names (this shouldnt happen with authors/tags)
    options = Options() #This is for selenium
    options.binary_location = "C:/Program Files/Mozilla Firefox/firefox.exe"
    driver = webdriver.Firefox(options=options)
    search_box = driver.find_element_by_name("trackname")
    sitelinks = driver.find_element_by_xpath("/html/[div/@id='container'/@data-select2-id='container']/[div/@class='container-inner']/[div/@class='ly-box-open']/[div/@class='box-col-6']/[div/@class='windowv2-panel']/[div/@id='searchResults-container']/div/div/table/tbody/[tr/@class='WindowTableCell2v2 with-hover has-image']/[td/@class='cell-ellipsis']")
    results = []
    name=input("Track Name (if nothing, hit enter)") #Prompts the user to input stuff
    author=input("Track Author (if nothing, hit enter)")
    tags=input("Tags (separate with %2C if there's multiple, if nothing, hit enter)")
    path=input("Map download directory (do not leave blank, use forward slashes)")
    print("WARNING: Download wget for this script to work.")
    type(name) #These are to make a link to find html with
    type(author)
    type(tags)
    type(path)
    if path == "":
        print("Please put a path next time you start this")
        time.sleep(3)
        os.exit()
    else: #And so begins the if/else hellhole to find out what needs to be added to the link
        if tags == "":
            if name == "":
                if author == "":
                    print("Chief, you cant just enter nothing. Put something in here next time")
                    time.sleep(3)
                    os.exit()
                else:
                    link = link+"&author="+author
            else:
                link = link+"&trackname="+name
                if author != "":
                    link = link+"&author="+author
        else:
            link = link+"&tags="+tags
            if name != "":
                link = link+"&trackname="+name
                if author != "":
                    link = link+"&author="+author
            else:
                if author != "":
                    link = link+"&author="+author
    print("Checking link...")
    checkedlink() #this is to make sure there's no spaces in the link. tags are separated by %2C, but track names are separated by +
    print("Attempting to download...")
    driver.get(link)
    links = sitelinks
    for link in links
        href = link.get_attribute("href")
    browser.close()
    with open("list.txt", "w", encoding="utf-8") as f:
        f.write(href)
    for line in f:
        h = re.findall("\d") #My failed attempt at removing the end of the link
        re.sub("/maps/", "https://trackmania.exchange/maps/download", f)
        re.sub("") #unfinished part cause i was stubbed
    os.system("wget --directory-prefix="path" -i list.txt")

Search()
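Their API is listed on the site, and after checking the site rules, this is allowed. I haven't really tested the script since making the if/else hellhole, but I can deal with that later. What I need help with is removing the map name after the map ID. If you need a concrete example, one of the hrefs on the front page for me is /maps/91677/cloudy-day. Each link will be different, so I really don't know what I should do.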
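A side note on the if/else chain in the script: the same query string could probably be assembled with `urllib.parse.urlencode`, which skips the nesting and also handles the escaping. This is only a sketch. The endpoint and the parameter names `limit`, `trackname`, `author`, and `tags` are copied from the script; the function name `build_search_url` and the tag values `1,2` are made up for illustration.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://trackmania.exchange/mapsearch2"  # endpoint copied from the script


def build_search_url(name: str = "", author: str = "", tags: str = "", limit: int = 100) -> str:
    """Build the search URL, skipping any filter that was left blank."""
    if not (name or author or tags):
        raise ValueError("enter at least one of name, author, or tags")
    params = {"limit": limit, "trackname": name, "author": author, "tags": tags}
    # Keep only the filters that were actually filled in.
    filled = {key: value for key, value in params.items() if value != ""}
    # urlencode turns spaces into "+" and commas into "%2C".
    return f"{SEARCH_URL}?{urlencode(filled)}"


# "1,2" is a hypothetical pair of tag IDs, typed with a plain comma.
print(build_search_url(name="cloudy day", tags="1,2"))
```

With this approach the tags would be typed with plain commas; typing `%2C` by hand would get escaped a second time.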
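If I know the URL format will be /maps/id/some-text and the ID will consist only of digits, then I would just use the regex below to get the ID from the link, and then build the URL with an f-string.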
map_id = re.search(r"\d+", url).group(0)
get_map_url = f"https://trackmania.exchange/maps/download/{map_id}"
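A slightly more defensive variant of the same idea, as a sketch: it anchors the pattern on `/maps/` so that digits elsewhere in the string cannot match, and it returns `None` when the href has an unexpected shape. The function name `to_download_url` and the constants `BASE` and `MAP_ID` are illustrative names, not part of any library; the URL format is the one described in the question.

```python
import re
from typing import Optional

BASE = "https://trackmania.exchange"  # site root, taken from the question

# Matches "/maps/<digits>" followed by "/" or the end of the string.
MAP_ID = re.compile(r"/maps/(\d+)(?:/|$)")


def to_download_url(href: str) -> Optional[str]:
    """Turn '/maps/<id>/<name>' into the download URL, or None if no ID is found."""
    match = MAP_ID.search(href)
    if match is None:
        return None
    return f"{BASE}/maps/download/{match.group(1)}"


print(to_download_url("/maps/91677/cloudy-day"))
# prints: https://trackmania.exchange/maps/download/91677
```

Returning `None` rather than raising lets a loop over many hrefs skip the odd ones; raising an exception would be an equally reasonable choice.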
Try it on regex101 with the different URLs you might encounter.
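The same check can also be run locally. The sketch below compares the bare `\d+` pattern with one anchored on `/maps/`. Only the first sample href comes from the question; the other two are made-up shapes for illustration, and the helper names `first_digits` and `id_after_maps` are hypothetical.

```python
import re

samples = [
    "/maps/91677/cloudy-day",                               # from the question
    "https://trackmania.exchange/maps/91677/cloudy-day-2",  # made up: digit in the map name
    "https://tm2.example/maps/91677/cloudy-day",            # made up: digit before the ID
]


def first_digits(url: str) -> str:
    """Bare pattern: the first run of digits anywhere in the string."""
    return re.search(r"\d+", url).group(0)


def id_after_maps(url: str) -> str:
    """Anchored pattern: the digits that directly follow '/maps/'."""
    return re.search(r"/maps/(\d+)", url).group(1)


for url in samples:
    print(url, first_digits(url), id_after_maps(url))
```

The two patterns agree on the first two samples and differ only on the third, where a digit appears before the ID. For hrefs that always look like /maps/id/some-text, the bare pattern is enough.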