
How to remove a specific part of a link?

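So basically I made a script that can download a bunch of maps from the search results on TrackmaniaExchange. However, to download a map file I need the actual download link, which the search results don't give.

I already know how to download a map. The link is https://trackmania.exchange/maps/download/(map id). However, the href in the search results is /maps/(map id)/(map name).

What I want to do is use Selenium to go to the site, grab each map's href, and use re.sub to edit the link so that it points to /maps/download/(map id)/ and no longer has the map name at the end. I don't know how to go about that, though. This is what I have in my script so far: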

import re
import subprocess
import sys
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options


def search():
    # Trackmania Exchange link, will scrape up to 100 results
    link = "https://trackmania.exchange/mapsearch2?limit=100"

    # Prompt the user for the search filters
    name = input("Track Name (if nothing, hit enter)")
    author = input("Track Author (if nothing, hit enter)")
    tags = input("Tags (separate with %2C if there's multiple, if nothing, hit enter)")
    path = input("Map download directory (do not leave blank, use forward slashes)")
    print("WARNING: Download wget for this script to work.")

    if path == "":
        print("Please put a path next time you start this")
        time.sleep(3)
        sys.exit()
    else:  # And so begins the if/else hellhole to find out what needs to be added to the link
        if tags == "":
            if name == "":
                if author == "":
                    print("Chief, you cant just enter nothing.  Put something in here next time")
                    time.sleep(3)
                    sys.exit()
                else:
                    link = link + "&author=" + author
            else:
                link = link + "&trackname=" + name
                if author != "":
                    link = link + "&author=" + author
        else:
            link = link + "&tags=" + tags
            if name != "":
                link = link + "&trackname=" + name
                if author != "":
                    link = link + "&author=" + author
            else:
                if author != "":
                    link = link + "&author=" + author

    print("Checking link...")
    # Make sure there are no spaces in the link. Tags are separated by %2C,
    # but words in track names are separated by +
    link = re.sub(r"\s", "+", link)

    print("Attempting to download...")
    options = Options()  # This is for selenium
    options.binary_location = "C:/Program Files/Mozilla Firefox/firefox.exe"
    driver = webdriver.Firefox(options=options)
    driver.get(link)
    # Shortened from my original absolute XPath; I'm assuming each result cell
    # contains the <a> element that carries the map's href
    anchors = driver.find_elements(
        By.XPATH,
        "//div[@id='searchResults-container']//td[@class='cell-ellipsis']//a",
    )
    hrefs = [anchor.get_attribute("href") for anchor in anchors]
    driver.quit()

    with open("list.txt", "w", encoding="utf-8") as f:
        for href in hrefs:
            # Unfinished part, this is where I got stuck: each href ends in
            # /maps/(map id)/(map name), but wget needs /maps/download/(map id)
            f.write(href + "\n")

    # Pass the arguments as a list so a path with spaces is handled correctly
    subprocess.run(["wget", "--directory-prefix=" + path, "-i", "list.txt"])


search()

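Their API is listed on the website, and after checking the site rules, this is allowed. I haven't really tested the script since writing the if/else hellhole, but I can deal with that later. What I need help with is removing the map name that comes after the map ID. If you need a concrete example, one of the hrefs on the front page for me is /maps/91677/cloudy-day. Every link will be different, so I really don't know what I should do.

If I know that the URL format will be /maps/id/some-text and that the ID will contain only digits, I would just use the regex below to get the ID from the link, and then build the URL with an f-string.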

map_id = re.search(r"\d+", url).group(0)
get_map_url = f"https://trackmania.exchange/maps/download/{map_id}"

Try it on regex101 with the different URLs you might encounter.
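
One caveat with a bare `\d+` is that it picks up the first run of digits anywhere in the string, so a URL with digits before the map ID (a port number or a query parameter, say) would give the wrong result. A minimal sketch of a slightly stricter variant follows. It assumes the href always contains `/maps/` immediately followed by the numeric ID, as in the question; the names `to_download_url`, `BASE_URL` and `MAP_ID_PATTERN` are made up for illustration.

```python
import re

# Site root taken from the question
BASE_URL = "https://trackmania.exchange"

# Anchor the match on "/maps/" so digits elsewhere in the URL are ignored
MAP_ID_PATTERN = re.compile(r"/maps/(\d+)")


def to_download_url(href):
    """Return the download URL for a search-result href, or None if no map ID is found."""
    match = MAP_ID_PATTERN.search(href)
    if match is None:
        return None
    return f"{BASE_URL}/maps/download/{match.group(1)}"


# Works for relative and absolute hrefs alike
print(to_download_url("/maps/91677/cloudy-day"))
print(to_download_url("https://trackmania.exchange/maps/91677/cloudy-day"))
# A link that is not a map page yields None instead of a bogus ID
print(to_download_url("/mapsearch2?limit=100"))
```

Returning None for anything that doesn't look like a map link lets the caller skip it before writing list.txt, rather than handing wget a broken URL.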

