I am getting an error while replacing the URL in the original code (python, selenium, re)

I used the code from this question: https://codereview.stackexchange.com/questions/241842/webscraping-with-selenium-a-course-downloader-and-sorter/248712#248712

I replaced the URL in that code with my own URL, and when I run it I get the error shown below:

line 66, in <module>
    current_file_name = re.search(r'https://player.hdflixcore.workers.dev//0://Courses//Account%20Cracking%20--MrSihag//TN%20Cracking%20Course%20--MrSihag/.+/(.+)', download_path, re.DOTALL).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
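The AttributeError means re.search found no match and returned None, so .group(1) was then called on None. A minimal sketch of a guard that reports the text that failed to match instead of crashing; search_group is a hypothetical helper, not part of the original code:

```python
import re


def search_group(pattern: str, text: str, group: int = 1) -> str:
    """Return one capture group, or raise a descriptive error if nothing matched."""
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        # re.search returns None on no match; calling .group() on None raises
        # AttributeError: 'NoneType' object has no attribute 'group'
        raise ValueError(f"pattern {pattern!r} did not match {text!r}")
    return match.group(group)


if __name__ == "__main__":
    print(search_group(r"/([^/]+)$", "https://example.com/dir/file.mp4"))
```

Because the error message includes the unmatched text, it shows what download_path actually contained, which makes a wrong input value or a mistyped pattern easier to spot.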

I noticed that the website address in the regex for "current_file_name" contains some extra characters, such as backslashes.
I have no idea what they are for. I tried to do the same by adding some backslashes, but it did not fix the problem.
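In the original pattern the backslashes escape "[", "]" and "+", which are regex metacharacters (a character class and a repetition operator); they are not part of the site's address. Note also that the pattern in the .mp4 branch of the edited code uses doubled forward slashes ("//0://", "//Account", "//TN"), which a URL containing "/0:/Courses/" cannot match. Instead of escaping by hand, re.escape can build the pattern from the base URL. A sketch, assuming the base URL is the path value from the edited code and that the download link has the form base URL + folder + "/" + file name (the sample link below is an assumption, not taken from the site):

```python
import re

# Base URL copied from the `path` variable in the edited code
BASE = ("https://player.hdflixcore.workers.dev/0:/Courses/"
        "Account%20Cracking%20--MrSihag/TN%20Cracking%20Course%20--MrSihag/")

# re.escape backslash-escapes every regex metacharacter in BASE,
# so characters such as ".", "[", "]" and "+" are matched literally
pattern = re.escape(BASE) + r".+/(.+)"

# Hypothetical download link, assumed to look like BASE + folder + file
link = BASE + "01%20Course%20Introduction/1%20Course%20Introduction.mp4"
print(re.search(pattern, link).group(1))  # 1%20Course%20Introduction.mp4
```

The same call works unchanged for the original coursevania URL, where the brackets and the "+" are exactly the characters that needed the hand-written backslashes.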

When I run the original code it works fine, but when I use it on my desired site it ends up with the error mentioned above.

Below is my edited code:


from selenium import webdriver
import time
import os
import shutil
import re

path = r'https://player.hdflixcore.workers.dev/0:/Courses/Account%20Cracking%20--MrSihag/TN%20Cracking%20Course%20--MrSihag/'

# For changing the download location for this browser temporarily
options = webdriver.ChromeOptions()
preferences = {"download.default_directory": r"C:\Users\shanid\Desktop\test", "safebrowsing.enabled": "false"}
options.add_experimental_option("prefs", preferences)

# Acquire the Course Link and Get all the directories
browser = webdriver.Chrome(chrome_options=options)
browser.get(r"https://player.hdflixcore.workers.dev/0:/Courses/Account%20Cracking%20--MrSihag/TN%20Cracking%20Course%20--MrSihag/")
time.sleep(2)
elements = browser.find_elements_by_css_selector(".mdui-text-truncate")

# loop for as many directories there are
for i in range(0, len(elements)):
    print("deft")

    # At each directory, it refreshes the page to update the webelements in the list, and returns the current directory that is being worked on
    browser.get(path)
    time.sleep(2)
    elements = browser.find_elements_by_css_selector(".mdui-text-truncate")
    element = elements[i]

    # checks if the folder for the directory already exists
    current_directory_name = element.text[11:].strip(" .")
    current_folder_path = "C:\\Users\\shanid\\Desktop\\test\\" + current_directory_name
    if os.path.exists(current_folder_path):
        pass
    else:
        os.mkdir(current_folder_path)

    # Formatting what has been downloaded and sorted, and
    print(current_directory_name, "------------------------------", sep="\n")

    # moves on to the directory to get the page with the files
    element.click()

    # pausing for a few secs for the page to load, and running the same mechanism to get each file using the same method used in directory
    time.sleep(3)
    files = browser.find_elements_by_css_selector(".mdui-text-truncate")
    for j in range(len(files)):
        files = browser.find_elements_by_css_selector(".mdui-text-truncate")
        _file = files[j]
    # constants for some if statements
        download = True
        move = True
        current_file_name = _file.text[17:].strip()

    # If file exists, then pass over it, and don't do anything, and moveon to next file
        if os.path.exists(current_folder_path + "\\" + current_file_name):
            pass

    # If it doesnt exist, then depending on its extension, do specific actions with it
        else:
            # Downloads the mp4 files by clicking on it, and finding the input tag which contains the download link for vid in its value attribute
            if ".mp4" in current_file_name:
                _file.click()
                time.sleep(2)
                download_path = browser.find_element_by_css_selector("input").get_attribute("value")
                current_file_name = re.search(r'https://player.hdflixcore.workers.dev//0://Courses//Account%20Cracking%20--MrSihag//TN%20Cracking%20Course%20--MrSihag/.+/(.+)', download_path, re.DOTALL).group(1)
                # Checks if file exists again, incase the filename is different then the predicted filename orderly generated.
                if os.path.exists(current_folder_path + "\\" + current_file_name):
                    move = False
                    download = False
                # returns to the previous page with the files
                browser.back()

            # self explanatory
            elif ".html" in current_file_name:
                download_path = path + current_directory_name + "/" + current_file_name
                if os.path.exists(current_folder_path + "\\" + current_file_name):
                    move = False
                    download = False

            else:
            # acquires the download location by going to the parent tag which is an a tag containing the link for html in its 'href' attribute
                download_path = _file.find_element_by_xpath('..').get_attribute('href').replace(r"%5E", "^")
                current_file_name = re.search(r'https://player.hdflixcore.workers.dev/0:/Courses/Account%20Cracking%20--MrSihag/TN%20Cracking%20Course%20--MrSihag/.+/(.+)', download_path, re.DOTALL).group(1).replace("%20", " ")

            time.sleep(2)
            current_file_path = "C:\\Users\\shanid\\Desktop\\test\\" + current_file_name
            # responsible for downloading it using a path, get allows downloading, by source links
            if download:
                browser.get(download_path)

                # while the file doesn't exist/ it hasn't been downloaded yet, do nothing
                while True:
                    if os.path.exists(current_file_path):
                        break
                time.sleep(1)

            # moves the file from the download spot to its own folder
            if move:
                shutil.move(current_file_path, current_folder_path + "\\" + current_file_name)
        print(current_file_name)

    # formatter
    print("------------------------------", "", sep="\n")
    time.sleep(3) 
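Both regexes in the edited code only pull out the last path segment of the link, and %20 (or %5E) is then decoded by hand. A regex-free alternative, sketched here as a swapped-in technique rather than the original author's method, uses the standard library: urlsplit drops any query string such as "?a=view", and unquote decodes every percent-escape. The function name file_name_from_url is hypothetical:

```python
import posixpath
from urllib.parse import unquote, urlsplit


def file_name_from_url(url: str) -> str:
    """Return the decoded last path segment of a URL (hypothetical helper)."""
    path = urlsplit(url).path                 # path only; query and fragment removed
    return unquote(posixpath.basename(path))  # decode %20, %5E, ... to characters


print(file_name_from_url(
    "https://player.hdflixcore.workers.dev/0:/Courses/Account%20Cracking%20--MrSihag/"
    "TN%20Cracking%20Course%20--MrSihag/01%20Course%20Introduction/1%20Course%20Introduction.mp4?a=view"
))  # 1 Course Introduction.mp4
```

If a link ends with a slash, basename returns an empty string, so folder links would need separate handling.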

Below is the original code:


from selenium import webdriver
import time
import os
import shutil
import re

path = r'https://coursevania.courses.workers.dev/[coursevania.com]%20Udemy%20-%20Master%20the%20Coding%20Interview%20Data%20Structures%20+%20Algorithms/'

# For changing the download location for this browser temporarily
options = webdriver.ChromeOptions()
preferences = {"download.default_directory": r"E:\Utilities_and_Apps\Python\MY PROJECTS\Test data\Downloads", "safebrowsing.enabled": "false"}
options.add_experimental_option("prefs", preferences)

# Acquire the Course Link and Get all the directories
browser = webdriver.Chrome(chrome_options=options)
browser.get(r"https://coursevania.courses.workers.dev/[coursevania.com]%20Udemy%20-%20Master%20the%20Coding%20Interview%20Data%20Structures%20+%20Algorithms/")
time.sleep(2)
elements = browser.find_elements_by_css_selector(".mdui-text-truncate")

# loop for as many directories there are
for i in range(0, len(elements)):

    # At each directory, it refreshes the page to update the webelements in the list, and returns the current directory that is being worked on
    browser.get(path)
    time.sleep(2)
    elements = browser.find_elements_by_css_selector(".mdui-text-truncate")
    element = elements[i]

    # checks if the folder for the directory already exists
    current_directory_name = element.text[11:].strip(" .")
    current_folder_path = "E:\\Utilities_and_Apps\\Python\\MY PROJECTS\\Test data\Downloads\\" + current_directory_name
    if os.path.exists(current_folder_path):
        pass
    else:
        os.mkdir(current_folder_path)

    # Formatting what has been downloaded and sorted, and 
    print(current_directory_name, "------------------------------", sep="\n")

    # moves on to the directory to get the page with the files
    element.click()

    # pausing for a few secs for the page to load, and running the same mechanism to get each file using the same method used in directory 
    time.sleep(3)
    files = browser.find_elements_by_css_selector(".mdui-text-truncate")
    for j in range(len(files)):
        files = browser.find_elements_by_css_selector(".mdui-text-truncate")
        _file = files[j]
    # constants for some if statements
        download = True
        move = True
        current_file_name = _file.text[17:].strip()

    # If file exists, then pass over it, and don't do anything, and moveon to next file
        if os.path.exists(current_folder_path + "\\" + current_file_name):
            pass

    # If it doesnt exist, then depending on its extension, do specific actions with it 
        else:
            # Downloads the mp4 files by clicking on it, and finding the input tag which contains the download link for vid in its value attribute
            if ".mp4" in current_file_name:
                _file.click()
                time.sleep(2)  
                download_path = browser.find_element_by_css_selector("input").get_attribute("value")
                current_file_name = re.search(r'https://coursevania.courses.workers.dev/\[coursevania.com\]%20Udemy%20-%20Master%20the%20Coding%20Interview%20Data%20Structures%20\+%20Algorithms/.+/(.+)', download_path, re.DOTALL).group(1)
                # Checks if file exists again, incase the filename is different then the predicted filename orderly generated.
                if os.path.exists(current_folder_path + "\\" + current_file_name):
                    move = False
                    download = False
                # returns to the previous page with the files
                browser.back()

            # self explanatory
            elif ".html" in current_file_name:
                download_path = path + current_directory_name + "/" + current_file_name
                if os.path.exists(current_folder_path + "\\" + current_file_name):
                    move = False
                    download = False

            else:
            # acquires the download location by going to the parent tag which is an a tag containing the link for html in its 'href' attribute
                download_path = _file.find_element_by_xpath('..').get_attribute('href').replace(r"%5E", "^")
                current_file_name = re.search(r'https://coursevania.courses.workers.dev/\[coursevania.com\]%20Udemy%20-%20Master%20the%20Coding%20Interview%20Data%20Structures%20\+%20Algorithms/.+/(.+)', download_path, re.DOTALL).group(1).replace("%20", " ")

            time.sleep(2)
            current_file_path = "E:\\Utilities_and_Apps\\Python\\MY PROJECTS\\Test data\Downloads\\" + current_file_name
            # responsible for downloading it using a path, get allows downloading, by source links
            if download:
                browser.get(download_path)

                # while the file doesn't exist/ it hasn't been downloaded yet, do nothing
                while True:
                    if os.path.exists(current_file_path):
                        break
                time.sleep(1)

            # moves the file from the download spot to its own folder
            if move:
                shutil.move(current_file_path, current_folder_path + "\\" + current_file_name)
        print(current_file_name)

    # formatter
    print("------------------------------", "", sep="\n")
    time.sleep(3)
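The download wait in both versions spins in "while True" without sleeping inside the loop (the time.sleep(1) runs only after the loop exits), and it never gives up if a download fails. A minimal polling sketch with a timeout; wait_for_file and its timeout and poll defaults are hypothetical:

```python
import os
import time


def wait_for_file(path: str, timeout: float = 600.0, poll: float = 1.0) -> bool:
    """Poll until `path` exists; return False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll)  # sleep inside the loop so the CPU is not kept busy
    return os.path.exists(path)  # one last check at the deadline
```

In the loop above it would replace the "while True" block, for example: if not wait_for_file(current_file_path): print("download timed out:", current_file_path).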

This code works fine.

But it does not work when I change the website to https://player.hdflixcore.workers.dev/0:/Courses/Account%20Cracking%20--MrSihag/TN%20Cracking%20Course%20--MrSihag/

The site I used is a clone of the original site.

I have no idea why I am getting the error.

The issue is with the CSS selector for the input box on the page below:
https://player.hdflixcore.workers.dev/0:/Courses/Account%20Cracking%20--MrSihag/TN%20Cracking%20Course%20--MrSihag/01%20Course%20Introduction/1%20Course%20Introduction.mp4?a=view

There are two input boxes on the page, and find_element_by_css_selector("input") returns the first one, which is not the box holding the download link. So you have to write the CSS path as "#content > div > div:nth-child(6) > input".

Code with the issue:

download_path = browser.find_element_by_css_selector("input").get_attribute("value")

To be replaced with:

download_path = browser.find_element_by_css_selector("#content > div > div:nth-child(6) > input").get_attribute("value")
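The nth-child(6) selector depends on the exact layout of that page and will break if the player template changes. A sketch of a less position-dependent alternative (an assumption, not tested against the site): read the value of every input and keep the first one that starts with the course URL, which is what the original regex already expects the link to look like. pick_download_link is a hypothetical helper:

```python
from typing import Iterable, Optional


def pick_download_link(values: Iterable[Optional[str]], base_url: str) -> Optional[str]:
    """Return the first input value that starts with base_url, else None."""
    for value in values:
        # get_attribute("value") can return None or "" for some inputs, so skip those
        if value and value.startswith(base_url):
            return value
    return None
```

With Selenium 4, where the find_element(s)_by_* helpers used in this code were removed in 4.3, the values could be collected with browser.find_elements(By.CSS_SELECTOR, "input") after from selenium.webdriver.common.by import By, then passed as [el.get_attribute("value") for el in ...] together with path.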

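Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.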

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM