在原始代码 python,selenium 中替换 url 时出现错误，重新

Question

i used code in this question https://codereview.stackexchange.com/questions/241842/webscraping-with-selenium-a-course-downloader-and-sorter/248712#248712我在这个问题中使用了代码https://codereview.stackexchange.com/questions/241842/webscraping-with-selenium-a-course-downloader-and-sorter/248712#248712

i replaced url in that code to my url when i compile get an error shown below我将该代码中的 url 替换为我的 url，当我编译时出现如下所示的错误

line 66, in current_file_name = re.search(r'https://player.hdflixcore.workers.dev//0://Courses//Account%20Cracking%20--MrSihag//TN%20Cracking%20Course%20--MrSihag/.+/(.+)', download_path, re.DOTALL).group(1) AttributeError: 'NoneType' object has no attribute 'group'第66行，在current_file_name = re.search(r'https://player.hdflixcore.workers.dev//0://Courses//Account%20Cracking%20--MrSihag//TN%20Cracking%20Course%20- -MrSihag/.+/(.+)', download_path, re.DOTALL).group(1) AttributeError: 'NoneType' object 没有属性 'group'

i figured i that in code i used websiteaddress in "current_file_name" has some extra letters like backward slash我想我在代码中使用了“current_file_name”中的网站地址有一些额外的字母，如反斜杠
i have no idea about it i tried to do like same by adding some backward slash but no fix我对此一无所知我试图通过添加一些反斜杠来做同样的事情但没有修复

but when i run orginal code it works fine when i use it in my desired site it end up with error that mentioned above但是当我运行原始代码时它工作正常当我在我想要的站点中使用它时它最终会出现上述错误

below is my edited code下面是我编辑的代码


from selenium import webdriver
import time
import os
import shutil
import re

path = r'https://player.hdflixcore.workers.dev/0:/Courses/Account%20Cracking%20--MrSihag/TN%20Cracking%20Course%20--MrSihag/'

# For changing the download location for this browser temporarily
options = webdriver.ChromeOptions()
preferences = {"download.default_directory": r"C:\Users\shanid\Desktop\test", "safebrowsing.enabled": "false"}
options.add_experimental_option("prefs", preferences)

# Acquire the Course Link and Get all the directories
browser = webdriver.Chrome(chrome_options=options)
browser.get(r"https://player.hdflixcore.workers.dev/0:/Courses/Account%20Cracking%20--MrSihag/TN%20Cracking%20Course%20--MrSihag/")
time.sleep(2)
elements = browser.find_elements_by_css_selector(".mdui-text-truncate")

# loop for as many directories there are
for i in range(0, len(elements)):
    print("deft")

    # At each directory, it refreshes the page to update the webelements in the list, and returns the current directory that is being worked on
    browser.get(path)
    time.sleep(2)
    elements = browser.find_elements_by_css_selector(".mdui-text-truncate")
    element = elements[i]

    # checks if the folder for the directory already exists
    current_directory_name = element.text[11:].strip(" .")
    current_folder_path = "C:\\Users\\shanid\\Desktop\\test\\" + current_directory_name
    if os.path.exists(current_folder_path):
        pass
    else:
        os.mkdir(current_folder_path)

    # Formatting what has been downloaded and sorted, and
    print(current_directory_name, "------------------------------", sep="\n")

    # moves on to the directory to get the page with the files
    element.click()

    # pausing for a few secs for the page to load, and running the same mechanism to get each file using the same method used in directory
    time.sleep(3)
    files = browser.find_elements_by_css_selector(".mdui-text-truncate")
    for j in range(len(files)):
        files = browser.find_elements_by_css_selector(".mdui-text-truncate")
        _file = files[j]
    # constants for some if statements
        download = True
        move = True
        current_file_name = _file.text[17:].strip()

    # If file exists, then pass over it, and don't do anything, and moveon to next file
        if os.path.exists(current_folder_path + "\\" + current_file_name):
            pass

    # If it doesnt exist, then depending on its extension, do specific actions with it
        else:
            # Downloads the mp4 files by clicking on it, and finding the input tag which contains the download link for vid in its value attribute
            if ".mp4" in current_file_name:
                _file.click()
                time.sleep(2)
                download_path = browser.find_element_by_css_selector("input").get_attribute("value")
                current_file_name = re.search(r'https://player.hdflixcore.workers.dev//0://Courses//Account%20Cracking%20--MrSihag//TN%20Cracking%20Course%20--MrSihag/.+/(.+)', download_path, re.DOTALL).group(1)
                # Checks if file exists again, incase the filename is different then the predicted filename orderly generated.
                if os.path.exists(current_folder_path + "\\" + current_file_name):
                    move = False
                    download = False
                # returns to the previous page with the files
                browser.back()

            # self explanatory
            elif ".html" in current_file_name:
                download_path = path + current_directory_name + "/" + current_file_name
                if os.path.exists(current_folder_path + "\\" + current_file_name):
                    move = False
                    download = False

            else:
            # acquires the download location by going to the parent tag which is an a tag containing the link for html in its 'href' attribute
                download_path = _file.find_element_by_xpath('..').get_attribute('href').replace(r"%5E", "^")
                current_file_name = re.search(r'https://player.hdflixcore.workers.dev/0:/Courses/Account%20Cracking%20--MrSihag/TN%20Cracking%20Course%20--MrSihag/.+/(.+)', download_path, re.DOTALL).group(1).replace("%20", " ")

            time.sleep(2)
            current_file_path = "C:\\Users\\shanid\\Desktop\\test\\" + current_file_name
            # responsible for downloading it using a path, get allows downloading, by source links
            if download:
                browser.get(download_path)

                # while the file doesn't exist/ it hasn't been downloaded yet, do nothing
                while True:
                    if os.path.exists(current_file_path):
                        break
                time.sleep(1)

            # moves the file from the download spot to its own folder
            if move:
                shutil.move(current_file_path, current_folder_path + "\\" + current_file_name)
        print(current_file_name)

    # formatter
    print("------------------------------", "", sep="\n")
    time.sleep(3)

orginal code below原代码如下


from selenium import webdriver
import time
import os
import shutil
import re

path = r'https://coursevania.courses.workers.dev/[coursevania.com]%20Udemy%20-%20Master%20the%20Coding%20Interview%20Data%20Structures%20+%20Algorithms/'

# For changing the download location for this browser temporarily
options = webdriver.ChromeOptions()
preferences = {"download.default_directory": r"E:\Utilities_and_Apps\Python\MY PROJECTS\Test data\Downloads", "safebrowsing.enabled": "false"}
options.add_experimental_option("prefs", preferences)

# Acquire the Course Link and Get all the directories
browser = webdriver.Chrome(chrome_options=options)
browser.get(r"https://coursevania.courses.workers.dev/[coursevania.com]%20Udemy%20-%20Master%20the%20Coding%20Interview%20Data%20Structures%20+%20Algorithms/")
time.sleep(2)
elements = browser.find_elements_by_css_selector(".mdui-text-truncate")

# loop for as many directories there are
for i in range(0, len(elements)):

    # At each directory, it refreshes the page to update the webelements in the list, and returns the current directory that is being worked on
    browser.get(path)
    time.sleep(2)
    elements = browser.find_elements_by_css_selector(".mdui-text-truncate")
    element = elements[i]

    # checks if the folder for the directory already exists
    current_directory_name = element.text[11:].strip(" .")
    current_folder_path = "E:\\Utilities_and_Apps\\Python\\MY PROJECTS\\Test data\Downloads\\" + current_directory_name
    if os.path.exists(current_folder_path):
        pass
    else:
        os.mkdir(current_folder_path)

    # Formatting what has been downloaded and sorted, and 
    print(current_directory_name, "------------------------------", sep="\n")

    # moves on to the directory to get the page with the files
    element.click()

    # pausing for a few secs for the page to load, and running the same mechanism to get each file using the same method used in directory 
    time.sleep(3)
    files = browser.find_elements_by_css_selector(".mdui-text-truncate")
    for j in range(len(files)):
        files = browser.find_elements_by_css_selector(".mdui-text-truncate")
        _file = files[j]
    # constants for some if statements
        download = True
        move = True
        current_file_name = _file.text[17:].strip()

    # If file exists, then pass over it, and don't do anything, and moveon to next file
        if os.path.exists(current_folder_path + "\\" + current_file_name):
            pass

    # If it doesnt exist, then depending on its extension, do specific actions with it 
        else:
            # Downloads the mp4 files by clicking on it, and finding the input tag which contains the download link for vid in its value attribute
            if ".mp4" in current_file_name:
                _file.click()
                time.sleep(2)  
                download_path = browser.find_element_by_css_selector("input").get_attribute("value")
                current_file_name = re.search(r'https://coursevania.courses.workers.dev/\[coursevania.com\]%20Udemy%20-%20Master%20the%20Coding%20Interview%20Data%20Structures%20\+%20Algorithms/.+/(.+)', download_path, re.DOTALL).group(1)
                # Checks if file exists again, incase the filename is different then the predicted filename orderly generated.
                if os.path.exists(current_folder_path + "\\" + current_file_name):
                    move = False
                    download = False
                # returns to the previous page with the files
                browser.back()

            # self explanatory
            elif ".html" in current_file_name:
                download_path = path + current_directory_name + "/" + current_file_name
                if os.path.exists(current_folder_path + "\\" + current_file_name):
                    move = False
                    download = False

            else:
            # acquires the download location by going to the parent tag which is an a tag containing the link for html in its 'href' attribute
                download_path = _file.find_element_by_xpath('..').get_attribute('href').replace(r"%5E", "^")
                current_file_name = re.search(r'https://coursevania.courses.workers.dev/\[coursevania.com\]%20Udemy%20-%20Master%20the%20Coding%20Interview%20Data%20Structures%20\+%20Algorithms/.+/(.+)', download_path, re.DOTALL).group(1).replace("%20", " ")

            time.sleep(2)
            current_file_path = "E:\\Utilities_and_Apps\\Python\\MY PROJECTS\\Test data\Downloads\\" + current_file_name
            # responsible for downloading it using a path, get allows downloading, by source links
            if download:
                browser.get(download_path)

                # while the file doesn't exist/ it hasn't been downloaded yet, do nothing
                while True:
                    if os.path.exists(current_file_path):
                        break
                time.sleep(1)

            # moves the file from the download spot to its own folder
            if move:
                shutil.move(current_file_path, current_folder_path + "\\" + current_file_name)
        print(current_file_name)

    # formatter
    print("------------------------------", "", sep="\n")
    time.sleep(3)

this code works fine此代码工作正常

but not working when i change the website to https://player.hdflixcore.workers.dev/0:/Courses/Account%20Cracking%20--MrSihag/TN%20Cracking%20Course%20--MrSihag/但当我将网站更改为https://player.hdflixcore.workers.dev/0:/Courses/Account%20Cracking%20--MrSihag/TN%20Cracking%20Course%20--MrSihag/时无法正常工作

the site i used is clone of orginal site我使用的网站是原始网站的克隆

i have no idea why getting error我不知道为什么会出错

Answer 1

The issue is with the CSS selector for input box on below page.问题出在下页输入框的 CSS 选择器上。 https://player.hdflixcore.workers.dev/0:/Courses/Account%20Cracking%20--MrSihag/TN%20Cracking%20Course%20--MrSihag/01%20Course%20Introduction/1%20Course%20Introduction.mp4?a=view https://player.hdflixcore.workers.dev/0:/Courses/Account%20Cracking%20--MrSihag/TN%20Cracking%20Course%20--MrSihag/01%20Course%20Introduction/1%20Course%20Introduction.mp4 ?a=视图

There are 2 inputs boxes on the page, so you have to write CSS path as "#content > div > div:nth-child(6) > input" .页面上有 2 个输入框，因此您必须将 CSS 路径写为"#content > div > div:nth-child(6) > input" 。

Code with issue.有问题的代码。

download_path = browser.find_element_by_css_selector("input").get_attribute("value")

To be replaced with.被替换为。

download_path = browser.find_element_by_css_selector("#content > div > div:nth-child(6) > input").get_attribute("value")

在原始代码 python,selenium 中替换 url 时出现错误，重新

问题描述

1 个解决方案

解决方案1
0 已采纳 2020-09-01 13:08:49

在原始代码 python,selenium 中替换 url 时出现错误，重新

问题描述

1 个解决方案

解决方案1 0 已采纳 2020-09-01 13:08:49

解决方案1
0 已采纳 2020-09-01 13:08:49