
Downloading *.mp4 files with Python


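I'm trying to download and save lecture videos from a website. Although I've managed to download the files, they won't play in my media player. Here's the code I'm using: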

from bs4 import BeautifulSoup
import re
import urllib2

# Parse a locally saved copy of the page source
with open('Python/SNA Page Source Revised.txt', 'r') as snippet:
    soup = BeautifulSoup(snippet, 'html.parser')

# <a> tags without an href return None, which re.search() cannot handle
links = [link.get('href') for link in soup.find_all('a') if link.get('href')]

videos = []

for link in links:
    match = re.search('.*mp4.*', link)
    if match:
        videos.append(link)

vidNum = 1

for video in videos:
    f = urllib2.urlopen(video)
    # vidNum is an int, so convert it with str() before concatenating
    with open('Data Analysis/Social Network Analysis/Video ' + str(vidNum) + '.mp4', 'wb') as code:
        code.write(f.read())
    vidNum += 1

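Everything seems to work, but when I try to play one of the videos I get this error: "Python (v2.7) requires the installation of plugins to play the following type of media file: text/html decoder". Also, if I download a video manually from the website, the file is about 22.8MB, but when I use my script the file is only 7.8kB.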

Is there something wrong with the way I'm downloading the files? Any help would be greatly appreciated.

Also: I'm running Python v2.7 on Ubuntu 12.04 LTS.

**** EDIT ****

Here is the code I'm using based on the answers I received:

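import requests

# auth= sends HTTP Basic credentials; a site that logs users in through a web form will typically not accept them
r = requests.get('https://class.coursera.org/sna-003/lecture/download.mp4?lecture_id=2', auth=('myUsername', 'myPassword'))

with open('Data Analysis/TestFile.mp4', 'wb') as fd:
    fd.write(r.content)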

Here is the output of r.content:

<!DOCTYPE html>
<html itemtype="http://schema.org" xmlns:fb="http://ogp.me/ns/fb#"><head><meta content="IE=Edge,chrome=IE7" http-equiv="X-UA-Compatible"/><meta content="!" name="fragment"/><meta content="NOODP" name="robots"/><meta charset="utf-8"/><meta content="Coursera" property="og:title"/><meta content="website" property="og:type"/><meta content="http://s3.amazonaws.com/coursera/media/Coursera_Computer_Narrow.png" property="og:image"/><meta content="https://www.coursera.org/" property="og:url"/><meta content="Coursera" property="og:site_name"/><meta content="en_US" property="og:locale"/><meta content="Take free online classes from 80+ top universities and organizations. Coursera is a social entrepreneurship company partnering with Stanford University, Yale University, Princeton University and others around the world to offer courses online for anyone to take, for free. We believe in connecting people to a great education so that anyone around the world can learn without limits." property="og:description"/><meta content="727836538,4807654" property="fb:admins"/><meta content="274998519252278" property="fb:app_id"/><meta content="Take free online classes from 80+ top universities and organizations. Coursera is a social entrepreneurship company partnering with Stanford University, Yale University, Princeton University and others around the world to offer courses online for anyone to take, for free. We believe in connecting people to a great education so that anyone around the world can learn without limits." name="description"/><meta content="http://s3.amazonaws.com/coursera/media/Coursera_Computer_Narrow.png" name="image"/><meta content="app-id=736535961" name="apple-itunes-app"/><script>window.onerror = function(message, url, lineNum) {

  // First check the URL and line number of the error
  url = url || window.location.href;
  // 99% of the time, errors without line numbers arent due to our code,
  // they are due to third party plugins and browser extensions
  if (lineNum === undefined || lineNum == null) return;

  // Now figure out the actual error message
  // If it's an event, as triggered in several browsers
  if (message.target &amp;&amp; message.type) {
    message = message.type;
  }
  if (!message.indexOf) {
    message = 'Non-string, non-event error: ' + (typeof message);
  }

  var errorDescrip = {
    message: message,
    script: url,
    line: lineNum,
    url: document.URL
  }

  var err = {
    key: 'page.error.javascript', 
    value: errorDescrip
  }

  window._204 = window._204 || [];
  window._204.push(err);

  window._gaq = window._gaq || [];
  window._gaq.push(err);
}</script><title>Coursera.org</title><link href="https://d1rlkby5e91r2j.cloudfront.net/e47434615f57601f9b9ccaf255a589e8550d328d/css/home.css" rel="stylesheet" type="text/css"/><link href="https://d1rlkby5e91r2j.cloudfront.net/e47434615f57601f9b9ccaf255a589e8550d328d/pages/auth/css/auth.css" rel="stylesheet" type="text/css"/><script data-baseurl="https://d1rlkby5e91r2j.cloudfront.net/e47434615f57601f9b9ccaf255a589e8550d328d/" id="_mobile">(function(el) {
  // Override certian behaviour if the page is for our mobile app.
  // TODO(priya) Remove this conditional behaviour once I want to push this behaviour
  // for regular authentication pages on mobile/smaller screens as well.
  // Currently I'm keeping existing behaviour same and only adding mobile specific
  // layouts ot /mobilesignup page (which is what isMobileApp = true signifies).
  if ("false" == "true") {
    var head = document.getElementsByTagName('head')[0];
    // Add viewport meta tag
    var viewport = document.querySelector('meta[name=viewport]');
    var viewportContent = 'width=device-width, initial-scale=1.0, user-scalable=no';
    if (!viewport) {
        viewport = document.createElement('meta');
        viewport.setAttribute('name', 'viewport');
        head.appendChild(viewport);
    }
    viewport.setAttribute('content', viewportContent);

    // Add responsive css
    var link  = document.createElement('link');
    link.rel  = 'stylesheet';
    link.type = 'text/css';
    link.href = el.getAttribute("data-baseurl") + "pages/auth/css/auth_responsive.css";
    head.appendChild(link);
  }
})(document.getElementById("_mobile"));
</script></head><body><div id="fb-root"></div><div id="origami"><div style="position:absolute;top:0px;left:0px;width:100%;height:100%;background:#f5f5f5;padding-top:5%;"><div id="coursera-loading-nojs" style="text-align:center; margin-bottom:10px;display:none;">Please use a <a href="/browsers">modern browser </a> with JavaScript enabled to use Coursera.</div><div><span id="coursera-loading-js" style="display: none; padding-left:45%">loading   <img src="https://d2wvvaown1ul17.cloudfront.net/site-static/images/icons/loading.gif"/></span></div><noscript><div style="text-align:center; margin-bottom:10px;">Please use a <a href="/browsers">modern browser </a> with JavaScript enabled to use Coursera.</div></noscript></div></div><!--[if gte IE 8]&gt;&lt;script&gt;document.getElementById("coursera-loading-js").style.display = 'block';&lt;/script&gt;&lt;![endif]-->
<!--[if lte IE 7]&gt;&lt;script&gt;document.getElementById("coursera-loading-nojs").style.display = 'block';
window._204 = window._204 || [];
window._gaq = window._gaq || [];

window._gaq.push(
    ['_setAccount', 'UA-28377374-1'],
    ['_setDomainName', window.location.hostname],
    ['_setAllowLinker', true],
    ['_trackPageview', window.location.pathname]);

window._204.push(
  ['client', 'home'],
  {key:"pageview", value:window.location.pathname});
  &lt;/script&gt;&lt;script src="https://eventing.coursera.org/204.min.js"&gt;&lt;/script&gt;&lt;script src="https://ssl.google-analytics.com/ga.js"&gt;&lt;/script&gt;&lt;![endif]-->
<!--[if !IE]&gt; --><script>document.getElementById("coursera-loading-js").style.display = 'block';</script><!-- &lt;![endif]--><script src="https://d1rlkby5e91r2j.cloudfront.net/e47434615f57601f9b9ccaf255a589e8550d328d/js/core/require.js" type="text/javascript"></script><script data-baseurl="https://d1rlkby5e91r2j.cloudfront.net/e47434615f57601f9b9ccaf255a589e8550d328d/" data-debug="0" data-locale="" data-timestamp="1386838999742" data-version="e47434615f57601f9b9ccaf255a589e8550d328d" id="_require" type="text/javascript">if(document.getElementById("coursera-loading-js").style.display == 'block') {
  (function(el) {
     // prevent throw
     require.onError = function(err) {
       window._204 = window._204 || [];
       window._204.push({key: 'requireErr', value: err});
     };

     define("pages/auth/authConfig",
         function() {
             return {"coursera_url": "https://www.coursera.org/",
                     "environment": "production"};
     }
     );

     require.config({
       enforceDefine: false,
       waitSeconds: 14,
       baseUrl: el.getAttribute("data-baseurl"),
       urlArgs: el.getAttribute("data-debug") == "1" ? "v=" + el.getAttribute("data-timestamp") : "",
       shim: {
          "underscore": {
             exports: '_'
          },
          "backbone": {
             deps: ['underscore', 'jquery'],
             exports: 'Backbone'
          }
       },
       paths: {
          "jquery":       "js/core/jquery",
          "underscore":   "js/core/underscore",
          "backbone":     "js/core/backbone",
          "i18n":         "js/core/i18n._t"
       },
       callback: function() {
         require(["pages/auth/routes"]); // bootup coursera
       },
       config: {
         i18n: {
           locale: (window.localStorage ? localStorage.getItem("locale") : '') || el.getAttribute("data-locale")
         }
       }
     });
  })(document.getElementById("_require"));
}</script><script type="text/javascript">define("pages/home/models/user.json", [], function(){
  return null;
});
</script></body></html>

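However, I find this strange, because it looks like the website's source code. Yet when I look at r.url, I get an actual URL that loads in my browser and prompts me to save or view the video. Even when I pass that new URL, which I thought contained my cookie information, I still get the same content. I don't understand where I'm going wrong.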

First, download and install the requests package.

Then use this code:

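import requests

def downloadfile(name, url):
    name = name + ".mp4"
    # stream=True makes iter_content() read the body piece by piece
    # instead of loading the whole video into memory first
    r = requests.get(url, stream=True)
    r.raise_for_status()  # stop on 4xx/5xx instead of saving an error page
    print("****Connected****")
    print("Downloading.....")
    with open(name, 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
    print("Done")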
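Note that the function above saves whatever the server returns, so a login page would still end up in a file named .mp4. A minimal Python 3 sketch of a guard, using the standard library's urllib.request instead of requests so it runs without extra packages: it streams the body in chunks and refuses to save anything served as text/html. The name download_video is hypothetical; with requests, the same check would read r.headers.get('Content-Type', '').

```python
import urllib.request


def download_video(url, dest, chunk_size=1024 * 1024):
    """Stream url into dest and return the number of bytes written.

    Raises ValueError when the server answers with text/html, which for a
    video link usually means a login or error page rather than the video.
    """
    with urllib.request.urlopen(url) as resp:
        content_type = resp.headers.get("Content-Type", "")
        if content_type.startswith("text/html"):
            raise ValueError(
                "Got %s instead of a video from %s" % (content_type, resp.geturl())
            )
        written = 0
        with open(dest, "wb") as f:
            # Read fixed-size chunks so large videos never sit fully in memory
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                f.write(chunk)
                written += len(chunk)
    return written
```

Because the check happens before the output file is opened, a rejected download leaves nothing behind on disk.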

You need a valid cookie; otherwise you just download the login page.

Here's how to set cookies with urllib2:

import urllib2
opener = urllib2.build_opener()
opener.addheaders.append(('Cookie', 'cookiename=cookievalue'))
f = opener.open("http://example.com/")
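Copying a cookie value by hand works, but if you are already logged in through your browser, another option is to export the browser's cookies to a Netscape-format cookies.txt file (a browser add-on or extension for this is assumed) and load that file. A minimal Python 3 sketch: in Python 3, urllib2 became urllib.request and cookielib became http.cookiejar, whose MozillaCookieJar reads that file format. The function name opener_from_cookie_file and the file name cookies.txt are hypothetical.

```python
import http.cookiejar
import urllib.request


def opener_from_cookie_file(path):
    """Return (opener, jar) where the opener sends the cookies from a
    Netscape-format cookies.txt file along with matching requests."""
    jar = http.cookiejar.MozillaCookieJar(path)
    # ignore_discard=True also keeps session cookies, which login
    # sessions often use; ignore_expires=True keeps expired ones too
    jar.load(ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

Usage would be opener, jar = opener_from_cookie_file("cookies.txt"), after which opener.open(video_url) sends the loaded cookies with the request.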

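You can also use cookielib to get more browser-like behavior, so your script can go through the login process itself and obtain the correct cookie for downloading your videos.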

Another option is to use Requests, a library similar to urllib2, to automate the login process, which is easier.
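Both approaches come down to keeping one cookie store across the login request and the download. A minimal Python 3 sketch of the cookielib route (the module is http.cookiejar in Python 3), using only the standard library. The helper names, the login URL and the form field names are hypothetical; copy the real field names from the site's login form. With Requests, a requests.Session() object plays the same role by keeping cookies between its get() and post() calls.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session_opener():
    """Return (opener, jar). The opener stores every cookie the server sets
    in jar and sends matching cookies back on later requests, like a browser."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, fields):
    """Build a form-encoded POST request for a login form.

    The keys in `fields` are placeholders: use the input names from the
    site's actual login form.
    """
    data = urllib.parse.urlencode(fields).encode("ascii")
    # Supplying data= makes urllib send the request as a POST
    return urllib.request.Request(login_url, data=data)
```

Usage would be to open the login request with the opener, for example opener.open(build_login_request(login_url, {"email": ..., "password": ...})), and then fetch the video with the same opener so the session cookies set during login are sent along.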

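I'd first save the file as .html instead of .mp4, so you can be 100% sure it isn't a login page, an error page or some other junk. Some sites require cookies, a specific user agent (to block bots, scrapers and automated vulnerability scanners), a referer, and so on.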
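Instead of, or in addition to, renaming the file, you can check the first bytes of what you downloaded. A minimal sketch, assuming a heuristic is good enough: MP4 files normally begin with an ftyp box, whose 4-byte type code sits at bytes 4 to 8, while a login page starts with HTML markup. Both helper names are hypothetical.

```python
def looks_like_mp4(path):
    """Heuristic: an MP4 normally starts with a box whose type is b"ftyp"."""
    with open(path, "rb") as f:
        head = f.read(12)
    # Bytes 0-3 are the box size, bytes 4-7 the box type
    return len(head) >= 8 and head[4:8] == b"ftyp"


def looks_like_html(path):
    """Heuristic: an HTML page starts (after whitespace) with a doctype or <html>."""
    with open(path, "rb") as f:
        head = f.read(512).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")
```

A 7.8kB "video" for which looks_like_html() returns True is almost certainly the login page.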

Personally, I use Tamper Data or Live HTTP Headers to make sure my program is behaving correctly while debugging.

If you're getting a CloudFront response, then you're probably not handling cookies, user agents or referers correctly.

I just checked the link, and there's also a CSRF cookie {csrf_token=toNQOP7stgOREzrDcbPc} that you will 100% need in order to see anything past the login page.
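If the site expects the CSRF token to be echoed back, one common pattern is to copy the cookie's value into a request header. A minimal sketch, assuming the cookie is already in an http.cookiejar.CookieJar (for example after loading the login page through a cookie-aware opener). The cookie name csrf_token comes from the answer above; the header name X-CSRFToken and the helper name add_csrf_header are only assumptions, since each site picks its own convention and some expect the token as a form field instead.

```python
import http.cookiejar
import urllib.request


def add_csrf_header(request, jar, cookie_name="csrf_token", header_name="X-CSRFToken"):
    """Copy the value of the CSRF cookie in jar into a header on request.

    Returns the token, or None if no cookie with that name is in the jar.
    Both names are assumptions: check the cookies and request headers your
    browser actually sends to find the real ones.
    """
    for cookie in jar:
        if cookie.name == cookie_name:
            request.add_header(header_name, cookie.value)
            return cookie.value
    return None
```

Note that urllib.request.Request.add_header() stores header names with str.capitalize(), so the header above is kept internally as "X-csrftoken"; HTTP header names are case-insensitive, so servers treat it the same.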

You can also download MP4 videos with curl if you have the link, which is much easier:

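import subprocess

url = "YOUR_VIDEO_URL"  # placeholder: the direct link to the .mp4 file
# A list of arguments avoids shell-quoting problems with URLs containing & or ?
subprocess.run(["curl", url, "--output", "c:/Users/Desktop/yourFile.mp4"], check=True)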
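A slightly more defensive variant of the same idea, sketched as a helper that only builds curl's argument list (the name curl_command is hypothetical). -L follows redirects, --fail makes curl exit with an error on HTTP error statuses instead of saving the error body, and -b reads cookies from a Netscape-format cookies.txt file, which a logged-in download like this one needs.

```python
def curl_command(url, output, cookie_file=None):
    """Build the argument list for downloading url to output with curl."""
    # -L: follow redirects; --fail: non-zero exit status on HTTP errors
    cmd = ["curl", "-L", "--fail", "--output", output]
    if cookie_file is not None:
        # -b with a file name sends the cookies stored in that file
        cmd += ["-b", cookie_file]
    cmd.append(url)
    return cmd
```

Run it with subprocess.run(curl_command(url, "lecture.mp4", "cookies.txt"), check=True). --fail does not help when the server answers with a login page and status 200, so the downloaded file should still be checked.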


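Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.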

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM