How can we get values from a website using Python code?

Question

import urllib
from datetime import date,timedelta
import datetime
import re
list =["infy.ns","grasim.ns","idea.ns","asianpain.ns","bajaj-auto-eq.ns",
       "drreddy.ns","boschltd.ns","kotakbank.ns","M&M.ns","ultracemc.ns",
       "sunpharma.ns","lt.ns","acc.ns","sbin.ns","bhartiartl.ns",
       "lupin.ns","reliance.ns","hdfcbank.ns","zeel.ns","ntpc.ns",
       "icicibank.ns","cipla.ns","tcs.ns","bpcl.ns","heromotoc.ns"]
i=0
while i<len(list):
    url="http://finance.yahoo.com/q?s="+list[i]+"&ql=1"
    htmlfile = urllib.urlopen(url)
    htmltext=htmlfile.read()
    regex='<span id="yfs_l84_'+list[i]+'">(.+?)</span>'
    pattern = re.compile(regex)
    price = re.findall(pattern,htmltext)
    print(price)
    i=i+1

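When I run this code in a terminal, it fetches the values from finance.yahoo.com and prints all of them to the terminal, but I want to put those values into a text file on my desktop.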

Answer 1

The simplest way requires no coding. Just redirect the script's output to a file, e.g.

python yahoo_scraper.py > prices.txt

or

python yahoo_scraper.py >> prices.txt

to append to an existing file.

It is also easy to do in Python. Open the file for writing and write to it:

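# Python 2 code, matching the question (urllib.urlopen). On Python 2,
# print(..., file=...) needs `from __future__ import print_function`
# at the top of the script.
with open('prices.txt', 'w') as price_file:
    i = 0
    while i < len(list):
        url = "http://finance.yahoo.com/q?s=" + list[i] + "&ql=1"
        htmlfile = urllib.urlopen(url)
        htmltext = htmlfile.read()
        regex = '<span id="yfs_l84_' + list[i] + '">(.+?)</span>'
        pattern = re.compile(regex)
        price = re.findall(pattern, htmltext)
        print(price, file=price_file)  # goes to the file instead of the terminal
        i = i + 1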

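Note that the file is overwritten every time the script runs. If you want to append to the end of the file instead, replace 'w' with 'a' to open it in append mode.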
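To see the difference between the two modes, here is a small sketch. The file name demo_prices.txt, the helper write_price and the sample values are all made up for illustration, and the file lives in a temporary directory so nothing real is touched.

```python
import os
import tempfile

# Hypothetical file name inside a throwaway temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo_prices.txt")

def write_price(value, mode):
    """Write one value per line, opening the file in the given mode."""
    with open(path, mode) as price_file:
        price_file.write(value + "\n")

write_price("100.0", "w")   # creates the file
write_price("101.5", "w")   # 'w' truncates first, so "100.0" is lost
with open(path) as f:
    after_overwrite = f.read().splitlines()

write_price("102.0", "a")   # 'a' keeps existing lines and adds to the end
with open(path) as f:
    after_append = f.read().splitlines()

print(after_overwrite)  # ['101.5']
print(after_append)     # ['101.5', '102.0']
```

With 'a', repeated runs of the scraper would keep accumulating lines in the same file, which is the same effect as the >> redirect shown earlier.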

Your while loop would be better written as a for loop. Here is an example; I assume list has been renamed to stocks to avoid shadowing the built-in list:

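stocks = ["infy.ns", "grasim.ns"]  # ... plus the remaining symbols from the question

# Python 2 code (urllib.urlopen); on Python 2, print(..., file=...) needs
# `from __future__ import print_function` at the top of the script.
with open('prices.txt', 'w') as price_file:
    for stock in stocks:
        url = "http://finance.yahoo.com/q?s={}&ql=1".format(stock)
        html = urllib.urlopen(url).read()
        # Capture the text inside the price span for this symbol
        pattern = r'<span id="yfs_l84_{}">(.+?)</span>'.format(stock)
        price = re.findall(pattern, html)
        print(price, file=price_file)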

You may want to change the last line to print the first element of the list returned by re.findall().
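A minimal sketch of that last step, run against a made-up HTML fragment rather than a live page. extract_price is a hypothetical helper, and the sample markup only imitates the span that the regex above looks for. re.escape is an addition here: symbols such as "infy.ns" contain a dot, which would otherwise match any character.

```python
import re

def extract_price(html, stock):
    """Return the first captured price for `stock`, or None if nothing matched."""
    # re.escape keeps characters like '.' in the symbol from acting as regex operators.
    pattern = r'<span id="yfs_l84_{}">(.+?)</span>'.format(re.escape(stock))
    matches = re.findall(pattern, html)  # list of captured strings, possibly empty
    return matches[0] if matches else None

# Made-up fragment for illustration; not real page content.
sample_html = '<div><span id="yfs_l84_infy.ns">1,234.50</span></div>'

found = extract_price(sample_html, "infy.ns")
missing = extract_price(sample_html, "tcs.ns")
print(found)    # 1,234.50
print(missing)  # None
```

Inside the loop, something like print(extract_price(html, stock), file=price_file) would then write the bare price instead of a one-element list. Returning None for a missing match avoids an IndexError when the page does not contain the expected span.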

Solution 1
0 2016-03-21 10:01:02
