
Second Python code: bs4 and saving data to a text file

I have managed to write my second Python script, and I am trying to extract data from a table on a webpage, www.ksmnet.org. I need the data in the second column of the table, which is for today's date, and I have managed to extract that fine. However, I need each entry in the first column to be saved as a text file containing the matching value from the second column. For example, if Fajr is 05:00, then I need a text file saved as Fajr.txt, and inside this text file I need 05:00.

I understand that some of the times are not written with the ":" symbol, and I need to convert them. For example, a time given as 06.00 needs to become 06:00.

Here is my code:

# import libraries (json and requests were imported but never used)
import urllib.request
from bs4 import BeautifulSoup

url = 'https://www.ksmnet.org/'
request = urllib.request.Request(url)
response = urllib.request.urlopen(request)
html = response.read()
soup = BeautifulSoup(html.decode("utf-8"), "html.parser")
path = '/srv/docker/homeassistant/prayer/'


# select the div whose id is 'prayer'; it wraps the table rows
table = soup.find('div', id='prayer')
package = ''
version = ''
for i in table.select('tr'):
    data = i.select('td')
    if data:
        package = data[0].text.strip()
        version = ' '.join(data[1].text.strip().split())
        print(version)

Can anyone please help?

Thanks

Here is the code to write to a text file.

import urllib.request
from bs4 import BeautifulSoup

url = 'https://www.ksmnet.org/'
request = urllib.request.Request(url)
response = urllib.request.urlopen(request)
html = response.read()
soup = BeautifulSoup(html.decode("utf-8"), "html.parser")
path = '/srv/docker/homeassistant/prayer/'


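# select the div whose id is 'prayer'; it wraps the table rows
table = soup.find('div', id='prayer')
for i in table.select('tr'):
    data = i.select('td')
    if data:
        # first column is the name; [:-1] drops its trailing character
        package = data[0].text.strip()
        # second column is the time; use ':' instead of '.'
        version = data[1].text.strip().replace('.', ':')
        # the with-block closes the file even if the write fails
        with open(package[:-1] + ".txt", "w") as file:
            file.write(version)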
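
The code above writes the files into the current working directory; the `path` variable carried over from the question is defined but never used. Below is a minimal sketch of writing into a chosen directory instead, assuming the names and times have already been collected into a dict. `save_times` and its parameters are hypothetical names for illustration, not part of the original code.

```python
from pathlib import Path


def save_times(times, directory):
    """Write each value to <directory>/<name>.txt and return the paths written.

    `times` is assumed to map a name such as "Fajr" to a time string
    such as "05:00". Both arguments are illustrative assumptions.
    """
    out_dir = Path(directory)
    # create the target directory if it does not exist yet
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, value in times.items():
        target = out_dir / f"{name}.txt"
        target.write_text(value, encoding="utf-8")
        written.append(target)
    return written
```

It could be called as `save_times({"Fajr": "05:00"}, path)`, with `path` being the directory string from the question.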

Or you can use pandas.

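import pandas as pd

url = 'https://www.ksmnet.org/'
# read_html returns a list of DataFrames, one per <table>; take the first
df = pd.read_html(url)[0]
# 'Date' holds the names; '06/01' is presumably that day's column header,
# so it has to be changed to match the current date
for pkg, version in zip(df['Date'], df['06/01']):
    version = version.replace('.', ':').strip()
    with open(pkg[:-1] + ".txt", "w") as file:
        file.write(version)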

UPDATE: checking digits

import urllib.request
from bs4 import BeautifulSoup

url = 'https://www.ksmnet.org/'
request = urllib.request.Request(url)
response = urllib.request.urlopen(request)
html = response.read()
soup = BeautifulSoup(html.decode("utf-8"), "html.parser")
path = '/srv/docker/homeassistant/prayer/'


# select the div whose id is 'prayer'; it wraps the table rows
table = soup.find('div', id='prayer')
for i in table.select('tr'):
    data = i.select('td')
    if data:
        package = data[0].text.strip()
        version = data[1].text.strip().replace('.', ':')

        # Check hours: pad a single-digit hour with a leading zero
        checkHours = version.split(':')[0]
        if len(checkHours) < 2:
            version = "0" + checkHours + ':' + version.split(':')[-1]
        # Check minutes: pad a single-digit minute with a leading zero
        checkMinute = version.split(':')[-1]
        if len(checkMinute) < 2:
            version = version.split(':')[0] + ":" + "0" + checkMinute

        # the with-block closes the file even if the write fails
        with open(package[:-1] + ".txt", "w") as file:
            file.write(version)
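
The separator replacement and the two digit checks above can also be folded into one small helper. This is only a sketch, assuming each time string contains exactly one separator, either "." or ":"; `normalize_time` is a hypothetical name, not something from the original code.

```python
def normalize_time(raw):
    """Return a zero-padded 'HH:MM' string for inputs like '6.5' or '06.00'.

    Assumes exactly one '.' or ':' separates hours from minutes.
    """
    # unify the separator, then split into the hour and minute parts
    hours, _, minutes = raw.strip().replace(".", ":").partition(":")
    # zfill pads with leading zeros up to two characters
    return f"{hours.zfill(2)}:{minutes.zfill(2)}"
```

Inside the loop it would replace the manual checks, for example `version = normalize_time(data[1].text)`.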
