
Can't separate names and dates from a dictionary in order to write them in an excel file

I've created a script in python to parse the business names and their dates from a webpage and write them to an excel file using openpyxl. My intention is to place the names and dates in separate columns, like name1 date1 name2 date2 and so on.

My current attempt fetches the content into a dictionary and produces a result like the one below:

{'NATIONAL OPERA STUDIO': '18 Nov 2010', 'NATIONAL THEATRE BALLET SCHOOL': '12 Aug 2005', 'NATIONAL THEATRE DRAMA SCHOOL': '12 Aug 2005', 'NATIONAL THEATRE': '30 Mar 2000'}

How can I place the names and dates in an excel file like the following?

column1                 column2       column3                           column4  
NATIONAL OPERA STUDIO   18 Nov 2010   NATIONAL THEATRE BALLET SCHOOL    12 Aug 2005

This is my try so far:

import re
import requests
from bs4 import BeautifulSoup
from openpyxl import load_workbook

wb = load_workbook('container.xlsx')
ws = wb['Sheet1']

url = "https://abr.business.gov.au/ABN/View?id=78007306283"

response = requests.get(url)
soup = BeautifulSoup(response.text,'lxml')
try:
    names_n_dates = {item.find("a").get_text(strip=True):' '.join(item.find("a").find_parent().find_next_sibling().text.split()) for item in soup.find("th",text=re.compile("Business name",re.I)).find_parent().find_next_siblings("tr")}
except AttributeError: names_n_dates = ""

items = {k:v for k,v in names_n_dates.items()}
print(items)

ws.append([items.split()])
wb.save("container.xlsx")

I know I can't apply the split function to a dictionary, but I don't know of any alternative to achieve the same result either. I used ws.append([]) to include the fields in the excel file, and I wish to keep this command as it is because there are other fields to include within it later.

If you want to keep ws.append() as you appear to intend (appending one list as one row), then do this:

import re
import requests
from bs4 import BeautifulSoup
from openpyxl import load_workbook

wb = load_workbook('container.xlsx')
ws = wb['Sheet1']

url = "https://abr.business.gov.au/ABN/View?id=78007306283"

response = requests.get(url)
soup = BeautifulSoup(response.text,'lxml')
try:
    names_n_dates = {item.find("a").get_text(strip=True):' '.join(item.find("a").find_parent().find_next_sibling().text.split()) for item in soup.find("th",text=re.compile("Business name",re.I)).find_parent().find_next_siblings("tr")}
except AttributeError: names_n_dates = {}  # an empty dict so the loop below still works if nothing was scraped

row = []

# flatten each (name, date) pair into one flat row: name1, date1, name2, date2, ...
for item in names_n_dates.items():
    for column in item:
        row.append(column)

ws.append(row)

wb.save("container.xlsx")

To solve this, you can iterate over the dictionary items, which are (key, value) tuples, and index into each item like a list: the key is at position 0 and the value is at position 1.

import re
import requests
from bs4 import BeautifulSoup
from openpyxl import load_workbook

wb = load_workbook('container.xlsx')
ws = wb['Sheet1']

url = "https://abr.business.gov.au/ABN/View?id=78007306283"

response = requests.get(url)
soup = BeautifulSoup(response.text,'lxml')
try:
    names_n_dates = {item.find("a").get_text(strip=True):' '.join(item.find("a").find_parent().find_next_sibling().text.split()) for item in soup.find("th",text=re.compile("Business name",re.I)).find_parent().find_next_siblings("tr")}
except AttributeError: names_n_dates = {}  # an empty dict so the loop below still works if nothing was scraped

row = []

for item in names_n_dates.items(): # iterate over all dict items
    row.append(item[0]) # key (business name)
    row.append(item[1]) # value (registration date)

ws.append(row)

wb.save("container.xlsx")
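If you prefer something more compact, the same flattening can be written as a single list comprehension. The snippet below is a minimal, self-contained sketch rather than part of the original answers: the sample dict simply mirrors the output printed earlier, and it creates a fresh workbook instead of loading container.xlsx so it can run on its own.

from openpyxl import Workbook

# sample data mirroring the dict printed earlier; in the real script this would be
# the scraped names_n_dates dictionary
names_n_dates = {
    'NATIONAL OPERA STUDIO': '18 Nov 2010',
    'NATIONAL THEATRE BALLET SCHOOL': '12 Aug 2005',
}

# interleave keys and values into one flat row: name1, date1, name2, date2, ...
row = [cell for pair in names_n_dates.items() for cell in pair]

wb = Workbook()   # fresh workbook for this sketch; the answers above load container.xlsx instead
ws = wb.active
ws.append(row)    # still a single ws.append() call, as the question requires
wb.save("container.xlsx")

Either way, ws.append() receives one flat list, so each name lands in one column and its date in the next.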
