I'm using Python pandas to extract some data (page titles), but the output is not in the same order as the URLs I put in the code
So I wrote the code, ran it, and got the .xlsx file, but the output is not in the same order as the URL list I put in the code.
# importing the libraries (unused imports removed; the "lxml" parser
# only needs the lxml package installed, not imported)
import bs4
import requests
import pandas as pd
from fake_useragent import UserAgent
urls = list(('https://isabad.com/advanced-professional-email-templates-opencart-extension',
'https://isabad.com/seo-basic-pack-opencart-extension',
'https://isabad.com/x-shipping-pro',
'https://isabad.com/bot-blocker-opencart-extension',
'https://isabad.com/opencart-mobile-application'
))
dit = {}
user_agent = UserAgent()
for url in urls:
    data = requests.get(url, headers={"user-agent": user_agent.chrome})
    soup = bs4.BeautifulSoup(data.content, "lxml")
    # one entry per URL; dicts keep insertion order (Python 3.7+)
    dit[url] = soup.find_all("title")
ex = pd.DataFrame({"title": dit})
print(ex)
ex.to_excel('sasa.xlsx', index=False, engine='xlsxwriter')
How can I fix this problem?
You are using the set data structure to store the list of URLs, and a set in Python is an unordered data structure. To get the output in the same order, store the URLs in a list instead, as follows:
urls = [
'https://www.sample.com/search/category-mobile/',
'https://www.sample.com/search/category-tablet-ebook-reader',
'https://www.sample.com/search/category-laptop/',
'https://www.sample.com/search/category-computer-parts/',
'https://www.sample.com/search/category-office-machines/'
]
Cheers!
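To see why the container matters, here is a minimal sketch using the placeholder sample.com URLs from the answer above: a list is iterated in exactly the order it was written, while a set gives no ordering guarantee. If the set was only there to remove duplicates, dict.fromkeys (an assumption about intent, not something the answer mentions) removes them while keeping first-seen order.

```python
# Placeholder URLs taken from the answer above.
urls = [
    "https://www.sample.com/search/category-mobile/",
    "https://www.sample.com/search/category-tablet-ebook-reader",
    "https://www.sample.com/search/category-laptop/",
    "https://www.sample.com/search/category-computer-parts/",
    "https://www.sample.com/search/category-office-machines/",
]

# A list is iterated in exactly the order its items were written.
visited_from_list = [url for url in urls]

# A set has no defined iteration order for these strings; the order can even
# change between interpreter runs because string hashing is randomized.
visited_from_set = [url for url in set(urls)]

# If duplicates must be removed, dict.fromkeys keeps first-seen order
# (dicts preserve insertion order since Python 3.7).
with_duplicate = urls + [urls[0]]
deduped = list(dict.fromkeys(with_duplicate))

print(visited_from_list == urls)                  # True
print(sorted(visited_from_set) == sorted(urls))   # True: same items, order not guaranteed
print(deduped == urls)                            # True
```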
Use a list, so the results will be in the same order that you defined them:
urls = ['https://www.sample.com/search/category-mobile/',
'https://www.sample.com/search/category-tablet-ebook-reader',
'https://www.sample.com/search/category-laptop/',
'https://www.sample.com/search/category-computer-parts/',
'https://www.sample.com/search/category-office-machines/'
]
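Applied to the question's loop, one way to make the row order explicit is sketched below, under stated assumptions: collect one row per URL in a list while iterating the URL list, then build the DataFrame from those rows, so row i always corresponds to urls[i]. The function name titles_in_url_order and its fetch_title parameter are hypothetical; fetch_title stands in for the requests + BeautifulSoup call from the question (for example, returning BeautifulSoup(requests.get(url).content, "lxml").title.string) and is injected so the ordering logic runs offline.

```python
from typing import Callable, Iterable

import pandas as pd


def titles_in_url_order(urls: Iterable[str],
                        fetch_title: Callable[[str], str]) -> pd.DataFrame:
    """Return one row per URL, in the same order as `urls`.

    `fetch_title` is a hypothetical callable (url -> page title); in the
    question it would wrap requests.get(...) and BeautifulSoup.
    """
    rows = []
    for url in urls:
        # Appending to a list keeps the rows in iteration order.
        rows.append({"url": url, "title": fetch_title(url)})
    # A DataFrame built from a list of dicts keeps the list's row order.
    return pd.DataFrame(rows, columns=["url", "title"])


if __name__ == "__main__":
    # Made-up titles so the example runs without network access.
    fake_titles = {
        "https://www.sample.com/search/category-mobile/": "Mobile",
        "https://www.sample.com/search/category-laptop/": "Laptop",
        "https://www.sample.com/search/category-computer-parts/": "Computer parts",
    }
    urls = list(fake_titles)
    df = titles_in_url_order(urls, fake_titles.__getitem__)
    print(df["title"].tolist())  # ['Mobile', 'Laptop', 'Computer parts']
    # df.to_excel("sasa.xlsx", index=False) would write the rows in this order.
```

Keeping the URL as its own column (instead of relying on the index, which index=False drops when writing) also makes it easy to check in the spreadsheet that each title lines up with its URL.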
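Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.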