
I'm using Python pandas to extract some data (page titles), but the output is not in the same order as the URLs I put in the code

So I wrote the code, ran it, and got the .xlsx file, but the output is not in the same order as the URL list I put in the code.

# importing the libraries
import bs4
import pandas as pd
import requests
from fake_useragent import UserAgent

urls = list((
    'https://isabad.com/advanced-professional-email-templates-opencart-extension',
    'https://isabad.com/seo-basic-pack-opencart-extension',
    'https://isabad.com/x-shipping-pro',
    'https://isabad.com/bot-blocker-opencart-extension',
    'https://isabad.com/opencart-mobile-application',
))

dit = {}
user_agent = UserAgent()
for url in urls:
    data = requests.get(url, headers={"user-agent": user_agent.chrome})
    # the "lxml" parser requires the lxml package to be installed
    soup = bs4.BeautifulSoup(data.content, "lxml")
    # find_all returns a list-like ResultSet of <title> tags, not plain text
    dit[url] = soup.find_all("title")
    # the DataFrame is rebuilt and sasa.xlsx rewritten on every iteration;
    # index=False leaves the URLs (the index) out of the spreadsheet
    ex = pd.DataFrame({"title": dit})
    print(ex)
    ex.to_excel('sasa.xlsx', index=False, engine='xlsxwriter')


How can I fix this problem?

You are using the set data structure to store the URLs, and a set in Python is unordered. To get the output in the same order, store the URLs in a list instead, as follows:

urls = [
  'https://www.sample.com/search/category-mobile/' ,
  'https://www.sample.com/search/category-tablet-ebook-reader',
  'https://www.sample.com/search/category-laptop/',
  'https://www.sample.com/search/category-computer-parts/',
  'https://www.sample.com/search/category-office-machines/'
]
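If the set was there to drop duplicate URLs (an assumption; the question does not say so), a plain list will keep any repeats. A minimal sketch of order-preserving de-duplication uses dict.fromkeys, which relies on dicts iterating in insertion order (guaranteed since Python 3.7). The helper name unique_in_order is hypothetical:

```python
def unique_in_order(items):
    """Drop duplicates from `items`, keeping each item's first position."""
    # dict keys are unique and, since Python 3.7, iterate in insertion order
    return list(dict.fromkeys(items))


urls = [
    'https://www.sample.com/search/category-mobile/',
    'https://www.sample.com/search/category-laptop/',
    'https://www.sample.com/search/category-mobile/',  # repeated on purpose
]

print(unique_in_order(urls))
# ['https://www.sample.com/search/category-mobile/', 'https://www.sample.com/search/category-laptop/']
```

By contrast, list(set(urls)) also removes the duplicate, but its order is not guaranteed to follow the original list.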

Cheers!

Use a list so that the results come out in the same order in which you defined the URLs.

urls = ['https://www.sample.com/search/category-mobile/' ,
'https://www.sample.com/search/category-tablet-ebook-reader',
'https://www.sample.com/search/category-laptop/',
'https://www.sample.com/search/category-computer-parts/',
'https://www.sample.com/search/category-office-machines/'
]
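Beyond the list itself, a minimal sketch of the whole scrape that keeps each title next to its URL might look like this. It is an illustration, not the answerer's code: fetch_html is a hypothetical caller-supplied function (URL in, HTML out) so the parsing can be tried offline, and it swaps the question's "lxml" parser for Python's built-in "html.parser" to avoid the extra dependency. It builds the DataFrame once after the loop, keeps the URL as a column, and stores the title text rather than the ResultSet returned by find_all:

```python
import pandas as pd
from bs4 import BeautifulSoup


def scrape_titles(urls, fetch_html):
    """Return one row per URL, in exactly the order of `urls`.

    `fetch_html` is a hypothetical caller-supplied function that takes a URL
    and returns that page's HTML as str or bytes.
    """
    rows = []
    for url in urls:  # iterating a list visits the URLs in their written order
        soup = BeautifulSoup(fetch_html(url), "html.parser")
        # soup.title is the first <title> tag, or None if the page has none
        title = soup.title.get_text(strip=True) if soup.title else None
        rows.append({"url": url, "title": title})
    # Build the DataFrame once, after the loop, with an explicit url column
    # so each title stays paired with the page it came from.
    return pd.DataFrame(rows, columns=["url", "title"])


if __name__ == "__main__":
    # Offline demo: made-up pages stand in for real HTTP responses.
    fake_pages = {
        'https://www.sample.com/search/category-mobile/':
            '<html><head><title>Mobile</title></head></html>',
        'https://www.sample.com/search/category-laptop/':
            '<html><head><title>Laptop</title></head></html>',
    }
    df = scrape_titles(list(fake_pages), fake_pages.__getitem__)
    print(df)
```

With real pages, you could pass something like lambda u: requests.get(u, headers={"user-agent": user_agent.chrome}).content as fetch_html, then call df.to_excel('sasa.xlsx', index=False, engine='xlsxwriter') once, after the loop has finished.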



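Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.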

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM