
Extract data from a website with pandas read_html

I'm trying to extract data from a website URL.

The table contains a span tag that interferes with the extraction: each cell value is concatenated with the text of its span tag. I want to extract the cell content and the span content into separate cells. Any help would be greatly appreciated.

Here is the code:

import pandas as pd

url = "https://www.sqimway.com/lte_band.php"

lte_band = pd.read_html(url)

lte_band[0]


If you have pandas 0.24+, you can use pandas.MultiIndex.to_flat_index() to turn the multi-level header into tuples, then join the unique values of each tuple into a single column name.

# Set a new DataFrame variable.
df = lte_band[0]

# Note: set() drops repeated labels but loses their order, so we sort by each label's position in the original tuple to restore it.
df.columns = list(map(lambda q: " ".join(sorted(set(q), key = q.index)), df.columns.to_flat_index()))

Output of df.columns:

Index(['Band', 'Name', 'Mode', 'Downlink (MHz) Low Earfcn',
       'Downlink (MHz) Middle Earfcn', 'Downlink (MHz) High Earfcn',
       'BandwidthDL/UL (MHz)', 'Uplink (MHz) Low Earfcn',
       'Uplink (MHz) Middle Earfcn', 'Uplink (MHz) High Earfcn',
       'Duplex spacing(MHz)', 'Geographicalarea', '3GPPrelease',
       'Channel bandwidth (MHz) 1.4', 'Channel bandwidth (MHz) 3',
       'Channel bandwidth (MHz) 5', 'Channel bandwidth (MHz) 10',
       'Channel bandwidth (MHz) 15', 'Channel bandwidth (MHz) 20'],
      dtype='object')

Formatted:

Band
Name
Mode
Downlink (MHz) Low Earfcn
Downlink (MHz) Middle Earfcn
Downlink (MHz) High Earfcn
BandwidthDL/UL (MHz)
Uplink (MHz) Low Earfcn
Uplink (MHz) Middle Earfcn
Uplink (MHz) High Earfcn
Duplex spacing(MHz)
Geographicalarea
3GPPrelease
Channel bandwidth (MHz) 1.4
Channel bandwidth (MHz) 3
Channel bandwidth (MHz) 5
Channel bandwidth (MHz) 10
Channel bandwidth (MHz) 15
Channel bandwidth (MHz) 20
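
To see the flattening step in isolation, here is a minimal sketch that runs without fetching the page. It builds a small two-level header by hand; the sample tuples and the helper name flatten_columns are assumptions for illustration, not taken from the site or from pandas. It uses dict.fromkeys instead of sorted(set(...)) to drop repeated labels while keeping their order, which should give the same names as the one-liner above.

```python
import pandas as pd


def flatten_columns(columns):
    """Join the levels of each column tuple into one name, skipping repeated labels.

    Hypothetical helper; it expects a pandas.MultiIndex.
    """
    flat = []
    for col in columns.to_flat_index():
        # dict.fromkeys keeps only the first occurrence of each label, in order.
        parts = dict.fromkeys(str(level) for level in col)
        flat.append(" ".join(parts))
    return flat


# Hand-made header that mimics the shape read_html returns for a two-row header.
columns = pd.MultiIndex.from_tuples([
    ("Band", "Band"),
    ("Downlink (MHz)", "Low Earfcn"),
    ("Downlink (MHz)", "High Earfcn"),
])
demo = pd.DataFrame([[1, 0, 599]], columns=columns)

demo.columns = flatten_columns(demo.columns)
print(list(demo.columns))
```

Applied to the real table, this would be df.columns = flatten_columns(df.columns), assuming read_html returned a MultiIndex header for it.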
