[英]How Pandas read_html can get only selected columns from entire DataFrame
我正在嘗試從html
頁面中提取特定列,我的html
數據如下所示。
1)HTML DATA格式
VM Name User Name Image Name Network VCPUS Memory(GB) Disk(GB) Tenant Region KVM Host Power State URL Created
0 dbsw-powerbi anokhe@ezy.com unknown {u'VLAN181': [u'192.168.57.91']} 4 16 100 APP DBS-AP-IN dbs-appkvm03 On https://compute.ezy.com 2018-08-02T10:30:07Z
1 pciedip anokhe@ezy.com dbsVDI-RHEL65 {u'VLAN181': [u'192.168.57.37']} 4 32 200 APP DBS-AP-IN dbs-appkvm01 On https://compute.ezy.com 2018-04-18T06:39:38Z
2 dbs-spbdatasync1 anokhe@ezy.com dbsVDI-RHEL510 {u'VLAN181': [u'192.168.57.156']} 1 8 50 APP DBS-AP-IN dbs-kvm13 On https://compute.ezy.com 2018-04-05T09:51:29Z
3 dbsw-russian anokhe@ezy.com dbsVDI-WIN764-V1 {u'VLAN181': [u'192.168.57.216']} 1 4 100 APP DBS-AP-IN dbs-appkvm01 On https://compute.ezy.com 2018-04-02T06:25:25Z
4 dbs-spbdatasync anokhe@ezy.com dbsVDI-RHEL510 {u'VLAN181': [u'192.168.57.233']} 1 8 50 APP DBS-AP-IN dbs-kvm13 On https://compute.ezy.com 2018-04-02T05:03:03Z
我只是在嘗試使用熊貓read_html
來獲取DataFrame,但無法獲得從DataFrame中獲取特定列的理解。 我需要從13列中選擇列['VM Name', 'User Name', 'Network', 'Region']
。
2)代碼段
from __future__ import print_function
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE,SIG_DFL)
import pandas as pd
##### Python pandas, widen output display to see more columns. ####
pd.set_option('display.height', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('expand_frame_repr', True)
# print(pd.read_excel('ssd.xlsx'))
# Data = pd.read_html('http://openstacksearch/vm_list.html', header=0, flavor='bs4', index_col=['VM Name', 'User Name', 'Network', 'Region'])
Data = pd.read_html('http://openstacksearch/vm_list.html', header=0, flavor='bs4')
print(Data[0].head())
選擇可以使用的列的子集
Data = pd.read_html('http://openstacksearch/vm_list.html', header=0, flavor='bs4')
Data = Data[['VM Name', 'User Name', 'Network', 'Region']]
我得到溶液中,同時選擇所述DataFrame
從所述加工read_html
,然后用基於多指標的方法選擇所需的列。 感謝Adrew為此提出的想法。
因此,代碼如下所示……可能對某人有所幫助
import pandas as pd
##### Python pandas, widen output display to see more columns. ####
pd.set_option('display.height', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('expand_frame_repr', True)
###### Data Extraction ##################
'''
pd.read_html returns you a list with one element and that
element is the pandas dataframe, i.e.
Data = pd.read_html('url') will produce a list
Data[0] Will return a pandas DataFrame
'''
Data = pd.read_html('http://openstacksearch/vm_list.html', header=0, flavor='bs4')[0]
Data1 = Data[['VM Name', 'User Name', 'Network', 'Region']]
print(Data1)
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.