簡體   English   中英

熊貓read_html如何只能從整個DataFrame中獲取選定的列

[英]How Pandas read_html can get only selected columns from entire DataFrame

我正在嘗試從html頁面中提取特定列,我的html數據如下所示。

1)HTML DATA格式

            VM Name           User Name        Image Name                           Network  VCPUS  Memory(GB)  Disk(GB) Tenant     Region      KVM Host Power State                          URL               Created
0      dbsw-powerbi  anokhe@ezy.com           unknown   {u'VLAN181': [u'192.168.57.91']}      4          16       100    APP  DBS-AP-IN  dbs-appkvm03          On  https://compute.ezy.com  2018-08-02T10:30:07Z
1           pciedip  anokhe@ezy.com     dbsVDI-RHEL65   {u'VLAN181': [u'192.168.57.37']}      4          32       200    APP  DBS-AP-IN  dbs-appkvm01          On  https://compute.ezy.com  2018-04-18T06:39:38Z
2  dbs-spbdatasync1  anokhe@ezy.com    dbsVDI-RHEL510  {u'VLAN181': [u'192.168.57.156']}      1           8        50    APP  DBS-AP-IN     dbs-kvm13          On  https://compute.ezy.com  2018-04-05T09:51:29Z
3      dbsw-russian  anokhe@ezy.com  dbsVDI-WIN764-V1  {u'VLAN181': [u'192.168.57.216']}      1           4       100    APP  DBS-AP-IN  dbs-appkvm01          On  https://compute.ezy.com  2018-04-02T06:25:25Z
4   dbs-spbdatasync  anokhe@ezy.com    dbsVDI-RHEL510  {u'VLAN181': [u'192.168.57.233']}      1           8        50    APP  DBS-AP-IN     dbs-kvm13          On  https://compute.ezy.com  2018-04-02T05:03:03Z

我只是在嘗試使用熊貓read_html來獲取DataFrame,但無法獲得從DataFrame中獲取特定列的理解。 我需要從13列中選擇列['VM Name', 'User Name', 'Network', 'Region']

2)代碼段

from __future__ import print_function
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE,SIG_DFL)
import pandas as pd
##### Python pandas, widen output display to see more columns. ####
pd.set_option('display.height', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('expand_frame_repr', True)

# print(pd.read_excel('ssd.xlsx'))
# Data = pd.read_html('http://openstacksearch/vm_list.html', header=0, flavor='bs4', index_col=['VM Name', 'User Name', 'Network', 'Region'])
Data = pd.read_html('http://openstacksearch/vm_list.html', header=0, flavor='bs4')
print(Data[0].head())

選擇可以使用的列的子集

Data = pd.read_html('http://openstacksearch/vm_list.html', header=0, flavor='bs4')
Data = Data[['VM Name', 'User Name', 'Network', 'Region']]

我得到溶液中,同時選擇所述DataFrame從所述加工read_html ,然后用基於多指標的方法選擇所需的列。 感謝Adrew為此提出的想法。

因此,代碼如下所示……可能對某人有所幫助

import pandas as pd
##### Python pandas, widen output display to see more columns. ####
pd.set_option('display.height', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('expand_frame_repr', True)
###### Data Extraction ##################
'''
pd.read_html returns you a list with one element and that 
element is the pandas dataframe, i.e.
Data = pd.read_html('url') will produce a list
Data[0]  Will return a pandas DataFrame
'''
Data = pd.read_html('http://openstacksearch/vm_list.html', header=0, flavor='bs4')[0]
Data1 = Data[['VM Name', 'User Name', 'Network', 'Region']]
print(Data1)

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM