How to get only selected columns from the whole DataFrame with pandas read_html

Question

I am trying to extract specific columns from an HTML page. My HTML data looks like this.

1) HTML data format

            VM Name           User Name        Image Name                           Network  VCPUS  Memory(GB)  Disk(GB) Tenant     Region      KVM Host Power State                          URL               Created
0      dbsw-powerbi  anokhe@ezy.com           unknown   {u'VLAN181': [u'192.168.57.91']}      4          16       100    APP  DBS-AP-IN  dbs-appkvm03          On  https://compute.ezy.com  2018-08-02T10:30:07Z
1           pciedip  anokhe@ezy.com     dbsVDI-RHEL65   {u'VLAN181': [u'192.168.57.37']}      4          32       200    APP  DBS-AP-IN  dbs-appkvm01          On  https://compute.ezy.com  2018-04-18T06:39:38Z
2  dbs-spbdatasync1  anokhe@ezy.com    dbsVDI-RHEL510  {u'VLAN181': [u'192.168.57.156']}      1           8        50    APP  DBS-AP-IN     dbs-kvm13          On  https://compute.ezy.com  2018-04-05T09:51:29Z
3      dbsw-russian  anokhe@ezy.com  dbsVDI-WIN764-V1  {u'VLAN181': [u'192.168.57.216']}      1           4       100    APP  DBS-AP-IN  dbs-appkvm01          On  https://compute.ezy.com  2018-04-02T06:25:25Z
4   dbs-spbdatasync  anokhe@ezy.com    dbsVDI-RHEL510  {u'VLAN181': [u'192.168.57.233']}      1           8        50    APP  DBS-AP-IN     dbs-kvm13          On  https://compute.ezy.com  2018-04-02T05:03:03Z

I am using pandas read_html to get a DataFrame, but I cannot work out how to get specific columns from it. Out of the 13 columns, I need to select only ['VM Name', 'User Name', 'Network', 'Region'].

2) Code snippet

from __future__ import print_function
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE,SIG_DFL)
import pandas as pd
# Widen the pandas output display so that all columns are visible.
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.expand_frame_repr', True)

# print(pd.read_excel('ssd.xlsx'))
# Data = pd.read_html('http://openstacksearch/vm_list.html', header=0, flavor='bs4', index_col=['VM Name', 'User Name', 'Network', 'Region'])
Data = pd.read_html('http://openstacksearch/vm_list.html', header=0, flavor='bs4')
print(Data[0].head())

Answer 1

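To select a subset of the columns, you can use:

# read_html returns a list of DataFrames, so take the first table before selecting columns
Data = pd.read_html('http://openstacksearch/vm_list.html', header=0, flavor='bs4')[0]
Data = Data[['VM Name', 'User Name', 'Network', 'Region']]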
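The selection step can be tried without access to the page. The following is a minimal sketch, not the asker's actual data pipeline: the frame is hand-built from two rows of the sample in the question (with only some of the 13 columns) and stands in for one table returned by read_html. The names df, wanted, subset and subset_loc are illustrative.

```python
import pandas as pd

# Hand-built stand-in for one table returned by pd.read_html.
# Assumption: the real page parses into a frame with these column names.
df = pd.DataFrame({
    "VM Name": ["dbsw-powerbi", "pciedip"],
    "User Name": ["anokhe@ezy.com", "anokhe@ezy.com"],
    "Network": ["{u'VLAN181': [u'192.168.57.91']}",
                "{u'VLAN181': [u'192.168.57.37']}"],
    "VCPUS": [4, 4],
    "Region": ["DBS-AP-IN", "DBS-AP-IN"],
})

wanted = ["VM Name", "User Name", "Network", "Region"]

# A list of labels inside the brackets returns a new DataFrame
# that contains only those columns, in the order given.
subset = df[wanted]

# .loc spells out "all rows, these columns" and selects the same data.
subset_loc = df.loc[:, wanted]

print(subset)
```

Both forms raise a KeyError if one of the requested labels is missing from the frame, which is a quick way to notice that a header on the page is spelled differently than expected.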

Answer 2

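I found the solution: take the DataFrame out of the list that read_html returns, then select the required columns by indexing with a list of column names. Thanks to Adrew for the idea.

The code looks like this, in case it helps someone: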

import pandas as pd
# Widen the pandas output display so that all columns are visible.
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.expand_frame_repr', True)

# Data extraction
# pd.read_html returns a list of DataFrames, one per table found on the page.
# This page has a single table, so the list has one element:
#   pd.read_html('url')     -> list
#   pd.read_html('url')[0]  -> pandas DataFrame
Data = pd.read_html('http://openstacksearch/vm_list.html', header=0, flavor='bs4')[0]
Data1 = Data[['VM Name', 'User Name', 'Network', 'Region']]
print(Data1)
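Taking element [0] works when the wanted table is the first one on the page. As a hedged sketch for pages with several tables, the helper below scans the list and returns the wanted columns from the first table that has all of them. pick_table is a hypothetical helper written for this illustration, not part of pandas, and the tables list is hand-built to stand in for the return value of read_html.

```python
import pandas as pd


def pick_table(tables, wanted):
    """Return the wanted columns from the first table that has all of them.

    Hypothetical helper for illustration; `tables` is a list of DataFrames
    such as the one pd.read_html returns.
    """
    for table in tables:
        if set(wanted).issubset(table.columns):
            return table[wanted]
    raise KeyError("no table contains all of: %s" % wanted)


# Hand-built stand-in for the list returned by pd.read_html
# (assumption: a page with a small navigation table before the VM list).
tables = [
    pd.DataFrame({"Menu": ["Home", "Search"]}),
    pd.DataFrame({
        "VM Name": ["dbsw-powerbi", "pciedip"],
        "User Name": ["anokhe@ezy.com", "anokhe@ezy.com"],
        "Network": ["{u'VLAN181': [u'192.168.57.91']}",
                    "{u'VLAN181': [u'192.168.57.37']}"],
        "Memory(GB)": [16, 32],
        "Region": ["DBS-AP-IN", "DBS-AP-IN"],
    }),
]

wanted = ["VM Name", "User Name", "Network", "Region"]
result = pick_table(tables, wanted)
print(result)
```

With the real page, the same call would be pick_table(pd.read_html(url, header=0), wanted), assuming the header row parses into the column names shown in the question.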

2 answers

Answer 1
1, accepted, 2018-08-29 16:38:06

Answer 2
1, 2018-08-30 06:10:56
