[英]How to write a cleaner code while adding cookies to the webdriver in python selenium
[英]How to Write a Cleaner and performant code with Pandas while reading CSV
我正在研究CSV
數據表並想要解析和過濾數據,在處理代碼時,我發現有人在SO POST上詢問過類似的代碼,並且作者擁有與我看到的幾乎相同的硬件數據與我有一些數據和列不同的 HPE H/W 相關。
Status Server Server Name Bay # Model Processor Proc. Count Memory Serial Number State Power State iLO FW Firmware Appliance Name
Critical enc2010, bay 1 tdm2066.example.com 1 ProLiant BL460c Gen9 Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz 2 262144 2M272101N9 Unmanaged On 2.53 May 03 2017 I36 v2.40 (02/17/2017) OV C7000 enclosures 1
OK enc1011, bay 1 tdm1068.example.com 1 ProLiant BL460c Gen9 Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz 2 262144 2M272101P6 Monitored On 2.55 Aug 16 2017 I36 v2.74 (07/21/2019) OV C7000 enclosures 1
OK enc1012, bay 1 tdm1083.example.com 1 ProLiant BL460c Gen9 Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz 2 262144 2M272101NX Monitored On 2.61 Jul 27 2018 I36 v2.60 (05/21/2018) OV C7000 enclosures 1
OK ENC2004, bay 1 tdm2033.example.com 1 ProLiant BL460c Gen9 Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz 2 524288 2M262602L2 Monitored On 2.55 Aug 16 2017 I36 v2.52 (10/25/2017) OV C7000 enclosures 1
OK ENC2006, bay 1 vds2009 1 ProLiant BL460c Gen9 Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz 2 524288 2M263604ZZ Monitored On 2.40 Dec 02 2015 I36 v2.20 (05/05/2016) OV C7000 enclosures 1
OK ENC2011, bay 1 tdm2081.example.com 1 ProLiant BL460c Gen9 Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz 2 524288 2M2708027Z Monitored On 2.55 Aug 16 2017 I36 v2.52 (10/25/2017) OV C7000 enclosures 1
OK ENC1003, bay 1 tdm1024.example.com 1 ProLiant BL460c Gen9 Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz 2 524288 2M262602KW Monitored On 2.73 Feb 11 2020 I36 v2.52 (10/25/2017) OV C7000 enclosures 1
OK ENC1006, bay 1 vds1009 1 ProLiant BL460c Gen9 Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz 2 524288 2M262505V5 Monitored On 2.40 Dec 02 2015 I36 v2.00 (12/28/2015) OV C7000 enclosures 1
OK ENC1007, bay 1 vds1023 1 ProLiant BL460c Gen9 Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz 2 524288 2M264800TR Monitored On 2.50 Sep 23 2016 I36 v2.30 (09/12/2016) OV C7000 enclosures 1
df = pd.read_csv("testcreate.csv", sep="\t")
df = df[[ 'Server', 'Server Name', 'Bay #', 'Appliance Name']]
df['Bay'] = df['Server'].str.split(',').str[1].str.lower()
df['Enclosure'] = df['Server'].str.split(',').str[0].str.upper()
df['Server Name'] = df['Server Name'].str.split('.').str[0]
df = df.drop(['Server', 'Bay #'], axis=1)
df = df[df['Appliance Name'].str.contains('C7000')]
Server Name Appliance Name Bay Enclosure
0 tdm2066 OV C7000 enclosures 1 bay 1 ENC2010
1 tdm1068 OV C7000 enclosures 1 bay 1 ENC1011
2 tdm1083 OV C7000 enclosures 1 bay 1 ENC1012
3 tdm2033 OV C7000 enclosures 1 bay 1 ENC2004
4 vds2009 OV C7000 enclosures 1 bay 1 ENC2006
5 tdm2081 OV C7000 enclosures 1 bay 1 ENC2011
6 tdm1024 OV C7000 enclosures 1 bay 1 ENC1003
7 vds1009 OV C7000 enclosures 1 bay 1 ENC1006
8 vds1023 OV C7000 enclosures 1 bay 1 ENC1007
9 vds0003 OV C7000 enclosures 1 bay 1 ENT0003
10 tdm7123 OV C7000 enclosures 1 bay 1 ENC7003
11 tdm2231 OV C7000 enclosures 1 bay 1 ENC2022
12 tdm2186 OV C7000 enclosures 1 bay 1 ENC2018
13 tdm1098 OV C7000 enclosures 1 bay 1 ENC1013
14 tdm1158 OV C7000 enclosures 1 bay 1 ENC1017
15 tdm2096 OV C7000 enclosures 1 bay 1 ENC2012
16 tdm1012 OV C7000 enclosures 1 bay 1 ENC1002
17 tdm1062 OV C7000 enclosures 1 bay 1 ENC1009
18 vds1041 OV C7000 enclosures 1 bay 1 ENC1010
19 vds1001 OV C7000 enclosures 1 bay 1 ENC1005
20 vds7025 OV C7000 enclosures 1 bay 1 ENC7009
21 vds2023 OV C7000 enclosures 1 bay 1 ENC2007
22 tdm7068 OV C7000 enclosures 1 bay 1 ENC7005
23 vds7006 OV C7000 enclosures 1 bay 1 ENC7006
24 tdm2126 OV C7000 enclosures 1 bay 1 ENC2014
25 vds2001 OV C7000 enclosures 1 bay 1 ENC2005
26 tdm1173 OV C7000 enclosures 1 bay 1 ENC1018
27 tdm1250 OV C7000 enclosures 1 bay 1 ENC1025
我借用了df1 = pd.concat( [g.set_index('Bay').add_suffix(f'_{n}') for n, g in df.groupby('Enclosure')], axis=1, sort=False).filter( like='Server Name').dropna(how='all', axis=1)
<-- 這來自提到的 SO 帖子,但我沒有完全明白,我也不想添加后綴即Server Name
,因此它應該只like
Enc1002` 等。
df1 = pd.concat( [g.set_index('Bay').add_suffix(f'_{n}') for n, g in df.groupby('Enclosure')], axis=1, sort=False).filter( like='Server Name').dropna(how='all', axis=1)
print(df1)
結果:
ENC1002 ENC1003 ENC1005
bay 1 tdm1012 tdm1024 vds1001
我從@Scott 獲得了desired
解決方案的解決方案。
df = pd.concat([g.set_index('Bay')['Server Name'].rename(f'{n}') for n, g in df.groupby('Enclosure')], axis=1, sort=False)
我的代碼可能有點亂,顯示在Dataframe
下,有沒有辦法更好地編碼它,只是在這里問它以獲得更好的建議和代碼寫作..
我希望我得到了正確的。
# Read the CSV & assign it to `text`
with open('estcreate.csv', 'r') as fh:
text = fh.read()
enc = dict()
for line in text.splitlines()[1:]:
status, enclosure, bay, bay_no, vds, *na = line.split()
enclosure = enclosure.replace(',','').upper()
vds = vds.lower().split('.')[0]
if enclosure not in enc:
enc[enclosure] = dict()
if bay_no not in enc[enclosure]:
enc[enclosure][bay_no] = vds
>>> df = pd.DataFrame.from_dict(enc)
>>> df
ENC2010 ENC1011 ENC1012 ENC2004 ENC2006 ENC2011 ENC1003 ENC1006 ENC1007 ENT0003 ... ENC1010 ENC1005 ENC7009 ENC2007 ENC7005 ENC7006 ENC2014 ENC2005 ENC1018 ENC1025
1 tdm2066 tdm1068 tdm1083 tdm2033 vds2009 tdm2081 tdm1024 vds1009 vds1023 vds0003 ... vds1041 vds1001 vds7025 vds2023 tdm7068 vds7006 tdm2126 vds2001 tdm1173 tdm1250
>>> df.T
1
ENC2010 tdm2066
ENC1011 tdm1068
ENC1012 tdm1083
ENC2004 tdm2033
ENC2006 vds2009
ENC2011 tdm2081
...
如果我理解正確,您想要 pivot 您的表:
df1 = pd.pivot(df, values='Server Name', index='Bay', columns='Enclosure')
Enclosure ENC1003 ENC1006 ENC1007 ENC1011 ENC1012 ENC2004 ENC2006 ENC2010 ENC2011 ENT0003
Bay
bay 1 tdm1024 vds1009 vds1023 tdm1068 tdm1083 tdm2033 vds2009 tdm2066 tdm2081 vds0003
在閱讀了您的帖子和代碼塊之后,我在操作中看不到太多 scope 。 我想像下面這樣更改它,這將提供所需的結果。 但是,我在代碼審查中看到了類似的帖子。
1-您應該選擇所需的列,這將減少處理負擔和靈活性,您可以使用usecols
。
2-您可以將df.assign
與 Dict 一起使用,它將根據Keys, values
從兩個 collections 創建一個 dict,這將基於,
通過拆分創建兩個不同的列,因此您可以執行split
、 rename
和drop
操作在一個 go 中。
它將像下面的 go 一樣,應該可以工作。
import pandas as pd
##### Pandas setting in case you want to visualize them on the screen. ####
#pd.set_option('display.height', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('expand_frame_repr', True)
##################### END OF THE Display Settings ###################
# usecols is supposed to provide a filter before reading the whole DataFrame
# into memory; if used properly, there should never be a need to delete columns
# after reading.
df = pd.read_csv("testcreate.csv",
usecols=['Server',
'Server Name',
'Appliance Name'
]
)
df1 = df.assign(**dict
(zip
('xy',
df.Server.str.split(',')
.str
)
)
).rename(columns=
{'x': 'Enclosure',
'y': 'Bay'
}
).drop(['Server'], axis=1)
df1 = df1[
df1['Appliance Name']
.str.contains('C7000')
]
df1['Server Name'] = df1['Server Name'].str.split('.').str[0].str.lower()
df1['Enclosure'] = df1['Enclosure'].str.upper()
df1 = pd.pivot(df1,
values='Server Name',
index='Bay',
columns='Enclosure'
).rename_axis(None)
df1.to_csv("YourCsvFileName.csv")
# Print(df1)
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.