python pandas pivot dataframe removing duplicates
Let's assume we have a denormalized table of servers with hostname and ip (1 hostname --> 1 ip, a 1-to-1 relationship), each with N Oracle clients installed on it.
import pandas as pd
from tabulate import tabulate

col_server = ['server_A', 'server_A', 'server_A']
col_ip = ['ip_A', 'ip_A', 'ip_A']
col_ora_client = ['11', '12', '19']
df = pd.DataFrame(data=list(zip(col_server, col_ip, col_ora_client)), columns=["server", "ip", "ora_client"])
print(tabulate(df, headers='keys', tablefmt='psql'))
This prints the following output:
+----+----------+------+--------------+
| | server | ip | ora_client |
|----+----------+------+--------------|
| 0 | server_A | ip_A | 11 |
| 1 | server_A | ip_A | 12 |
| 2 | server_A | ip_A | 19 |
+----+----------+------+--------------+
But what I want is:
+----------+------+----+----+----+
| server | ip | 11 | 12 | 19 |
+----------+------+----+----+----+
| server_A | ip_A | 1 | 1 | 1 |
+----------+------+----+----+----+
I've tried pd.crosstab, such as:
df_b = pd.crosstab([df['server'] , df['ip']] , df['ora_client'])
print(tabulate(df_b, headers='keys', tablefmt='psql'))
and I get an undesired first column of tuples:
+----------------------+------+------+------+
| | 11 | 12 | 19 |
|----------------------+------+------+------|
| ('server_A', 'ip_A') | 1 | 1 | 1 |
+----------------------+------+------+------+
How can I get the desired layout?
Any help would be much appreciated!
You can use pivot_table:
pd.pivot_table(
df,
index=['server', 'ip'],
columns=['ora_client'],
values=['ora_client'],
aggfunc='size'
).reset_index()
#ora_client server ip 11 12 19
#0 server_A ip_A 1 1 1
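As a side note, the pd.crosstab attempt from the question is already close. The tuples appear because the (server, ip) pair is stored in the row index rather than in ordinary columns. The following is a minimal sketch, not the only way to do it, assuming the same sample data as in the question; the names counts and flat are illustrative only.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame(
    {
        "server": ["server_A", "server_A", "server_A"],
        "ip": ["ip_A", "ip_A", "ip_A"],
        "ora_client": ["11", "12", "19"],
    }
)

# crosstab counts occurrences; (server, ip) ends up as a two-level row index.
counts = pd.crosstab([df["server"], df["ip"]], df["ora_client"])

# reset_index() moves the index levels back into ordinary columns.
flat = counts.reset_index()

# Drop the leftover "ora_client" label on the column axis.
flat.columns.name = None

print(flat.to_string(index=False))
```

When printing with tabulate, passing showindex=False hides the remaining integer row index, which leaves only the server, ip and client-version columns.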