Reformat pandas DataFrame
I have a pandas DataFrame with the following data:
country branch Name salary mobile no emailid
x a aa 250000 Null Null
x b bb 350000 8976646410 xx@xx.com
y c cc 450000 8777945411 yy@yy.com
y d dd 589630 Null Null
Depending on certain criteria, I filter the DataFrame (pseudocode):
if salary <= 250000: Normal Employee
elif salary >= 250000 and salary <= 600000: Experienced Employee
In doing this, I add a new column as follows:
import numpy as np

# boolean masks for the two salary bands
normal = data_df['salary'] <= 250000
experienced = (data_df['salary'] > 250000) & \
              (data_df['salary'] <= 600000)
# label each row; anything outside both bands becomes 'unknown'
data_df['position'] = np.where(normal, 'normal',
                               np.where(experienced, 'experienced', 'unknown'))
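As a point of comparison, the same labelling can be written without nesting by using np.select. This is only a sketch: it rebuilds the sample table from the question by hand, keeps "Null" as a literal string, and assumes both thresholds apply to the salary column, as in the pseudocode.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; "Null" is kept as a literal string.
data_df = pd.DataFrame({
    "country": ["x", "x", "y", "y"],
    "branch": ["a", "b", "c", "d"],
    "Name": ["aa", "bb", "cc", "dd"],
    "salary": [250000, 350000, 450000, 589630],
    "mobile no": ["Null", "8976646410", "8777945411", "Null"],
    "emailid": ["Null", "xx@xx.com", "yy@yy.com", "Null"],
})

# Boolean masks for the two salary bands from the pseudocode.
normal = data_df["salary"] <= 250000
experienced = (data_df["salary"] > 250000) & (data_df["salary"] <= 600000)

# np.select takes the label of the first matching condition per row,
# and falls back to the default when no condition matches.
data_df["position"] = np.select(
    [normal, experienced], ["normal", "experienced"], default="unknown"
)
print(data_df[["Name", "salary", "position"]])
```

With more than two bands, np.select keeps the conditions and labels side by side, which tends to be easier to read than nested np.where calls.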
Yet, I would like to display the DataFrame as follows, removing rows with the value Null:
country branch count_employee count_mobile_no count_email_id count_normal_employee count_experienced_employee
x a 1 0 0 1 0
y c 1 1 1 0 1
To count fields, I use the following code:
a = {'employee': ['count'],
'mobile_number': ['count'],
'customer_emailid': ['count']}
You can replace Null with NaN, then use groupby with agg, and finally reset_index:
print(data_df)
country branch Name salary mobile no emailid position
0 x a aa 250000 Null Null unknown
1 x b bb 350000 8976646410 xx@xx.com unknown
2 y c cc 450000 8777945411 yy@yy.com unknown
3 y d dd 589630 Null Null unknown
data_df = data_df.replace('Null', np.nan)
print(data_df)
country branch Name salary mobile no emailid position
0 x a aa 250000 NaN NaN unknown
1 x b bb 350000 8976646410 xx@xx.com unknown
2 y c cc 450000 8777945411 yy@yy.com unknown
3 y d dd 589630 NaN NaN unknown
df = data_df.groupby(['country', 'branch']).agg({'Name': 'count',
'mobile no':'count',
'emailid': 'count',
'position': 'count'})
print(df.reset_index())
country branch emailid position Name mobile no
0 x a 0 1 1 0
1 x b 1 1 1 1
2 y c 1 1 1 1
3 y d 0 1 1 0
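The steps above can be put together as one self-contained script. This is a minimal sketch that assumes the sample table is built by hand and that "Null" is stored as a plain string; the names clean and counts are illustrative only. The position column is left out to keep it short.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with "Null" as a placeholder string.
data_df = pd.DataFrame({
    "country": ["x", "x", "y", "y"],
    "branch": ["a", "b", "c", "d"],
    "Name": ["aa", "bb", "cc", "dd"],
    "salary": [250000, 350000, 450000, 589630],
    "mobile no": ["Null", "8976646410", "8777945411", "Null"],
    "emailid": ["Null", "xx@xx.com", "yy@yy.com", "Null"],
})

# Turn the placeholder into real missing values so that count() skips them.
clean = data_df.replace("Null", np.nan)

# Count non-missing values per (country, branch), then flatten the index.
counts = (
    clean.groupby(["country", "branch"])
    .agg({"Name": "count", "mobile no": "count", "emailid": "count"})
    .reset_index()
)
print(counts)
```

The replace step matters because count only ignores genuine missing values; the literal string "Null" would otherwise be counted like any other entry.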
EDIT:
If you need to count positions by category, create a column for each category, then use groupby with count, drop the salary column, and finally reset_index:
print(data_df)
country branch Name salary mobile no emailid
0 x a aa 250000 Null Null
1 x a aa 20000 Null Null
2 x b bb 350000 8976646410 xx@xx.com
3 y c cc 45000 8777945411 yy@yy.com
4 y d dd 589630 Null Null
normal = data_df['salary'] <= 20000
experienced = (data_df['salary'] > 20000) & (data_df['salary'] <= 50000)
unknown = data_df['salary'] > 50000
data_df.loc[normal, 'position_normal'] = 'normal employee'
data_df.loc[experienced,'position_experienced'] = 'experienced employee'
data_df.loc[unknown,'position_unknown'] = 'unknown employee'
print(data_df)
country branch Name salary mobile no emailid position_normal \
0 x a aa 250000 Null Null NaN
1 x a aa 20000 Null Null normal employee
2 x b bb 350000 8976646410 xx@xx.com NaN
3 y c cc 45000 8777945411 yy@yy.com NaN
4 y d dd 589630 Null Null NaN
position_experienced position_unknown
0 NaN unknown employee
1 NaN NaN
2 NaN unknown employee
3 experienced employee NaN
4 NaN unknown employee
# replace Null with NaN
data_df = data_df.replace('Null', np.nan)
df = data_df.groupby(['country', 'branch']).count()
# remove the salary column
df = df.drop('salary', axis=1)
df = df.reset_index()
print(df)
country branch Name mobile no emailid position_normal \
0 x a 2 0 0 1
1 x b 1 1 1 0
2 y c 1 1 1 0
3 y d 1 0 0 0
position_experienced position_unknown
0 0 1
1 0 1
2 1 0
3 0 1
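A variation on the same idea, offered only as a sketch: instead of writing label strings into per-category columns and counting the non-missing cells, each category can be stored as a boolean column and summed per group. The thresholds are the ones used in the EDIT above; the names flagged, agg_spec and result are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Five-row sample from the EDIT, with "Null" replaced by real missing values.
data_df = pd.DataFrame({
    "country": ["x", "x", "x", "y", "y"],
    "branch": ["a", "a", "b", "c", "d"],
    "Name": ["aa", "aa", "bb", "cc", "dd"],
    "salary": [250000, 20000, 350000, 45000, 589630],
    "mobile no": ["Null", "Null", "8976646410", "8777945411", "Null"],
    "emailid": ["Null", "Null", "xx@xx.com", "yy@yy.com", "Null"],
}).replace("Null", np.nan)

# One boolean column per salary band (True where the row falls in the band).
flagged = data_df.assign(
    position_normal=data_df["salary"] <= 20000,
    position_experienced=(data_df["salary"] > 20000) & (data_df["salary"] <= 50000),
    position_unknown=data_df["salary"] > 50000,
)

# count() ignores NaN for the contact fields; sum() counts the True flags.
agg_spec = {
    "Name": "count",
    "mobile no": "count",
    "emailid": "count",
    "position_normal": "sum",
    "position_experienced": "sum",
    "position_unknown": "sum",
}
result = flagged.groupby(["country", "branch"]).agg(agg_spec).reset_index()
print(result)
```

Because the aggregation names its columns explicitly, salary never enters the result and no separate drop step is needed.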