Is there a way to filter columns in Python using multiple data types in condition?
Python data frame apply filter on multiple columns with same condition?
Here is my pandas dataframe:
new_data =

name  duration01        duration02        orz01   orz02
ABC   1 years 6 months  5 months          Nan     Google
XYZ   4 months          3 years 2 months  Google  Zensar
TYZ   4 months          4 years           Google  In Google
OPI   2 months          3 months          Nan     accenture
NRM   9 months          3 years           Google  Zensar
I want to find the names of the employees who worked at Google, with the total duration in months. The values here are spread across multiple columns. How do I apply a filter on multiple columns?

duration01 => orz01 (how many months/years the employee spent at orz01); duration02 => orz02 (how many months/years the employee spent at orz02).

There are 10 orz columns and 10 corresponding duration columns in total.
I tried the code below:
# Selected the required columns
orz_cols = new_data.columns[new_data.columns.str.contains('orz')]
new_data[new_data[orz_cols].apply(lambda x: x.str.contains('Google'))]
But it does not print the proper data. How can I achieve this?
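For what it's worth, the `apply` in the attempt above returns a boolean DataFrame, not a single row mask, which is why indexing with it misbehaves; one way to reduce it is `any(axis=1)`. A minimal sketch with invented data (column values are illustrative, not the question's frame):

```python
import pandas as pd

# Toy frame standing in for new_data's orz columns (values are made up)
df = pd.DataFrame({'orz01': ['Google', None, 'Zensar'],
                   'orz02': ['Zensar', 'Google', 'accenture']})
orz_cols = [c for c in df.columns if 'orz' in c]

# Each per-column contains() check yields a boolean column; any(axis=1)
# keeps rows where at least one orz column mentions Google
# (na=False treats missing entries as non-matches)
mask = df[orz_cols].apply(lambda s: s.str.contains('Google', na=False)).any(axis=1)
filtered = df[mask]
```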
I want output like the following:
name  Total_duration_in Google_in_Months
ABC   5 months
XYZ   4 months
TYZ   52 months
Used the following, which @Stefan gave me in the first part, to convert years into months:
# filter the data
Google_Data = dt1[dt1['orz'].str.contains('Google')]

dur = []
for i in range(0, len(Google_Data['duration'])):
    dur.append(Google_Data['duration'][i].split())

months_list = []
for i in range(0, len(dur)):
    # print dur[i]
    if dur[i][1] == 'years':
        if len(dur[i]) > 2:
            val1 = int(dur[i][0]) * 12 + int(dur[i][2])
            val11 = str(val1) + " months"
            months_list.append(val11)
        else:
            val2 = int(dur[i][0]) * 12
            val22 = str(val2) + " months"
            months_list.append(val22)
    else:
        val3 = dur[i][0] + " months"
        months_list.append(val3)

months_list[:3]

# Concat
df2 = pd.DataFrame(months_list, index=Google_Data.index.copy())
Google_duration = pd.concat([Google_Data, df2], axis=1)
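As an aside, the split-and-branch loop above can also be done in one vectorized pass with `str.extract`; a rough sketch on sample strings (the regex and series here are illustrative, not from the post):

```python
import pandas as pd

dur = pd.Series(['1 years 6 months', '4 months', '4 years'])

# Capture optional year and month counts; absent parts come back as NaN
parts = dur.str.extract(r'(?:(\d+)\s*years?)?\s*(?:(\d+)\s*months?)?')
months = parts.fillna(0).astype(int)

# Column 0 holds years, column 1 holds months
total_months = months[0] * 12 + months[1]
```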
Output:

                 organization                       Duration_In_Months
name
Aparna Arora     Google Headstrong Capital Markets  60 months
Aparna Dasgupta  Google                             24 months
Aparna Dhar      Google India Ltd                   56 months
Now I want to do the last step, which is to sum by grouping on name, but 'name' here is the index. I am struggling to get the sum.

Here is what I am trying:
# Splitting the Duration_In_Months to get only number values
# Its returning the type as 'str'
Google_duration1 = Google_duration.Duration_In_Months.apply(lambda x : x.split()[0])
# apply groupby
Genpact_dur2.index.groupby(Genpact_dur2['Duration_In_Months'])
How do I group by the index and take the sum?
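On the group-by-index part: once the unit suffix is stripped and the column is cast to a number, grouping on the index level sums it. A sketch with a hypothetical frame shaped like Google_duration (names and values invented):

```python
import pandas as pd

# Hypothetical stand-in for Google_duration: 'name' is the index,
# Duration_In_Months holds strings like "60 months"
g = pd.DataFrame({'Duration_In_Months': ['60 months', '24 months', '12 months']},
                 index=pd.Index(['Aparna Arora', 'Aparna Dasgupta', 'Aparna Arora'],
                                name='name'))

# Strip the unit, cast to int, then group on the index level and sum
months = g['Duration_In_Months'].str.split().str[0].astype(int)
totals = months.groupby(level='name').sum()
```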
Thanks,
You could do the following:

Set the index and get the columns to merge:
df.set_index('name', inplace=True)
orz_cols = [col for col in df.columns if col.startswith('orz')]
duration_cols = [col for col in df.columns if col.startswith('duration')]
merge_cols = zip(orz_cols, duration_cols)
Reshape and rename using pd.concat():
long_df = pd.concat([df.loc[:, list(cols)].rename(columns={col: col[:-2] for col in orz_cols + duration_cols})
                     for cols in merge_cols])
Drop the non-Google orz entries:
long_df = long_df[long_df.orz.str.contains('Google')]
Calculate the duration from the month and year values:
long_df.duration = long_df.duration.str.split().apply(
    lambda x: int(x[0]) if x[1] == 'months' else int(x[0]) * 12)
Sum by name:
long_df.groupby(level='name')['duration'].sum()
      duration
name
ABC          5
NRM          9
TYZ         52
XYZ          4
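Put together, a runnable version of the steps above on the question's sample data (a sketch that treats the "Nan" cells as missing values, hence na=False, and uses a small parser that also keeps the months part of mixed "X years Y months" strings):

```python
import pandas as pd

# Sample data from the question ("Nan" assumed to mean a missing value)
df = pd.DataFrame({
    'name': ['ABC', 'XYZ', 'TYZ', 'OPI', 'NRM'],
    'duration01': ['1 years 6 months', '4 months', '4 months', '2 months', '9 months'],
    'duration02': ['5 months', '3 years 2 months', '4 years', '3 months', '3 years'],
    'orz01': [None, 'Google', 'Google', None, 'Google'],
    'orz02': ['Google', 'Zensar', 'In Google', 'accenture', 'Zensar'],
}).set_index('name')

orz_cols = [c for c in df.columns if c.startswith('orz')]
duration_cols = [c for c in df.columns if c.startswith('duration')]

# Stack each (duration, orz) pair into long form
long_df = pd.concat(
    df.loc[:, [d, o]].rename(columns={d: 'duration', o: 'orz'})
    for d, o in zip(duration_cols, orz_cols)
)

# na=False drops missing organizations instead of raising on NaN
long_df = long_df[long_df['orz'].str.contains('Google', na=False)]

def to_months(s):
    # "3 years 2 months" -> 38, "4 years" -> 48, "5 months" -> 5
    parts = s.split()
    total = 0
    for value, unit in zip(parts[::2], parts[1::2]):
        total += int(value) * (12 if unit.startswith('year') else 1)
    return total

long_df['duration'] = long_df['duration'].map(to_months)
result = long_df.groupby(level='name')['duration'].sum()
```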
Consider reshaping with pandas.melt, then use np.where() to conditionally parse out the year and month values. Finally, aggregate by the Google organization.
import pandas as pd
import numpy as np
...
# LIST OF SUBSET COLUMNS
durationCols = [c for c in df.columns if 'duration' in c ]
orzCols = [c for c in df.columns if 'orz' in c ]
# MELT AND MERGE (align the two melts on their shared row index;
# empty-string suffix keeps the left frame's 'name' column name)
df = pd.merge(pd.melt(df, id_vars=['name'], value_vars=durationCols,
                      value_name='duration'),
              pd.melt(df, id_vars=['name'], value_vars=orzCols,
                      value_name='orz'),
              left_index=True, right_index=True,
              suffixes=('', '_y'))[['name', 'duration', 'orz']]
# DURATION CONDITIONAL CALCULATION (YEAR + MONTH)
# note: the positional slicing assumes single-digit values, as in the sample
df['actual_dur'] = np.where(df['duration'].str.contains('year'),
                            df['duration'].str[:1], 0).astype(int) * 12 + \
                   np.where(df['duration'].str.contains('year.*month'),
                            df['duration'].str[8:9],
                            np.where(df['duration'].str.contains('month'),
                                     df['duration'].str[:1], 0)).astype(int)
df['orz'] = np.where(df['orz'].str.contains('Google'), 'Google', df['orz'])
# SUM DURATION AND OUTPUT DF
df = df[df['orz']=='Google'].groupby(['name','orz']).sum().reset_index()
df = df[['name','actual_dur']]
df.columns = ['name', 'Total_duration_in Google_in_Months']
Output:
# name Total_duration_in Google_in_Months
# 0 ABC 5
# 1 NRM 9
# 2 TYZ 52
# 3 XYZ 4
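The positional slicing above (`str[:1]`, `str[8:9]`) only works for single-digit values like those in the sample; a regex-based parser, shown here as an illustrative alternative, handles multi-digit durations as well:

```python
import re

def duration_to_months(text):
    """Convert '3 years 2 months' style strings to a month count."""
    years = re.search(r'(\d+)\s*year', text)
    months = re.search(r'(\d+)\s*month', text)
    total = 0
    if years:
        total += int(years.group(1)) * 12
    if months:
        total += int(months.group(1))
    return total
```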