How to merge two columns into a third column under specific conditions

Question

I am rather new to Pandas and I am struggling to solve this problem:

I have a DataFrame with doctors' activities.

pd0.info()                                                                                                                                                                                                 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 14059 entries, 0 to 4418
Data columns (total 22 columns):
dossier               14059 non-null object
code_praticien        14059 non-null object
nom_praticien         14059 non-null object
code_anesthesiste     13128 non-null object
nom_anesthesiste      13128 non-null object
patient               14059 non-null object
sexe_patient          14059 non-null object
date_naiss_patient    14059 non-null datetime64[ns]
date                  14059 non-null datetime64[ns]
heure                 13842 non-null float64
ccam_ngap_diag        13852 non-null object
libelle               14059 non-null object
association           7682 non-null float64
modificateur1         11340 non-null object
modificateur2         1262 non-null object
modificateur3         8 non-null float64
modificateur4         0 non-null float64
montant_ccam          13684 non-null float64
montant_ngap          207 non-null float64
depassement           14049 non-null float64
total                 13901 non-null float64
praticien             13128 non-null object
dtypes: datetime64[ns](2), float64(8), object(12)
memory usage: 2.8+ MB

Two columns contain the surgeon code ('code_praticien') and the anesthesiologist code ('code_anesthesiste'):

test = pd0[['code_praticien', 'code_anesthesiste']]
test                                                                                                                                                                                                       
Out[65]: 
     code_praticien code_anesthesiste
0            BENY00            MORA01
1            BENY00            MORA01
2            BENY00            MORA01
3            BENY00            MORA01
4            BENY00            MORA01
...             ...               ...
4414         GAUD00            SAVO01
4415         SAVO01            SAVO01
4416         GAUD00            SAVO01
4417         GAUD00            SAVO01
4418         SAVO01            SAVO01

[14059 rows x 2 columns]

I am trying to handle the case where the "surgeon" IS the anesthesiologist (e.g. pain control procedures). In that case, 'code_anesthesiste' is NaN and 'code_praticien' is one of the anesthesiologists' codes. I created a new column 'anesthesiste' that should contain 'code_anesthesiste' when it is not null, or 'code_praticien' when 'code_anesthesiste' isnull() and 'code_praticien' isin([list of valid code_anesthesiste]).

test['anesthesiste'] = test.code_anesthesiste
test.loc[test.code_anesthesiste.isnull() & test.code_praticien.isin(['MORA01', 'SAVO01'])].anesthesiste = pd0.code_praticien

But I keep getting this error: "ValueError: cannot reindex from a duplicate axis". I googled 'duplicate axis' but can't understand where my mistake is...

I had a look at the fillna() function, but it doesn't seem adequate because I don't want surgeons' codes in the 'anesthesiste' column (sometimes a surgeon works without an anesthesiologist, in which case 'code_anesthesiste' is NaN but 'code_praticien' is not an anesthesiologist's code).

Thanks for your help.

Answer 1

You can use a simple apply here:

import numpy as np
import pandas as pd
df = pd.DataFrame({'code_praticien': ['BENY00', 'BENY00', 'GAUD00', 'SAVO01'], 'code_anesthesiste': ['MORA01', 'MORA01', np.nan, 'SAVO01']})
# Take code_praticien only when code_anesthesiste is missing AND the practitioner is in the valid list
df['anesthesiste'] = df.apply(lambda row: row['code_praticien'] if pd.isnull(row['code_anesthesiste']) and row['code_praticien'] in ['GAUD00', 'test'] else row['code_anesthesiste'], axis=1)
df

Replace ['GAUD00', 'test'] with your own list of valid anesthesiologist codes.
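
A vectorized alternative may be worth considering on a frame of this size. The info() output in the question shows 14059 entries with labels running only from 0 to 4418, so the index contains repeated labels, and that is most likely why assigning a Series (which pandas aligns by label) raises "cannot reindex from a duplicate axis". The sketch below sidesteps alignment by building the column with numpy.where, which returns a plain array. The sample rows, the duplicated index, and the name valid_codes are assumptions made for illustration; only the column names and the two anesthesiologist codes come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the repeated index labels mimic the question's frame.
test = pd.DataFrame(
    {
        "code_praticien": ["BENY00", "SAVO01", "GAUD00", "SAVO01"],
        "code_anesthesiste": ["MORA01", np.nan, np.nan, "SAVO01"],
    },
    index=[0, 1, 0, 1],
)

# Assumed list of valid anesthesiologist codes (taken from the question's example).
valid_codes = ["MORA01", "SAVO01"]

# True where the anesthesiologist code is missing and the practitioner
# is a known anesthesiologist.
mask = test["code_anesthesiste"].isna() & test["code_praticien"].isin(valid_codes)

# np.where returns a NumPy array, so the assignment is positional and
# does not depend on the index labels being unique.
test["anesthesiste"] = np.where(
    mask, test["code_praticien"], test["code_anesthesiste"]
)

print(test)
```

Rows where a surgeon worked without an anesthesiologist keep NaN in 'anesthesiste', because their 'code_praticien' is not in valid_codes. If the repeated labels carry no meaning, calling reset_index(drop=True) on the frame first is another way to avoid the alignment problem.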

Answer 1: 1 accepted, 2019-11-04 15:50:13
