Need to split a DataFrame column on a comma delimiter
I have a data frame with a column like this:
comments
misha,park@gmail.com,233432
ammesh,,3545657
",,,"
neta,ne34@gmail.com,,
I want to split on commas. When two commas occur consecutively, the corresponding column should be filled with NA. When three commas occur, all three columns should be filled with NA (as in the third row).
EXPECTED OUTPUT:
comments                      name    mail            phone
misha,park@gmail.com,233432   misha   park@gmail.com  233432
ammesh,,3545657               ammesh  NA              3545657
",,,"                         NA      NA              NA
neta,ne34@gmail.com,,         neta    ne34@gmail.com  NA
CODE USED:
b = a.join(a['comments'].str.split(',', expand=True).add_prefix('comments')).fillna(np.nan)
If you don't find anything more Pythonic, the following code should work. I tried to cover all the scenarios in which ',,' can appear:
import numpy as np

# Create the target columns, then look up their positions for .iloc.
a['name'] = ''
a['mail'] = ''
a['phone'] = ''
name, mail, phone = (a.columns.get_loc(c) for c in ('name', 'mail', 'phone'))

for i in range(len(a)):
    text = a.comments.iloc[i]
    if ',,' not in text:
        # No empty fields: a plain three-way split.
        s = text.split(',')
        a.iloc[i, name] = s[0]
        a.iloc[i, mail] = s[1]
        a.iloc[i, phone] = s[2]
    elif ',,,' in text:
        # Three consecutive commas: every field is missing.
        a.iloc[i, name] = np.nan
        a.iloc[i, mail] = np.nan
        a.iloc[i, phone] = np.nan
    else:
        s = text.split(',')
        if len(s) == 5:
            # ',,mail,,': only the mail is present.
            a.iloc[i, name] = np.nan
            a.iloc[i, mail] = s[2]
            a.iloc[i, phone] = np.nan
        if len(s) == 4:
            if s[0] == '':
                # ',,mail,phone': the name is missing.
                a.iloc[i, name] = np.nan
                a.iloc[i, mail] = s[2]
                a.iloc[i, phone] = s[3]
            elif s[-1] == '':
                # 'name,mail,,': the phone is missing.
                a.iloc[i, name] = s[0]
                a.iloc[i, mail] = s[1]
                a.iloc[i, phone] = np.nan
        if len(s) == 3:
            # 'name,,phone': the mail is missing.
            a.iloc[i, name] = s[0]
            a.iloc[i, mail] = np.nan
            a.iloc[i, phone] = s[2]
print(a)
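For comparison, here is a minimal vectorized sketch that stays closer to the str.split attempt in the question. It assumes each comma is a plain field separator and that an empty field means a missing value, which matches the expected output shown in the question. It does not reproduce the loop's special handling of a leading ',,' as a single missing field. The sample frame is rebuilt from the question's data, and the names parts and b are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the third row is the quoted ",,," value.
a = pd.DataFrame({
    "comments": [
        "misha,park@gmail.com,233432",
        "ammesh,,3545657",
        ",,,",
        "neta,ne34@gmail.com,,",
    ]
})

# Split on commas. Empty fields come back as empty strings, and shorter
# rows are padded with missing values up to the widest row.
parts = a["comments"].str.split(",", expand=True)

# Keep exactly the first three fields (assumption: name, mail, phone in
# that order), dropping the extra column created by a trailing comma.
parts = parts.reindex(columns=range(3))
parts.columns = ["name", "mail", "phone"]

# Turn the empty strings left by consecutive commas into NaN.
parts = parts.replace("", np.nan)

b = a.join(parts)
print(b)
```

The key difference from the original attempt is the replace step: fillna(np.nan) only touches values that are already missing, so the empty strings produced by ',,' stay as they are unless they are converted explicitly.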