In Pandas, how do I check if three combined string columns == 10 characters, and if so, insert into new column?

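I want to combine three Pandas string columns into a new column if the combined string is exactly 10 characters long.

If it is not 10 characters, check the next set of combined columns.

I have tried adding these columns together into a new column when Phone1Area is 3 characters long, Phone1Exchange (the prefix) is 3 characters long, and Phone1NumberPart is 4 characters long, in other words ten characters in total. I have tried adding the columns if they are 3 + 3 + 4 characters, df.loc, and several other approaches.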

Here is a sample of the dataset:

[image: dataset sample]

Here is the code:

dfp['p1'] = (df[(df['Phone1Area'].str.len() == 3.0)]['Phone1Area'] +
             df[(df['Phone1Exchange'].str.len() == 3.0)]['Phone1Exchange'] +
             df[(df['Phone1NumberPart'].str.len() == 4.0)]['Phone1NumberPart'])

dfp['p2'] = (df[(df['Phone2Area'].str.len() == 3.0)]['Phone2Area'] +
             df[(df['Phone2Exchange'].str.len() == 3.0)]['Phone2Exchange'] +
             df[(df['Phone2NumberPart'].str.len() == 4.0)]['Phone2NumberPart'])

df_phone.loc[df_phone['p1'].str.len() == 10, 'phone'] = df_phone['p1']
df_phone.loc[df_phone['p2'].str.len() == 10, 'phone'] = df_phone['p2']

This is what I want to do, but in Pandas:

if df_phone['p1'].str.len() == 10:
    then insert df_phone['p1'] into df_phone['phone']
elif df_phone['p2'].str.len() == 10:
    then insert df_phone['p2'] into df_phone['phone']
elif df_phone['p3'].str.len() == 10:
    then insert df_phone['p3'] into df_phone['phone']

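I want the phone column to contain the 10 characters of phone 1; if that is not 10 characters long, the phone column should contain the 10 characters of phone 2, and so on.

But one of the results is: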

AttributeError: 'DataFrame' object has no attribute 'str'

Any ideas how to fix this?

This should help:

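df['phone'] = ''
# Phone 1: keep the combined string only if it is exactly 10 characters long.
df['test_phone'] = df['Phone1Area'] + df['Phone1Exchange'] + df['Phone1NumberPart']
mask = df['test_phone'].str.len() == 10
df.loc[mask, 'phone'] = df.loc[mask, 'test_phone']
# Phone 2: same check, but only fill rows that are still empty.
df['test_phone'] = df['Phone2Area'] + df['Phone2Exchange'] + df['Phone2NumberPart']
mask = (df['test_phone'].str.len() == 10) & (df['phone'] == '')
df.loc[mask, 'phone'] = df.loc[mask, 'test_phone']
# Phone 3: same again.
df['test_phone'] = df['Phone3Area'] + df['Phone3Exchange'] + df['Phone3NumberPart']
mask = (df['test_phone'].str.len() == 10) & (df['phone'] == '')
df.loc[mask, 'phone'] = df.loc[mask, 'test_phone']
# ...and so on for any further phone columns.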
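
The repeated steps above can also be written as a loop. This is only a minimal sketch: it assumes the columns follow the Phone{n}Area / Phone{n}Exchange / Phone{n}NumberPart naming from the question, and the helper name first_valid_phone and the sample rows are made up for illustration.

```python
import pandas as pd

def first_valid_phone(df, n_phones):
    """Return a Series with the first 10-character phone number per row ('' if none)."""
    phone = pd.Series('', index=df.index, dtype=object)
    for n in range(1, n_phones + 1):
        # Missing parts become '' so the concatenation never yields NaN.
        candidate = (df[f'Phone{n}Area'].fillna('')
                     + df[f'Phone{n}Exchange'].fillna('')
                     + df[f'Phone{n}NumberPart'].fillna(''))
        # Fill only rows that are still empty and whose candidate is exactly 10 characters.
        mask = (candidate.str.len() == 10) & (phone == '')
        phone.loc[mask] = candidate[mask]
    return phone

# Hypothetical sample data (not the asker's dataset).
df = pd.DataFrame({
    'Phone1Area': ['212', '41', None],
    'Phone1Exchange': ['555', '555', '555'],
    'Phone1NumberPart': ['0100', '0101', '0102'],
    'Phone2Area': ['646', '718', '30'],
    'Phone2Exchange': ['555', '555', '555'],
    'Phone2NumberPart': ['0200', '0201', '0202'],
})
df['phone'] = first_valid_phone(df, n_phones=2)
print(df['phone'].tolist())
```

Like the code above, this only checks the total length, so a 2 + 4 + 4 split would also pass; checking each part's length separately would need an extra condition in the mask.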

Another solution, using np.select, should be faster:

import numpy as np

conditions = [df_phone['p1'].str.len() == 10,
              df_phone['p2'].str.len() == 10,
              df_phone['p3'].str.len() == 10]
choices = [df_phone['p1'], df_phone['p2'], df_phone['p3']]

# For each row, take the choice of the first condition that is True, else ''.
df_phone['phone'] = np.select(conditions, choices, default='')

Documentation:

  • np.select: returns the choice that corresponds to the first condition that is True. If all conditions are False, the default value is used.
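
To see the first-match behaviour on concrete data, here is a small self-contained sketch. The sample values are invented, and p1 to p3 stand in for the already-combined phone strings from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical combined phone strings; '' or a short string means "not usable".
df_phone = pd.DataFrame({
    'p1': ['2125550100', '555010', '', '12345'],
    'p2': ['6465550200', '7185550201', '30555', ''],
    'p3': ['9175550300', '3475550301', '9295550302', '123'],
})

cols = ['p1', 'p2', 'p3']
# One boolean condition per column, in priority order.
conditions = [df_phone[c].str.len() == 10 for c in cols]
choices = [df_phone[c] for c in cols]

# Row 0 has three valid numbers, but only the first match (p1) is used.
df_phone['phone'] = np.select(conditions, choices, default='')
print(df_phone['phone'].tolist())
```

Building the two lists with comprehensions keeps conditions and choices in the same order, which matters because np.select pairs them by position.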
