How to extract a specific number of characters from a dataframe column in Python

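Like the LEFT formula we use in Excel, I want to extract a number of characters from the policy number, where the number of characters depends on the Insurer column. For example:

If the insurer is Hdfc, extract only the first 10 characters of the string; if the insurer is Tata, extract only the first 7 characters.

How would I achieve this in Python?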

Insurer  Policy no.         Expected OutPut
Hdfc     4509242332         4509242332
Tata     tatadigitNational  tatadig
Hdfc     09082323ab12sd     09082323ab
Hdfc     nolanheroman       nolanherom
Tata     97543007356        9754300
Tata     pqrsequence2o202   pqrsequ
Tata     987654321          9876543
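To try the answers below, the sample table can be rebuilt as a DataFrame. This is only a minimal sketch: it assumes both columns are stored as strings (the question does not say how the data is loaded), and the column names are the ones used in the answers.

```python
import pandas as pd

# Sample rows copied from the question; both columns are kept as strings
# so that leading zeros such as "0908..." are preserved.
df = pd.DataFrame({
    "Insurer": ["Hdfc", "Tata", "Hdfc", "Hdfc", "Tata", "Tata", "Tata"],
    "Policy no.": [
        "4509242332", "tatadigitNational", "09082323ab12sd", "nolanheroman",
        "97543007356", "pqrsequence2o202", "987654321",
    ],
})
print(df)
```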

What you can do is define a new function that does the comparison and apply it to create a new column (Expected OutPut in your example).

def f(row): 
    val = str(row['Policy no.'])
    return val[:10] if row['Insurer'] == "Hdfc" else val[:7]

df['Expected OutPut'] = df.apply(f, axis=1)

You can try np.select:

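import numpy as np

df['out'] = np.select(
    # conditions, compared case-insensitively
    [df['Insurer'].str.lower().eq('hdfc'),
     df['Insurer'].str.lower().eq('tata')],
    # value to use for each condition
    [df['Policy no.'].str[:10],
     df['Policy no.'].str[:7]],
    # default: keep the policy number unchanged for any other insurer
    df['Policy no.']
    )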
print(df)

  Insurer         Policy no. Expected OutPut         out
0    Hdfc         4509242332      4509242332  4509242332
1    Tata  tatadigitNational         tatadig     tatadig
2    Hdfc     09082323ab12sd      09082323ab  09082323ab
3    Hdfc       nolanheroman      nolanherom  nolanherom
4    Tata        97543007356         9754300     9754300
5    Tata   pqrsequence2o202         pqrsequ     pqrsequ
6    Tata          987654321         9876543     9876543

One possible solution is:

df['temp'] = df['Insurer'].map({'Hdfc':10, 'Tata':7})
df['Expected Output'] = df.apply(lambda x: x['Policy no.'][:x['temp']], axis=1)

Output:

  Insurer         Policy no. Expected Output  temp
0    Hdfc         4509242332      4509242332    10
1    Tata  tatadigitNational         tatadig     7
2    Hdfc     09082323ab12sd      09082323ab    10
3    Hdfc       nolanheroman      nolanherom    10
4    Tata        97543007356         9754300     7
5    Tata   pqrsequence2o202         pqrsequ     7
6    Tata          987654321         9876543     7
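One caveat with the mapping approach: `map` returns NaN for any insurer missing from the dictionary, and slicing with NaN raises a TypeError. A possible variant, sketched below, looks the length up in a plain dictionary and keeps the whole string for unknown insurers. The `LENGTHS` table and the helper `left_by_insurer` are hypothetical names introduced for this illustration, and the fallback behaviour is an assumption, not something the question asks for.

```python
import pandas as pd

# Hypothetical lookup table: insurer name (lower-cased) -> characters to keep.
LENGTHS = {"hdfc": 10, "tata": 7}

def left_by_insurer(insurer, policy):
    """Return the leading characters of policy, similar to Excel's LEFT."""
    n = LENGTHS.get(str(insurer).strip().lower())  # None for unknown insurers
    return str(policy)[:n]  # slicing with [:None] keeps the whole string

# A few rows taken from the question's sample data.
df = pd.DataFrame({
    "Insurer": ["Hdfc", "Tata", "Hdfc", "Tata"],
    "Policy no.": ["09082323ab12sd", "tatadigitNational", "nolanheroman", "97543007356"],
})
df["Expected Output"] = [
    left_by_insurer(i, p) for i, p in zip(df["Insurer"], df["Policy no."])
]
print(df)
```

This also avoids the extra temp column, at the cost of a Python-level loop over the rows.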

Another solution:

df['Expected OutPut'] = df.apply(lambda x: x['Policy no.'][0:10] if x['Insurer']=='Hdfc' else x['Policy no.'][0:7], axis = 1)
print(df)

  Insurer         Policy no. Expected OutPut
0    Hdfc         4509242332      4509242332
1    Tata  tatadigitNational         tatadig
2    Hdfc     09082323ab12sd      09082323ab
3    Hdfc       nolanheroman      nolanherom
4    Tata        97543007356         9754300
5    Tata   pqrsequence2o202         pqrsequ
6    Tata          987654321         9876543

A vectorized solution using pandas slicing:

df['Expected Output'] = df['Policy no.'].str[:10]
# for Tata rows, shorten the 10-character result to 7 characters
mask = df['Insurer'].eq('Tata')
df.loc[mask, 'Expected Output'] = df.loc[mask, 'Expected Output'].str[:7]

This gives us the expected output:

df
  
  Insurer         Policy no. Expected Output
0    Hdfc         4509242332      4509242332
1    Tata  tatadigitNational         tatadig
2    Hdfc     09082323ab12sd      09082323ab
3    Hdfc       nolanheroman      nolanherom
4    Tata        97543007356         9754300
5    Tata   pqrsequence2o202         pqrsequ
6    Tata          987654321         9876543

You can try this:

import numpy as np

df['Expected Output'] = np.where(df['Insurer'] == 'Hdfc', df['Policy no.'].str[:10], df['Policy no.'].str[:7])
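Note that the `.str` accessor only works on string data. If the policy column was loaded as integers, `.str[:10]` raises an AttributeError, so the values need to be converted first. A minimal sketch, assuming a purely numeric policy column (an assumption for illustration; the question's data also contains letters):

```python
import numpy as np
import pandas as pd

# Numeric-only rows from the question, deliberately stored as integers.
df = pd.DataFrame({
    "Insurer": ["Hdfc", "Tata", "Tata"],
    "Policy no.": [4509242332, 97543007356, 987654321],
})

# Convert to strings once, then slice with the .str accessor.
policy = df["Policy no."].astype(str)
df["Expected Output"] = np.where(
    df["Insurer"].eq("Hdfc"), policy.str[:10], policy.str[:7]
)
print(df)
```

Keep in mind that integers cannot carry leading zeros, so a policy number such as 09082323ab12sd has to be read as a string from the start.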

You can also do this by converting the dataframe to an array and then iterating over each row:

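import pandas as pd

# copy=True so the array is writable and independent of df
array = df.to_numpy(copy=True)

for row in array:
    # assuming you only have 2 columns: Insurer, then Policy no.
    if row[0] == 'Tata':
        # keep the first 7 characters of the policy number
        row[1] = row[1][:7]
    elif row[0] == 'Hdfc':
        # keep the first 10 characters of the policy number
        row[1] = row[1][:10]
# now, convert the array back to a DataFrame
df = pd.DataFrame(array, columns=['Insurer', 'Policy no.'])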
    
