How to extract a specific number of characters from a DataFrame column in Python
Like the LEFT formula in Excel, I want to extract a number of characters from the Policy no. column based on the Insurer column.
If the insurer is HDFC, extract only the first 10 characters of the string; if the insurer is Tata, extract only the first 7 characters.
How can I achieve this in Python?
Insurer | Policy no. | Expected OutPut |
---|---|---|
Hdfc | 4509242332 | 4509242332 |
Tata | tatadigitNational | tatadig |
Hdfc | 09082323ab12sd | 09082323ab |
Hdfc | nolanheroman | nolanherom |
Tata | 97543007356 | 9754300 |
Tata | pqrsequence2o202 | pqrsequ |
Tata | 987654321 | 9876543 |
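To try the answers below, the sample table can be rebuilt as a DataFrame. This is a sketch that assumes every policy number is stored as a string, which the question does not state:

```python
import pandas as pd

# Sample data copied from the question's table.
# Assumption: all policy numbers are strings, so leading zeros survive.
df = pd.DataFrame({
    "Insurer": ["Hdfc", "Tata", "Hdfc", "Hdfc", "Tata", "Tata", "Tata"],
    "Policy no.": [
        "4509242332", "tatadigitNational", "09082323ab12sd", "nolanheroman",
        "97543007356", "pqrsequence2o202", "987654321",
    ],
})
print(df)
```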
What you can do is define a function that does the comparison and apply it row by row to create a new column (Expected OutPut in your example).

def f(row):
    # cast to str so numeric policy numbers can be sliced too
    val = str(row['Policy no.'])
    return val[:10] if row['Insurer'] == "Hdfc" else val[:7]

df['Expected OutPut'] = df.apply(f, axis=1)
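The `str()` call in `f` matters if some policy numbers were parsed as integers rather than text. A minimal sketch of the same cast done once for the whole column, assuming a hypothetical two-row frame with mixed types (the names `policy` and `is_hdfc` are illustrative only):

```python
import pandas as pd

# Hypothetical frame where one policy number was parsed as an integer.
df = pd.DataFrame({
    "Insurer": ["Hdfc", "Tata"],
    "Policy no.": [4509242332, "tatadigitNational"],
})

# Cast once so every value is a string before slicing.
policy = df["Policy no."].astype(str)
is_hdfc = df["Insurer"].eq("Hdfc")

# Keep the 10-character prefix for Hdfc rows, the 7-character prefix otherwise.
df["Expected OutPut"] = policy.str[:10].where(is_hdfc, policy.str[:7])
print(df)
```

This avoids the per-row Python function call, at the cost of computing both prefixes for every row.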
You can try np.select:

import numpy as np

df['out'] = np.select(
    [df['Insurer'].str.lower().eq('hdfc'),
     df['Insurer'].str.lower().eq('tata')],
    [df['Policy no.'].str[:10],
     df['Policy no.'].str[:7]],
    df['Policy no.']
)
print(df)
   Insurer         Policy no.  Expected OutPut         out
0     Hdfc         4509242332       4509242332  4509242332
1     Tata  tatadigitNational          tatadig     tatadig
2     Hdfc     09082323ab12sd       09082323ab  09082323ab
3     Hdfc       nolanheroman       nolanherom  nolanherom
4     Tata        97543007356          9754300     9754300
5     Tata   pqrsequence2o202          pqrsequ     pqrsequ
6     Tata          987654321          9876543     9876543
One possible solution is:
df['temp'] = df['Insurer'].map({'Hdfc':10, 'Tata':7})
df['Expected Output'] = df.apply(lambda x: x['Policy no.'][:x['temp']], axis=1)
Output:
   Insurer         Policy no.  Expected Output  temp
0     Hdfc         4509242332       4509242332    10
1     Tata  tatadigitNational          tatadig     7
2     Hdfc     09082323ab12sd       09082323ab    10
3     Hdfc       nolanheroman       nolanherom    10
4     Tata        97543007356          9754300     7
5     Tata   pqrsequence2o202          pqrsequ     7
6     Tata          987654321          9876543     7
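A variation on the lookup idea above, sketched without the helper `temp` column. The `PREFIX_LEN` name, the lower-casing of insurer names, and the extra `Other` row are assumptions for illustration, not part of the original answer:

```python
import pandas as pd

# Hypothetical lookup of prefix lengths, keyed by lower-cased insurer name.
PREFIX_LEN = {"hdfc": 10, "tata": 7}

df = pd.DataFrame({
    "Insurer": ["Hdfc", "Tata", "HDFC", "Other"],
    "Policy no.": ["09082323ab12sd", "97543007356", "nolanheroman", "abc123"],
})

# Series.map yields NaN for insurers that are missing from the lookup.
lengths = df["Insurer"].str.lower().map(PREFIX_LEN)

# Slice each value by its own length; leave unmapped rows unchanged.
df["Expected Output"] = [
    policy if pd.isna(n) else policy[: int(n)]
    for policy, n in zip(df["Policy no."], lengths)
]
print(df)
```

Adding another insurer then only requires a new dictionary entry rather than another branch.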
Another solution:
df['Expected OutPut'] = df.apply(lambda x: x['Policy no.'][0:10] if x['Insurer']=='Hdfc' else x['Policy no.'][0:7], axis = 1)
print(df)
   Insurer         Policy no.  Expected OutPut
0     Hdfc         4509242332       4509242332
1     Tata  tatadigitNational          tatadig
2     Hdfc     09082323ab12sd       09082323ab
3     Hdfc       nolanheroman       nolanherom
4     Tata        97543007356          9754300
5     Tata   pqrsequence2o202          pqrsequ
6     Tata          987654321          9876543
A vectorized solution using pandas string slicing:

# take the first 10 characters for every row
df['Expected Output'] = df['Policy no.'].str[:10]
# then overwrite the Tata rows with the first 7 characters
is_tata = df['Insurer'].eq('Tata')
df.loc[is_tata, 'Expected Output'] = df.loc[is_tata, 'Policy no.'].str[:7]
which gives us the expected output:
df
   Insurer         Policy no.  Expected Output
0     Hdfc         4509242332       4509242332
1     Tata  tatadigitNational          tatadig
2     Hdfc     09082323ab12sd       09082323ab
3     Hdfc       nolanheroman       nolanherom
4     Tata        97543007356          9754300
5     Tata   pqrsequence2o202          pqrsequ
6     Tata          987654321          9876543
You could try this:

import numpy as np

df['Expected Output'] = np.where(df['Insurer'] == 'Hdfc', df['Policy no.'].str[:10], df['Policy no.'].str[:7])
You can also do this by converting the DataFrame to a NumPy array and then iterating over every row:

import pandas as pd

# copy=True so the array is writable and independent of df
array = df.to_numpy(copy=True)
for row in array:
    # assuming you only have 2 columns: Insurer first, then Policy no.
    # slice the policy string to 7 characters for Tata, 10 otherwise
    row[1] = row[1][:7] if row[0] == 'Tata' else row[1][:10]

# now, convert the array back to a DataFrame
df = pd.DataFrame(array, columns=['Insurer', 'Policy no.'])