How to extract a specific number of characters from a DataFrame column in Python
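Just like the LEFT formula we use in Excel, I want to extract a number of characters from the policy number, where the number depends on the value in the Insurer column. For example:
If the insurer is Hdfc, extract only the first 10 characters of the string; if the insurer is Tata, extract only the first 7 characters.
How can I achieve this in Python?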
Insurer | Policy no. | Expected OutPut |
---|---|---|
Hdfc | 4509242332 | 4509242332 |
Tata | tatadigitNational | tatadig |
Hdfc | 09082323ab12sd | 09082323ab |
Hdfc | nolanheroman | nolanherom |
Tata | 97543007356 | 9754300 |
Tata | pqrsequence2o202 | pqrsequ |
Tata | 987654321 | 9876543 |
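For anyone who wants to reproduce the answers below, here is a minimal sketch that rebuilds the sample data as a DataFrame. The values are taken from the outputs printed further down this page; storing every policy number as a string is an assumption, since the question does not say how the data was loaded.

```python
import pandas as pd

# Sample data from the question; policy numbers are kept as strings (assumption).
df = pd.DataFrame({
    "Insurer": ["Hdfc", "Tata", "Hdfc", "Hdfc", "Tata", "Tata", "Tata"],
    "Policy no.": [
        "4509242332", "tatadigitNational", "09082323ab12sd", "nolanheroman",
        "97543007356", "pqrsequence2o202", "987654321",
    ],
})

# Excel's LEFT(text, n) corresponds to the Python slice text[:n].
print(df.loc[2, "Policy no."][:10])
```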
What you can do is define a new function that performs the comparison and apply it row by row to fill a new column (Expected OutPut in your example).
def f(row):
    val = str(row['Policy no.'])
    return val[:10] if row['Insurer'] == "Hdfc" else val[:7]

df['Expected OutPut'] = df.apply(f, axis=1)
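If more insurers are added later, the same row-wise idea can be driven by a lookup table instead of a chain of conditions. This is only a sketch: `LENGTHS` and `left_by_insurer` are hypothetical names, "ICICI" is a made-up insurer used to show the fallback, and leaving unknown insurers unchanged is an assumption rather than something the question specifies.

```python
import pandas as pd

# Hypothetical lookup table: lower-cased insurer -> number of characters to keep.
LENGTHS = {"hdfc": 10, "tata": 7}

def left_by_insurer(row):
    """Return the leading characters of the policy number for this row's insurer."""
    val = str(row["Policy no."])
    n = LENGTHS.get(str(row["Insurer"]).lower())
    # Assumption: insurers missing from LENGTHS keep the full value,
    # because n is None and val[:None] is the whole string.
    return val[:n]

df = pd.DataFrame({
    "Insurer": ["Hdfc", "Tata", "ICICI"],
    "Policy no.": ["09082323ab12sd", "97543007356", 123456789012],
})
df["Expected OutPut"] = df.apply(left_by_insurer, axis=1)
print(df)
```

Converting with `str(...)` inside the function means a policy number that was read as an integer is still sliced correctly.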
You can try np.select:
import numpy as np

df['out'] = np.select(
    [df['Insurer'].str.lower().eq('hdfc'),
     df['Insurer'].str.lower().eq('tata')],
    [df['Policy no.'].str[:10],
     df['Policy no.'].str[:7]],
    df['Policy no.']
)
print(df)
Insurer Policy no. Expected OutPut out
0 Hdfc 4509242332 4509242332 4509242332
1 Tata tatadigitNational tatadig tatadig
2 Hdfc 09082323ab12sd 09082323ab 09082323ab
3 Hdfc nolanheroman nolanherom nolanherom
4 Tata 97543007356 9754300 9754300
5 Tata pqrsequence2o202 pqrsequ pqrsequ
6 Tata 987654321 9876543 9876543
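To make the role of the third argument clearer: np.select checks the conditions in order, takes the choice for the first condition that is true, and falls back to the default when none match. The sketch below uses a made-up insurer "Other" and differently cased names purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Insurer": ["HDFC", "tata", "Other"],  # "Other" is a hypothetical insurer
    "Policy no.": ["09082323ab12sd", "97543007356", "abcdefghijklmnop"],
})

# Lower-case once so the comparison is case-insensitive.
insurer = df["Insurer"].str.lower()

df["out"] = np.select(
    [insurer.eq("hdfc"), insurer.eq("tata")],             # conditions, checked in order
    [df["Policy no."].str[:10], df["Policy no."].str[:7]],  # matching choices
    default=df["Policy no."],                             # used when no condition is true
)
print(df)
```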
One possible solution is:
df['temp'] = df['Insurer'].map({'Hdfc':10, 'Tata':7})
df['Expected Output'] = df.apply(lambda x: x['Policy no.'][:x['temp']], axis=1)
Output:
Insurer Policy no. Expected Output temp
0 Hdfc 4509242332 4509242332 10
1 Tata tatadigitNational tatadig 7
2 Hdfc 09082323ab12sd 09082323ab 10
3 Hdfc nolanheroman nolanherom 10
4 Tata 97543007356 9754300 7
5 Tata pqrsequence2o202 pqrsequ 7
6 Tata 987654321 9876543 7
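If the extra temp column is not wanted in the result, the mapped lengths can be kept in a separate Series and combined with the policy numbers in a list comprehension. This is a sketch under two assumptions: every insurer appears in the mapping (an unmapped insurer would give NaN and the slice would fail), and every policy number is already a string. The name `lengths` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Insurer": ["Hdfc", "Tata", "Hdfc", "Hdfc", "Tata", "Tata", "Tata"],
    "Policy no.": [
        "4509242332", "tatadigitNational", "09082323ab12sd", "nolanheroman",
        "97543007356", "pqrsequence2o202", "987654321",
    ],
})

# Number of characters to keep per row, held outside the DataFrame.
lengths = df["Insurer"].map({"Hdfc": 10, "Tata": 7})

# Pair each policy number with its length and slice it.
df["Expected Output"] = [
    policy[:int(n)] for policy, n in zip(df["Policy no."], lengths)
]
print(df)
```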
Another solution:
df['Expected OutPut'] = df.apply(lambda x: x['Policy no.'][0:10] if x['Insurer']=='Hdfc' else x['Policy no.'][0:7], axis = 1)
print(df)
Insurer Policy no. Expected OutPut
0 Hdfc 4509242332 4509242332
1 Tata tatadigitNational tatadig
2 Hdfc 09082323ab12sd 09082323ab
3 Hdfc nolanheroman nolanherom
4 Tata 97543007356 9754300
5 Tata pqrsequence2o202 pqrsequ
6 Tata 987654321 9876543
A vectorized solution using pandas slicing:
# keep the first 10 characters everywhere, then shorten the Tata rows to 7
df['Expected Output'] = df['Policy no.'].str[:10]
is_tata = df['Insurer'].eq('Tata')
df.loc[is_tata, 'Expected Output'] = df.loc[is_tata, 'Policy no.'].str[:7]
This gives us the expected output:
df
Insurer Policy no. Expected Output
0 Hdfc 4509242332 4509242332
1 Tata tatadigitNational tatadig
2 Hdfc 09082323ab12sd 09082323ab
3 Hdfc nolanheroman nolanherom
4 Tata 97543007356 9754300
5 Tata pqrsequence2o202 pqrsequ
6 Tata 987654321 9876543
You can try this:
import numpy as np

df['Expected Output'] = np.where(df['Insurer'] == 'Hdfc', df['Policy no.'].str[:10], df['Policy no.'].str[:7])
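One thing to watch with the `.str` accessor: if some policy numbers were parsed as integers (which can happen when reading from Excel or CSV), `.str[:n]` returns NaN for those rows in a mixed column and raises an error on a purely numeric column. A possible workaround, sketched here on made-up mixed data, is to convert the column to strings first. Note also that np.where with two branches treats every insurer other than Hdfc as a 7-character case.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Insurer": ["Hdfc", "Tata", "Tata"],
    # The second value is deliberately an integer to mimic a parsed numeric cell.
    "Policy no.": ["09082323ab12sd", 97543007356, "pqrsequence2o202"],
})

# Convert once so every value can be sliced as text.
policy = df["Policy no."].astype(str)

df["Expected Output"] = np.where(
    df["Insurer"].eq("Hdfc"), policy.str[:10], policy.str[:7]
)
print(df)
```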
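You can easily do this by converting the DataFrame to an array and then iterating over each row:
# copy=True gives a writable array that does not share data with df
array = df.to_numpy(copy=True)
for row in array:
    # assuming only two columns: Insurer at index 0, Policy no. at index 1
    if row[0] == 'Tata':
        # keep the first 7 characters of the policy number
        row[1] = row[1][:7]
    elif row[0] == 'Hdfc':
        # keep the first 10 characters, as the question requires
        row[1] = row[1][:10]
# now convert the array back to a DataFrame
df = pd.DataFrame(array, columns=['Insurer', 'Policy no.'])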