Python splitting strings with prefix
I have a DataFrame column full of text and prices:
0 £75 BT Reward Card
1 £125 BT Reward Card
2 £50 Retail Voucher
3 £100 BT Reward Card
4 £150 BT Reward Card
5 £50 Cashback
6 Fibre Connection Fee (£50 Credit
7 £75 BT Reward Card
8 £125 BT Reward Card
9 £50 Cashback
10 £0 Fibre Connection Fee (£50 Credit
I only want to return the number that comes directly after the pound sign.
This is what I have so far, but it doesn't work for index 6 and 10:
df['col'] = df['col'].apply(lambda x: x.split(' ')[0])
I have also tried this:
df['col'] = df['col'].apply(lambda x: x.split('£')[1])
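To see why the two split attempts above fail, here is a plain-Python sketch applying each one to three strings copied from the question (rows 0, 6 and 10; picking these rows is an assumption for illustration). Splitting on a space keeps the '£' and returns 'Fibre' for row 6. Splitting on '£' keeps the rest of the text, and for row 10 it returns the text before the second amount. A row without any '£' would raise IndexError.

```python
# Sample strings copied from the question (rows 0, 6 and 10)
samples = [
    "£75 BT Reward Card",
    "Fibre Connection Fee (£50 Credit",
    "£0 Fibre Connection Fee (£50 Credit",
]

# Attempt 1: first space-separated token (still contains '£', or is a word)
first_tokens = [s.split(' ')[0] for s in samples]

# Attempt 2: everything between the first and second '£' (or to the end)
after_pound = [s.split('£')[1] for s in samples]

print(first_tokens)
print(after_pound)
```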
Use extract if you only need the first value,
and convert it to integers:
df['new'] = df['col'].str.extract(r'£(\d+)', expand=False).astype(int)
print(df)
col new
0 £75 BT Reward Card 75
1 £125 BT Reward Card 125
2 £50 Retail Voucher 50
3 £100 BT Reward Card 100
4 £150 BT Reward Card 150
5 £50 Cashback 50
6 Fibre Connection Fee (£50 Credit 50
7 £75 BT Reward Card 75
8 £125 BT Reward Card 125
9 £50 Cashback 50
10 £0 Fibre Connection Fee (£50 Credit 0
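A self-contained sketch of the extract approach, with the DataFrame rebuilt from the question's data (the construction is an assumption; the question only shows the printed column). The pattern is a raw string so \d is not treated as a string escape, and expand=False makes extract return a Series because the pattern has a single group. Only the first '£' amount per row is captured, which is why row 10 gives 0. If a row had no '£' amount, extract would produce NaN and astype(int) would raise.

```python
import pandas as pd

# Data copied from the question's column
df = pd.DataFrame({'col': [
    '£75 BT Reward Card',
    '£125 BT Reward Card',
    '£50 Retail Voucher',
    '£100 BT Reward Card',
    '£150 BT Reward Card',
    '£50 Cashback',
    'Fibre Connection Fee (£50 Credit',
    '£75 BT Reward Card',
    '£125 BT Reward Card',
    '£50 Cashback',
    '£0 Fibre Connection Fee (£50 Credit',
]})

# Capture the digits right after the first '£' in each row;
# expand=False returns a Series since there is one capture group.
df['new'] = df['col'].str.extract(r'£(\d+)', expand=False).astype(int)
print(df)
```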
Use str.findall if you need all values as lists:
#values are strings
df['new'] = df['col'].str.findall(r'£(\d+)')
#values are integers
#df['new'] = df['col'].str.findall(r'£(\d+)').apply(lambda x: [int(y) for y in x])
print(df)
col new
0 £75 BT Reward Card [75]
1 £125 BT Reward Card [125]
2 £50 Retail Voucher [50]
3 £100 BT Reward Card [100]
4 £150 BT Reward Card [150]
5 £50 Cashback [50]
6 Fibre Connection Fee (£50 Credit [50]
7 £75 BT Reward Card [75]
8 £125 BT Reward Card [125]
9 £50 Cashback [50]
10 £0 Fibre Connection Fee (£50 Credit [0, 50]
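Series.str.findall applies re.findall to each element, so the per-row behaviour can be checked with the re module alone. This is a minimal sketch using two rows from the question (the choice of rows is an assumption); with one capture group, findall returns only the captured digits for every match, and the nested list comprehension mirrors the commented-out integer conversion above.

```python
import re

# Two rows from the question: one with a single amount, one with two
rows = [
    'Fibre Connection Fee (£50 Credit',
    '£0 Fibre Connection Fee (£50 Credit',
]

pattern = re.compile(r'£(\d+)')

# With one capture group, findall returns the group text for every match
as_strings = [pattern.findall(s) for s in rows]

# Same per-row conversion as the integer variant of the findall answer
as_ints = [[int(y) for y in found] for found in as_strings]

print(as_strings)
print(as_ints)
```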
If you need them in new columns, use extractall
with unstack,
add_prefix
and join:
df = df.join(df['col'].str.extractall(r'£(\d+)')[0].unstack().astype(float).add_prefix('new'))
print(df)
col new0 new1
0 £75 BT Reward Card 75.0 NaN
1 £125 BT Reward Card 125.0 NaN
2 £50 Retail Voucher 50.0 NaN
3 £100 BT Reward Card 100.0 NaN
4 £150 BT Reward Card 150.0 NaN
5 £50 Cashback 50.0 NaN
6 Fibre Connection Fee (£50 Credit 50.0 NaN
7 £75 BT Reward Card 75.0 NaN
8 £125 BT Reward Card 125.0 NaN
9 £50 Cashback 50.0 NaN
10 £0 Fibre Connection Fee (£50 Credit 0.0 50.0
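The one-liner above can be unpacked step by step. This sketch uses a three-row DataFrame built from the question's data (the subset is an assumption for brevity). extractall returns one row per match, indexed by the original row label plus a 'match' level; unstack turns that 'match' level into columns 0, 1, ..., filling NaN where a row has fewer matches. float is used because an int column cannot hold NaN.

```python
import pandas as pd

# Three rows from the question's data
df = pd.DataFrame({'col': [
    '£75 BT Reward Card',
    'Fibre Connection Fee (£50 Credit',
    '£0 Fibre Connection Fee (£50 Credit',
]})

# One row per match, MultiIndex (original index, match); column 0 holds the digits
matches = df['col'].str.extractall(r'£(\d+)')[0]

# Move the 'match' level into columns; missing matches become NaN
wide = matches.unstack().astype(float).add_prefix('new')

# Align on the original row index and append new0, new1, ...
df = df.join(wide)
print(df)
```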