How to extract specific information from a pandas column?
This is posdf:
tradingsymbol
0 XYZ2061820500PE
1 XYZ20JUN21000PE
2 ABC20JUN100CE
3 ABC20JUN102.5PE
4 ABC20JUN92.5PE
4 XYZ20JUNFUT
I am doing this to extract ABC and XYZ into a column:
posdf['symbol'] = posdf['tradingsymbol'].str.extract(r'^(\D+)', expand=True)
I cannot figure out a general way to extract the following columns:
strike type Expiry
0 20500 PE 20618
1 21000 PE 20JUN
2 100 CE 20JUN
3 102.5 PE 20JUN
4 92.5 PE 20JUN
4 NA FUT 20JUN
type is 2 to 3 characters long. Expiry is always 5 characters and may also take a form such as 20O18, 20N18 or 20D18.
Added rows where type can be 3 characters, based on Sammy's comment.
Use Series.str.extract with a regex pattern:
df1 = df['tradingsymbol'].str.extract(
    r'(?P<expiry>\d{5}|\d{2}\w{3})(?P<strike>\d+(?:\.\d+)?)?(?P<type>\w+)')
df1 = df1[['strike', 'type', 'expiry']]
Result:
# print(df1)
strike type expiry
0 20500 PE 20618
1 21000 PE 20JUN
2 100 CE 20JUN
3 102.5 PE 20JUN
4 92.5 PE 20JUN
4 NaN FUT 20JUN
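For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt from the question; the extra symbol group, the ^ and $ anchors, and the narrower [A-Za-z]{2,3} for type are my own additions (assumptions based on the constraints stated in the question), not part of the pattern above.

```python
import pandas as pd

# Sample data copied from the question.
posdf = pd.DataFrame({
    "tradingsymbol": [
        "XYZ2061820500PE",
        "XYZ20JUN21000PE",
        "ABC20JUN100CE",
        "ABC20JUN102.5PE",
        "ABC20JUN92.5PE",
        "XYZ20JUNFUT",
    ]
})

# Named groups become the column names of the extracted frame.
# Assumption: the symbol is the leading run of non-digit characters,
# and type is 2 to 3 letters at the very end of the string.
pattern = (
    r"^(?P<symbol>\D+)"               # e.g. XYZ, ABC
    r"(?P<expiry>\d{5}|\d{2}\w{3})"   # e.g. 20618, 20JUN, 20O18
    r"(?P<strike>\d+(?:\.\d+)?)?"     # optional, e.g. 20500 or 102.5
    r"(?P<type>[A-Za-z]{2,3})$"       # e.g. PE, CE, FUT
)

parts = posdf["tradingsymbol"].str.extract(pattern)
print(parts)
```

Rows without a strike (such as the FUT row) get NaN in the strike column because that group is optional.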
If Strike is always numeric, you can do:
posdf[['Symbol','Expiry','Strike','Type']] = posdf['tradingsymbol'].str.extract(r'^(\D+)(.{5})([0-9.]*)([a-zA-Z]{2,3})', expand=True)
Result:
tradingsymbol Symbol Expiry Strike Type
0 XYZ2061820500PE XYZ 20618 20500 PE
1 XYZ20JUN21000PE XYZ 20JUN 21000 PE
2 ABC20JUN100CE ABC 20JUN 100 CE
3 ABC20JUN102.5PE ABC 20JUN 102.5 PE
4 ABC20JUN92.5PE ABC 20JUN 92.5 PE
4 XYZ20JUNFUT XYZ 20JUN FUT
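Note that str.extract returns strings, so Strike above is text, and the FUT row holds an empty string rather than NaN. If you need numbers, one possible follow-up (a sketch, not part of the answer above) is pd.to_numeric with errors="coerce". The three-row frame and the parts name are just for illustration.

```python
import pandas as pd

# Three rows from the question, enough to show the conversion.
posdf = pd.DataFrame({
    "tradingsymbol": ["XYZ2061820500PE", "ABC20JUN102.5PE", "XYZ20JUNFUT"]
})

# Same positional pattern as above; `parts` is a hypothetical helper name.
parts = posdf["tradingsymbol"].str.extract(
    r"^(\D+)(.{5})([0-9.]*)([a-zA-Z]{2,3})", expand=True
)
parts.columns = ["Symbol", "Expiry", "Strike", "Type"]

# Strike is still text here; values that cannot be parsed
# (the empty string for FUT) are coerced to NaN.
parts["Strike"] = pd.to_numeric(parts["Strike"], errors="coerce")
print(parts.dtypes)
```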
A bit of a hack:
# Relies on fixed positions: a 3-character symbol followed by a 5-character expiry.
res = (df.assign(Expiry = df.tradingsymbol.str[3:8],
                 type = df.tradingsymbol.str[8:].str.split("([a-zA-Z]+)").str[1],
                 strike = df.tradingsymbol.str[8:].str.split("[a-zA-Z]+").str[0],
                 )
       )
res
tradingsymbol Expiry type strike
0 XYZ2061820500PE 20618 PE 20500
1 XYZ20JUN21000PE 20JUN PE 21000
2 ABC20JUN100CE 20JUN CE 100
3 ABC20JUN102.5PE 20JUN PE 102.5
4 ABC20JUN92.5PE 20JUN PE 92.5
4 XYZ20JUNFUT 20JUN FUT
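The slicing above only works while every symbol prefix is exactly three characters long. If that cannot be guaranteed, a possible variation (a sketch under that assumption; ABCD and XY are made-up symbols for illustration) is to strip the non-digit prefix first, so that the expiry always starts at position 0:

```python
import pandas as pd

# ABCD and XY are hypothetical symbols used to vary the prefix length.
df = pd.DataFrame({
    "tradingsymbol": ["XYZ2061820500PE", "ABCD20JUN102.5PE", "XY20JUNFUT"]
})

# Drop the leading non-digit prefix; what remains starts with the expiry.
rest = df["tradingsymbol"].str.replace(r"^\D+", "", regex=True)
tail = rest.str[5:]  # everything after the 5-character expiry

res = df.assign(
    Expiry=rest.str[:5],
    type=tail.str.extract(r"([a-zA-Z]+)$", expand=False),
    strike=tail.str.extract(r"^([0-9.]+)", expand=False),
)
print(res)
```

Here strike comes back as NaN (not an empty string) for rows with no digits after the expiry.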