Python - Create a new DF column by copying - partial string match from existing column values
I have a dataframe with 50k records, where one of the columns holds values like the ones below.
DF

Index  COLUMN
0      ABC-1M-Deliveryorder
1      KGF-ORDERDelivery-2Y
2      DEFGHIABC1M-OPEN
3      KGFABC
4      ABC-3Y-ORDER
I am looking for the keywords 3Y, 3M, 2Y and 1Y in COLUMN. If one is found, its value needs to be copied to a new DF column named TENOR (3Y, 3M, 1M, etc.). If none is found, the new column can show FALSE or NaN.
I tried the code below:
df['Tenor'] = ""
df['Tenor'] = df['COLUMN'].apply(lambda x: x in ['3Y', '3M', '1Y', '1M'])
This returns FALSE in every row of the new column. Can you please advise on the best way to meet my requirement?
You can use pandas.Series.str.contains with a regex:
import pandas as pd

df = pd.DataFrame(dict(
    COLUMN=[
        'ABC-1M-Deliveryorder', 'KGF-ORDERDelivery-2Y',
        'DEFGHIABC1M-OPEN', 'KGFABC', 'ABC-3Y-ORDER'
    ]
))
df['Tenor'] = df['COLUMN'].str.contains('3Y|3M|2Y|1Y|1M', regex=True)
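For context, the attempt in the question most likely returns FALSE everywhere because `x in [...]` with a list checks whether the whole cell value equals one of the list items, not whether it contains one. A minimal sketch of the difference, using sample values taken from the question (the variable names are only for illustration):

```python
import pandas as pd

tenors = ['3Y', '3M', '2Y', '1Y', '1M']
value = 'ABC-1M-Deliveryorder'

# List membership compares the whole string to each item, so this is False.
whole_match = value in tenors

# Substring test: True if any keyword occurs somewhere inside the string.
partial_match = any(t in value for t in tenors)

# str.contains applies the substring (regex) test to every row at once.
s = pd.Series(['ABC-1M-Deliveryorder', 'KGFABC'])
flags = s.str.contains('|'.join(tenors))

print(whole_match, partial_match, flags.tolist())
```

Note that str.contains only reports whether a match exists; it yields booleans, not the matched keyword.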
Edit: OP asked a follow-up question:

The above code snippet returns TRUE wherever the column contains the string 2Y, 3Y, etc., but I need the output as below:

Index  Column                NEW
0      ABC-1M-Deliveryorder  1M
1      KGF-ORDERDelivery-2Y  2Y
2      DEFGHIABC1M-OPEN      1M
3      KGFABC                NaN
4      ABC-3Y-ORDER          3Y
If that is the case, then you may want to use a custom function with pandas.Series.apply, like so:
import pandas as pd

df = pd.DataFrame(dict(
    COLUMN=[
        'ABC-1M-Deliveryorder', 'KGF-ORDERDelivery-2Y',
        'DEFGHIABC1M-OPEN', 'KGFABC', 'ABC-3Y-ORDER'
    ]
))

def find_substring(x):
    # Return the first keyword (in tuple order) found in x.
    # Falls through and returns None when nothing matches.
    for y in ('3Y', '3M', '2Y', '1Y', '1M'):
        if y in x:
            return y

df['Tenor'] = df['COLUMN'].apply(find_substring)
print(df)
Output:
COLUMN Tenor
0 ABC-1M-Deliveryorder 1M
1 KGF-ORDERDelivery-2Y 2Y
2 DEFGHIABC1M-OPEN 1M
3 KGFABC None
4 ABC-3Y-ORDER 3Y
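As a possible vectorized alternative to the custom function, pandas.Series.str.extract can pull out the matched keyword directly. This is only a sketch under the assumption that each value contains at most one of the keywords; the sample data is the same as above.

```python
import pandas as pd

df = pd.DataFrame({'COLUMN': [
    'ABC-1M-Deliveryorder', 'KGF-ORDERDelivery-2Y',
    'DEFGHIABC1M-OPEN', 'KGFABC', 'ABC-3Y-ORDER',
]})

# The capture group holds the leftmost keyword found in each string.
# expand=False returns a Series; rows without a match become NaN.
df['Tenor'] = df['COLUMN'].str.extract(r'(3Y|3M|2Y|1Y|1M)', expand=False)
print(df)
```

One difference to keep in mind: when a value contains several keywords, the regex returns the one that appears first in the string, while the loop in find_substring returns the one that comes first in the tuple.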