Python - Create a new DF column by copying - partial string match on existing column values

Question

I have a dataframe with 50k records, where one of the columns has values like the ones below.

DF

Index    COLUMN
0        ABC-1M-Deliveryorder
1        KGF-ORDERDelivery-2Y
2        DEFGHIABC1M-OPEN
3        KGFABC
4        ABC-3Y-ORDER

I am looking for the keywords 3Y, 3M, 2Y and 1Y in COLUMN. If one is found, its value needs to be copied to a new DF column named TENOR (3Y, 3M, 1M, etc.). If none is found, the new column can show FALSE or NaN.

I tried the code below:

df['Tenor'] = ""

df['Tenor'] = df.COLUMN.apply(lambda x: x in ['3Y', '3M', '1Y', '1M'])

This returns FALSE in all rows of the new column. Can you please advise on the best way to meet my requirement?

Answer 1

You can use pandas.Series.str.contains with a regex:

import pandas as pd

df = pd.DataFrame(dict(
    COLUMN = [
        'ABC-1M-Deliveryorder','KGF-ORDERDelivery-2Y',
        'DEFGHIABC1M-OPEN', 'KGFABC', 'ABC-3Y-ORDER'
    ]
))

df['Tenor'] = df['COLUMN'].str.contains('3Y|3M|2Y|1Y|1M', regex=True)
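As a side note on why the attempt in the question returned FALSE everywhere: `x in [...]` on a list checks whether the whole cell value equals one of the list items, while `str.contains` searches inside each value. The sketch below contrasts the two on the sample data; the names `keywords`, `whole_match` and `partial_match` are illustrative and not from the original post.

```python
import pandas as pd

df = pd.DataFrame({
    "COLUMN": [
        "ABC-1M-Deliveryorder", "KGF-ORDERDelivery-2Y",
        "DEFGHIABC1M-OPEN", "KGFABC", "ABC-3Y-ORDER",
    ]
})

# Hypothetical helper list of the keywords to look for
keywords = ["3Y", "3M", "2Y", "1Y", "1M"]

# List membership: True only if the entire cell equals a keyword
whole_match = df["COLUMN"].apply(lambda x: x in keywords)

# Regex search: True if any keyword occurs anywhere inside the cell
partial_match = df["COLUMN"].str.contains("|".join(keywords), regex=True)

print(whole_match.tolist())    # [False, False, False, False, False]
print(partial_match.tolist())  # [True, True, True, False, True]
```

Joining the keywords with `|` works here because none of them contain regex metacharacters; keywords that do would need `re.escape` first.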

Edit: OP asked the follow-up question:

The above code snippet is returning TRUE wherever the column finds the string 2Y, 3Y etc. But I need the output as below:

Index    Column                  NEW
0        ABC-1M-Deliveryorder    1M
1        KGF-ORDERDelivery-2Y    2Y
2        DEFGHIABC1M-OPEN        1M
3        KGFABC                  NaN
4        ABC-3Y-ORDER            3Y

If that is the case, then you may want to use a custom function and pandas.Series.apply, like so:

import pandas as pd

df = pd.DataFrame(dict(
    COLUMN = [
        'ABC-1M-Deliveryorder','KGF-ORDERDelivery-2Y',
        'DEFGHIABC1M-OPEN', 'KGFABC', 'ABC-3Y-ORDER'
    ]
))

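def find_substring(x):
    # Return the first keyword (in tuple order) found anywhere in x
    for y in ('3Y','3M','2Y','1Y','1M'):
        if y in x:
            return y
    # No keyword found: the new column shows None for this row
    return None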

df['Tenor'] = df['COLUMN'].apply(find_substring)

print(df)

Output:

                 COLUMN Tenor
0  ABC-1M-Deliveryorder    1M
1  KGF-ORDERDelivery-2Y    2Y
2      DEFGHIABC1M-OPEN    1M
3                KGFABC  None
4          ABC-3Y-ORDER    3Y

Python Tutor link to example
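A vectorized alternative, not part of the original answer, is pandas.Series.str.extract with a single capture group. The sketch below assumes the same sample data and keyword set; rows without a match get NaN rather than None.

```python
import pandas as pd

df = pd.DataFrame({
    "COLUMN": [
        "ABC-1M-Deliveryorder", "KGF-ORDERDelivery-2Y",
        "DEFGHIABC1M-OPEN", "KGFABC", "ABC-3Y-ORDER",
    ]
})

# One capture group holding the alternatives to look for
pattern = r"(3Y|3M|2Y|1Y|1M)"

# expand=False returns a Series (not a DataFrame) for a single group;
# rows where the pattern does not match become NaN
df["Tenor"] = df["COLUMN"].str.extract(pattern, expand=False)

print(df)
```

One behavioral difference to keep in mind: str.extract returns the leftmost match in each string, while the loop in find_substring returns the first keyword in tuple order that occurs anywhere. The two can disagree when a value contains more than one keyword.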

Answer 2

The above code snippet is returning TRUE wherever the column finds the string 2Y, 3Y etc.

But I need the output as below:

Index    Column                  NEW
0        ABC-1M-Deliveryorder    1M
1        KGF-ORDERDelivery-2Y    2Y
2        DEFGHIABC1M-OPEN        1M
3        KGFABC                  NaN
4        ABC-3Y-ORDER            3Y

2 solutions

Solution 1
1 2020-04-29 14:19:34

Solution 2
0 2020-04-30 05:16:07
