Extracting values from a column

Question

I have a column that contains several values separated by hyphens. For instance:

column A
TTT-Changing Car-BBBB-KKKK
TTT-KKKK - Changing device-KKKK
Releasing device-RRRR-KKKK-TTTT
RRRR-BBBB-Switching Car-TTTT
Login issue -RRRR-KKKK-TTTT
CCCC-Activation issue-RRRR-KKKK-TTTT

I have a list of words that I want to look up in column A and write to column B. For example, if column A contains "Changing", "change", or "a change", column B should contain "Change"; if it contains "Activation" or "registration", column B should contain "Activation", and so on.

I'm looking for something similar to the IF(ISNUMBER(SEARCH(...))) formula in Excel, but usable in Python.

Thanks,

Answer 1

You could use the str.extract method:

df['column B'] = df['column A'].str.extract('(Changing[^-]*)')

df
                               column A         column B
0            TTT-Changing Car-BBBB-KKKK     Changing Car
1       TTT-KKKK - Changing device-KKKK  Changing device
2       Releasing device-RRRR-KKKK-TTTT              NaN
3          RRRR-BBBB-Switching Car-TTTT              NaN
4           Login issue -RRRR-KKKK-TTTT              NaN
5  CCCC-Activation issue-RRRR-KKKK-TTTT              NaN

EDIT

If you want to replace the contents, consider using a dictionary:

dct = {'changing': 'Change',
       'change':'Change',
       'activation':'Activation',
       'registration':'Activation'}

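# Case-insensitive pattern that matches the whole string around any keyword
pat = f"(?i).*\\b({'|'.join(dct.keys())})\\b.*"

# regex=True is required for a callable replacement (the default is False since pandas 2.0)
df['column A'].str.replace(pat, lambda x: dct.get(x.group(1).lower(), None), regex=True)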
0                             Change
1                             Change
2    Releasing device-RRRR-KKKK-TTTT
3       RRRR-BBBB-Switching Car-TTTT
4        Login issue -RRRR-KKKK-TTTT
5                         Activation
Name: column A, dtype: object
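
The replacement above leaves non-matching rows unchanged. If column B should instead hold only the label, with NaN where nothing matched, one possible variant is to extract the keyword and map it through the same dictionary. This is a minimal sketch, not part of the original answer; the name keyword_to_label and the rebuilt sample frame are assumptions for illustration.

```python
import re
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "column A": [
        "TTT-Changing Car-BBBB-KKKK",
        "TTT-KKKK - Changing device-KKKK",
        "Releasing device-RRRR-KKKK-TTTT",
        "RRRR-BBBB-Switching Car-TTTT",
        "Login issue -RRRR-KKKK-TTTT",
        "CCCC-Activation issue-RRRR-KKKK-TTTT",
    ]
})

# Hypothetical lookup table: lower-case keyword -> label for column B
keyword_to_label = {
    "changing": "Change",
    "change": "Change",
    "activation": "Activation",
    "registration": "Activation",
}

# One capture group containing every keyword as a whole word
pattern = r"\b(" + "|".join(re.escape(k) for k in keyword_to_label) + r")\b"

# Extract the first keyword found (NaN if none), then map it to its label
matched = df["column A"].str.extract(pattern, flags=re.IGNORECASE, expand=False)
df["column B"] = matched.str.lower().map(keyword_to_label)
print(df)
```

Rows without any keyword stay NaN in column B, which makes them easy to find later with isna().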

Answer 2

If I understood correctly, the following should work:

DataFrame:

df
                               column A
0            TTT-Changing Car-BBBB-KKKK
1       TTT-KKKK - Changing device-KKKK
2       Releasing device-RRRR-KKKK-TTTT
3          RRRR-BBBB-Switching Car-TTTT
4           Login issue -RRRR-KKKK-TTTT
5  CCCC-Activation issue-RRRR-KKKK-TTTT

Using str.extract for both the Activation and Changing strings:

df['column B'] = df['column A'].str.extract('(Activation|Changing[^-]*)')

                               column A         column B
0            TTT-Changing Car-BBBB-KKKK     Changing Car
1       TTT-KKKK - Changing device-KKKK  Changing device
2       Releasing device-RRRR-KKKK-TTTT              NaN
3          RRRR-BBBB-Switching Car-TTTT              NaN
4           Login issue -RRRR-KKKK-TTTT              NaN
5  CCCC-Activation issue-RRRR-KKKK-TTTT       Activation

Now replace the words as desired in the new column, i.e. column B:

# regex=True is needed because str.replace treats the pattern literally by default since pandas 2.0
df['column B'] = df['column B'].str.replace(r'(^.*Changing.*$)', 'Change', regex=True)
df['column B'] = df['column B'].str.replace(r'(^.*Activation.*$)', 'Activation', regex=True)

df
                               column A      column B
0            TTT-Changing Car-BBBB-KKKK      Change
1       TTT-KKKK - Changing device-KKKK      Change
2       Releasing device-RRRR-KKKK-TTTT         NaN
3          RRRR-BBBB-Switching Car-TTTT         NaN
4           Login issue -RRRR-KKKK-TTTT         NaN
5  CCCC-Activation issue-RRRR-KKKK-TTTT  Activation

Another approach:

A better way is to collect all the items you want to rename in one dictionary and then apply it to the DataFrame, as follows:

import pandas as pd

df = pd.read_csv("data_file")
df['column B'] = df['column A'].str.extract('(Activation|Changing[^-]*)')

replacements = {
   'column B': {
      r'(^.*Changing.*$)': 'Change',
      r'(^.*Activation.*$)': 'Activation'}
}

df = df.replace(replacements, regex=True)
print(df)

Result:

                               column A    column B
0            TTT-Changing Car-BBBB-KKKK      Change
1       TTT-KKKK - Changing device-KKKK      Change
2       Releasing device-RRRR-KKKK-TTTT         NaN
3          RRRR-BBBB-Switching Car-TTTT         NaN
4           Login issue -RRRR-KKKK-TTTT         NaN
5  CCCC-Activation issue-RRRR-KKKK-TTTT  Activation

OR

Here the column name is not defined inside replacements, so you need to assign the result to df['column B'] yourself:

df['column B'] = df['column A'].str.extract('(Activation|Changing[^-]*)')
replacements = {
      r'(^.*Changing.*$)': 'Change',
      r'(^.*Activation.*$)': 'Activation'
}
print(replacements)
df['column B'] = df['column B'].replace(replacements, regex=True)
print(df)

Note:

Replacing via the replacements dictionary is comparatively slow, while the column-wise operation is fast enough.
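
For comparison, a column-wise version can also be written as a series of str.contains checks, which is fairly close in spirit to nesting IF(ISNUMBER(SEARCH(...))) in Excel. The sketch below is an illustration under assumptions, not the answer's own code: the rules dictionary and its patterns are hypothetical, and the first matching rule wins.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "column A": [
        "TTT-Changing Car-BBBB-KKKK",
        "TTT-KKKK - Changing device-KKKK",
        "Releasing device-RRRR-KKKK-TTTT",
        "RRRR-BBBB-Switching Car-TTTT",
        "Login issue -RRRR-KKKK-TTTT",
        "CCCC-Activation issue-RRRR-KKKK-TTTT",
    ]
})

# Hypothetical rules: label -> regex of keywords that should produce it
rules = {
    "Change": r"\b(?:changing|change)\b",
    "Activation": r"\b(?:activation|registration)\b",
}

# Start with an empty column, then fill it one rule at a time
df["column B"] = None
for label, pattern in rules.items():
    # True where column A contains any keyword of this rule (case-insensitive)
    hit = df["column A"].str.contains(pattern, case=False, regex=True, na=False)
    # Only fill rows that no earlier rule has labelled yet
    unset = df["column B"].isna()
    df.loc[hit & unset, "column B"] = label

print(df)
```

Because each rule is a single vectorised check on the column, adding a new label only means adding one entry to rules.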

Answer 1
0 2021-06-15 22:36:01

Answer 2
0 Accepted 2021-06-16 06:13:04
