
Extracting a value from a column

I have a column that contains several values separated by hyphens. For instance:

column A
TTT-Changing Car-BBBB-KKKK
TTT-KKKK - Changing device-KKKK
Releasing device-RRRR-KKKK-TTTT
RRRR-BBBB-Switching Car-TTTT
Login issue -RRRR-KKKK-TTTT
CCCC-Activation issue-RRRR-KKKK-TTTT

I have a list of words that I want to look up in column A, with the result going into column B. For example, if column A contains "Changing", "change" or "a change", column B should return "Change"; if it contains "Activation" or "registration", column B should return "Activation", and so on.

I'm looking for something similar to the Excel formula IF(ISNUMBER(SEARCH(...))), but usable in Python.

Thanks,

You could use the str.extract method:

df['column B'] = df['column A'].str.extract('(Changing[^-]*)')

df
                               column A         column B
0            TTT-Changing Car-BBBB-KKKK     Changing Car
1       TTT-KKKK - Changing device-KKKK  Changing device
2       Releasing device-RRRR-KKKK-TTTT              NaN
3          RRRR-BBBB-Switching Car-TTTT              NaN
4           Login issue -RRRR-KKKK-TTTT              NaN
5  CCCC-Activation issue-RRRR-KKKK-TTTT              NaN
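The closest Python analogue to Excel's IF(ISNUMBER(SEARCH(...))) is probably Series.str.contains with case=False, since SEARCH is case-insensitive. A minimal sketch, assuming a hypothetical rules dictionary that maps each search word to the label you want in column B (checked in order, first match wins):

```python
import pandas as pd

df = pd.DataFrame({"column A": [
    "TTT-Changing Car-BBBB-KKKK",
    "TTT-KKKK - Changing device-KKKK",
    "Releasing device-RRRR-KKKK-TTTT",
    "RRRR-BBBB-Switching Car-TTTT",
    "Login issue -RRRR-KKKK-TTTT",
    "CCCC-Activation issue-RRRR-KKKK-TTTT",
]})

# Hypothetical keyword -> label table; checked in order, first match wins.
rules = {
    "changing": "Change",
    "change": "Change",
    "activation": "Activation",
    "registration": "Activation",
}

df["column B"] = None  # rows matching no keyword stay empty
for keyword, label in rules.items():
    # Like Excel's SEARCH: case-insensitive plain substring test (no regex).
    found = df["column A"].str.contains(keyword, case=False, regex=False)
    # Only fill rows that have not been labelled by an earlier rule.
    df.loc[df["column B"].isna() & found, "column B"] = label

print(df)
```

Because regex=False, the keywords are matched as plain substrings, so "a change" is also caught by the "change" rule.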

EDIT

If you want to replace the contents instead, consider using a dictionary:

dct = {'changing': 'Change',
       'change':'Change',
       'activation':'Activation',
       'registration':'Activation'}

pat = f"(?i).*\\b({'|'.join(dct.keys())})\\b.*"

# regex=True is required for a callable replacement (the default became False in pandas 2.0)
df['column A'].str.replace(pat, lambda m: dct[m.group(1).lower()], regex=True)
0                             Change
1                             Change
2    Releasing device-RRRR-KKKK-TTTT
3       RRRR-BBBB-Switching Car-TTTT
4        Login issue -RRRR-KKKK-TTTT
5                         Activation
Name: column A, dtype: object
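Note that str.replace leaves rows without a keyword unchanged. If you would rather get NaN for those rows (as with str.extract), one option is to extract just the keyword and look it up in the same dictionary. A minimal sketch, with dct redefined here so the snippet runs on its own:

```python
import re
import pandas as pd

df = pd.DataFrame({"column A": [
    "TTT-Changing Car-BBBB-KKKK",
    "TTT-KKKK - Changing device-KKKK",
    "Releasing device-RRRR-KKKK-TTTT",
    "RRRR-BBBB-Switching Car-TTTT",
    "Login issue -RRRR-KKKK-TTTT",
    "CCCC-Activation issue-RRRR-KKKK-TTTT",
]})

dct = {"changing": "Change", "change": "Change",
       "activation": "Activation", "registration": "Activation"}

# Capture only the keyword itself: whole word, any case.
pat = r"(?i)\b(" + "|".join(map(re.escape, dct)) + r")\b"
keyword = df["column A"].str.extract(pat, expand=False)

# Look up the label; rows without a keyword stay NaN.
df["column B"] = keyword.str.lower().map(dct)
print(df)
```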

If I understood correctly, the following should work:

DataFrame:

df
                               column A
0            TTT-Changing Car-BBBB-KKKK
1       TTT-KKKK - Changing device-KKKK
2       Releasing device-RRRR-KKKK-TTTT
3          RRRR-BBBB-Switching Car-TTTT
4           Login issue -RRRR-KKKK-TTTT
5  CCCC-Activation issue-RRRR-KKKK-TTTT

Use str.extract for both the Activation and Changing strings:

df['column B'] = df['column A'].str.extract('(Activation|Changing[^-]*)')

                               column A         column B
0            TTT-Changing Car-BBBB-KKKK     Changing Car
1       TTT-KKKK - Changing device-KKKK  Changing device
2       Releasing device-RRRR-KKKK-TTTT              NaN
3          RRRR-BBBB-Switching Car-TTTT              NaN
4           Login issue -RRRR-KKKK-TTTT              NaN
5  CCCC-Activation issue-RRRR-KKKK-TTTT       Activation

Now replace the words as desired in the new column, i.e. column B:

df['column B'] = df['column B'].str.replace(r'(^.*Changing.*$)', 'Change', regex=True)
df['column B'] = df['column B'].str.replace(r'(^.*Activation.*$)', 'Activation', regex=True)

df
                               column A      column B
0            TTT-Changing Car-BBBB-KKKK      Change
1       TTT-KKKK - Changing device-KKKK      Change
2       Releasing device-RRRR-KKKK-TTTT         NaN
3          RRRR-BBBB-Switching Car-TTTT         NaN
4           Login issue -RRRR-KKKK-TTTT         NaN
5  CCCC-Activation issue-RRRR-KKKK-TTTT  Activation

Another way:

A better way: collect all the replacements you want in one dictionary and apply it to the DataFrame:

import pandas as pd

df = pd.read_csv("data_file")
df['column B'] = df['column A'].str.extract('(Activation|Changing[^-]*)')

replacements = {
   'column B': {
      r'(^.*Changing.*$)': 'Change',
      r'(^.*Activation.*$)': 'Activation'}
}

df = df.replace(replacements, regex=True)
print(df)

Result:

                               column A    column B
0            TTT-Changing Car-BBBB-KKKK      Change
1       TTT-KKKK - Changing device-KKKK      Change
2       Releasing device-RRRR-KKKK-TTTT         NaN
3          RRRR-BBBB-Switching Car-TTTT         NaN
4           Login issue -RRRR-KKKK-TTTT         NaN
5  CCCC-Activation issue-RRRR-KKKK-TTTT  Activation

OR

Here the column name is not part of the replacement dictionary, so you have to assign the result back to df['column B'] yourself:

df['column B'] = df['column A'].str.extract('(Activation|Changing[^-]*)')
replacements = {
      r'(^.*Changing.*$)': 'Change',
      r'(^.*Activation.*$)': 'Activation'
}
print(replacements)
df['column B'] = df['column B'].replace(replacements, regex=True)
print(df)
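To see what the nested dictionary in the first variant buys you, here is a small sketch on two sample rows: with {'column B': {...}} only column B is rewritten, whereas a flat dictionary passed to DataFrame.replace applies the patterns to every column, including column A. That is why the second variant calls replace on df['column B'] alone.

```python
import pandas as pd

df = pd.DataFrame({
    "column A": ["TTT-Changing Car-BBBB-KKKK", "CCCC-Activation issue-RRRR-KKKK-TTTT"],
    "column B": ["Changing Car", "Activation"],
})
patterns = {r"(^.*Changing.*$)": "Change", r"(^.*Activation.*$)": "Activation"}

# Nested dict: the patterns are applied to "column B" only.
scoped = df.replace({"column B": patterns}, regex=True)

# Flat dict on the whole DataFrame: every column is rewritten, column A included.
unscoped = df.replace(patterns, regex=True)

print(scoped)
print(unscoped)
```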

Note:

Replacement with replace() is comparatively slow, while column-wise operations are fast enough.
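If you want to check this on your own data, a rough timing sketch follows. The 60,000-row Series is synthetic (the six sample rows repeated), and the map-based variant is just one hypothetical column-wise alternative to replace; actual timings will vary by machine and pandas version.

```python
import timeit
import pandas as pd

rows = [
    "TTT-Changing Car-BBBB-KKKK",
    "TTT-KKKK - Changing device-KKKK",
    "Releasing device-RRRR-KKKK-TTTT",
    "RRRR-BBBB-Switching Car-TTTT",
    "Login issue -RRRR-KKKK-TTTT",
    "CCCC-Activation issue-RRRR-KKKK-TTTT",
]
# Synthetic data: the six sample rows repeated 10,000 times.
col_a = pd.Series(rows * 10_000, name="column A")
extracted = col_a.str.extract(r"(Activation|Changing[^-]*)", expand=False)

def with_replace():
    # Whole-string regex replacement, as in the answer above.
    patterns = {r"(^.*Changing.*$)": "Change", r"(^.*Activation.*$)": "Activation"}
    return extracted.replace(patterns, regex=True)

def with_map():
    # Column-wise alternative: take the first word and look it up in a dict.
    first_word = extracted.str.split().str[0]
    return first_word.map({"Changing": "Change", "Activation": "Activation"})

print("replace:", timeit.timeit(with_replace, number=5))
print("map:    ", timeit.timeit(with_map, number=5))
```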
