Extracting a value from a column
I have a column containing several values separated by hyphens. For instance:
column A
TTT-Changing Car-BBBB-KKKK
TTT-KKKK - Changing device-KKKK
Releasing device-RRRR-KKKK-TTTT
RRRR-BBBB-Switching Car-TTTT
Login issue -RRRR-KKKK-TTTT
CCCC-Activation issue-RRRR-KKKK-TTTT
I have a list of words that I want to look up in column A and map to a label in column B. For example, if column A contains "Changing", "change", or "a change", column B should be "Change"; if it contains "Activation" or "registration", column B should be "Activation", and so on.
I'm looking for something similar to the IF(ISNUMBER(SEARCH(...))) formula in Excel, but usable in Python.
Thanks,
You could use the str.extract function:
df['column B'] = df['column A'].str.extract('(Changing[^-]*)')
df
column A column B
0 TTT-Changing Car-BBBB-KKKK Changing Car
1 TTT-KKKK - Changing device-KKKK Changing device
2 Releasing device-RRRR-KKKK-TTTT NaN
3 RRRR-BBBB-Switching Car-TTTT NaN
4 Login issue -RRRR-KKKK-TTTT NaN
5 CCCC-Activation issue-RRRR-KKKK-TTTT NaN
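The question also mentions lowercase variants such as "change". As a minimal sketch (the sample frame is rebuilt from the question, and the shortened pattern "chang" is my own assumption, not the answerer's), str.extract can be made case-insensitive with flags=re.IGNORECASE, and expand=False makes it return a Series instead of a one-column DataFrame:

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"column A": [
    "TTT-Changing Car-BBBB-KKKK",
    "TTT-KKKK - Changing device-KKKK",
    "Releasing device-RRRR-KKKK-TTTT",
    "RRRR-BBBB-Switching Car-TTTT",
    "Login issue -RRRR-KKKK-TTTT",
    "CCCC-Activation issue-RRRR-KKKK-TTTT",
]})

# Capture "chang..." up to the next hyphen, ignoring case.
# Rows without a match get NaN.
df["column B"] = df["column A"].str.extract(
    r"(chang[^-]*)", flags=re.IGNORECASE, expand=False
)
print(df)
```

Because "chang" is a plain substring, it would also match inside words such as "exchange"; add word boundaries (\b) if that is not wanted.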
If you want to replace the contents, consider using a dictionary:
dct = {'changing': 'Change',
'change':'Change',
'activation':'Activation',
'registration':'Activation'}
pat = f"(?i).*\\b({'|'.join(dct.keys())})\\b.*"
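# regex=True is required in pandas 2.0+, where str.replace defaults to literal matching
df['column A'].str.replace(pat, lambda x: dct.get(x.group(1).lower(), None), regex=True)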
0 Change
1 Change
2 Releasing device-RRRR-KKKK-TTTT
3 RRRR-BBBB-Switching Car-TTTT
4 Login issue -RRRR-KKKK-TTTT
5 Activation
Name: column A, dtype: object
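The replace call above leaves non-matching rows unchanged. If column B should instead be empty for those rows, one possible variant (a sketch of a different technique, extract followed by map, not the answerer's code; the name keyword_to_label is hypothetical) is:

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"column A": [
    "TTT-Changing Car-BBBB-KKKK",
    "TTT-KKKK - Changing device-KKKK",
    "Releasing device-RRRR-KKKK-TTTT",
    "RRRR-BBBB-Switching Car-TTTT",
    "Login issue -RRRR-KKKK-TTTT",
    "CCCC-Activation issue-RRRR-KKKK-TTTT",
]})

# Lowercase keyword -> label to write into column B.
keyword_to_label = {
    "changing": "Change",
    "change": "Change",
    "activation": "Activation",
    "registration": "Activation",
}

# Escape each keyword and try longer keywords first.
alternatives = "|".join(
    re.escape(k) for k in sorted(keyword_to_label, key=len, reverse=True)
)
pattern = rf"\b({alternatives})\b"

# Extract the first keyword found (NaN if none), then map it to its label.
matched = df["column A"].str.extract(pattern, flags=re.IGNORECASE, expand=False)
df["column B"] = matched.str.lower().map(keyword_to_label)
print(df)
```

Rows with no keyword stay NaN in column B, and column A is left untouched.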
If I understood correctly, the following should work:
df
column A
0 TTT-Changing Car-BBBB-KKKK
1 TTT-KKKK - Changing device-KKKK
2 Releasing device-RRRR-KKKK-TTTT
3 RRRR-BBBB-Switching Car-TTTT
4 Login issue -RRRR-KKKK-TTTT
5 CCCC-Activation issue-RRRR-KKKK-TTTT
Using str.extract for both the Activation and Changing strings:
df['column B'] = df['column A'].str.extract('(Activation|Changing[^-]*)')
column A column B
0 TTT-Changing Car-BBBB-KKKK Changing Car
1 TTT-KKKK - Changing device-KKKK Changing device
2 Releasing device-RRRR-KKKK-TTTT NaN
3 RRRR-BBBB-Switching Car-TTTT NaN
4 Login issue -RRRR-KKKK-TTTT NaN
5 CCCC-Activation issue-RRRR-KKKK-TTTT Activation
Now replace the words as desired in the new column, i.e. column B:
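# regex=True is required in pandas 2.0+, where str.replace defaults to literal matching
df['column B'] = df['column B'].str.replace(r'(^.*Changing.*$)', 'Change', regex=True)
df['column B'] = df['column B'].str.replace(r'(^.*Activation.*$)', 'Activation', regex=True)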
df
column A column B
0 TTT-Changing Car-BBBB-KKKK Change
1 TTT-KKKK - Changing device-KKKK Change
2 Releasing device-RRRR-KKKK-TTTT NaN
3 RRRR-BBBB-Switching Car-TTTT NaN
4 Login issue -RRRR-KKKK-TTTT NaN
5 CCCC-Activation issue-RRRR-KKKK-TTTT Activation
A better way: collect the items you want to rename in a dictionary and apply it to the DataFrame, as below:
import pandas as pd
df = pd.read_csv("data_file")
df['column B'] = df['column A'].str.extract('(Activation|Changing[^-]*)')
replacements = {
'column B': {
r'(^.*Changing.*$)': 'Change',
r'(^.*Activation.*$)': 'Activation'}
}
df = df.replace(replacements, regex=True)
print(df)
column A column B
0 TTT-Changing Car-BBBB-KKKK Change
1 TTT-KKKK - Changing device-KKKK Change
2 Releasing device-RRRR-KKKK-TTTT NaN
3 RRRR-BBBB-Switching Car-TTTT NaN
4 Login issue -RRRR-KKKK-TTTT NaN
5 CCCC-Activation issue-RRRR-KKKK-TTTT Activation
Or:
Here the column name is not defined inside the replacements dictionary, so you need to assign the result back with df['column B'] = ...:
df['column B'] = df['column A'].str.extract('(Activation|Changing[^-]*)')
replacements = {
r'(^.*Changing.*$)': 'Change',
r'(^.*Activation.*$)': 'Activation'
}
print(replacements)
df['column B'] = df['column B'].replace(replacements, regex=True)
print(df)
Note that replacement is comparatively slow, while the column-wise operation is fast enough.
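For readers coming from the Excel IF(ISNUMBER(SEARCH(...))) formula, the closest direct analogue is probably a case-insensitive substring test with str.contains, applied rule by rule. The following is only a sketch under that assumption; the rules list and its patterns are hypothetical and should be adapted to the real keyword list:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"column A": [
    "TTT-Changing Car-BBBB-KKKK",
    "TTT-KKKK - Changing device-KKKK",
    "Releasing device-RRRR-KKKK-TTTT",
    "RRRR-BBBB-Switching Car-TTTT",
    "Login issue -RRRR-KKKK-TTTT",
    "CCCC-Activation issue-RRRR-KKKK-TTTT",
]})

# (pattern, label) pairs, checked in order; the first matching rule wins,
# like nested IF(ISNUMBER(SEARCH(...))) calls in Excel.
rules = [
    (r"chang", "Change"),
    (r"activation|registration", "Activation"),
]

# Start with an empty object column, then fill in labels rule by rule.
result = pd.Series([None] * len(df), index=df.index, dtype="object")
for pattern, label in rules:
    # case=False mirrors SEARCH, which ignores case; na=False treats
    # missing cells as "no match".
    hit = df["column A"].str.contains(pattern, case=False, regex=True, na=False)
    result.loc[hit & result.isna()] = label

df["column B"] = result
print(df)
```

Rows that match no rule keep a missing value in column B, which corresponds to the "value if false" branch of the Excel formula.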