
Replacing part of a string with a value from another column

A pandas DataFrame contains a column with descriptions and placeholders in curly braces:

descr                        replacement
This: {should be replaced}   with this

The task is to replace the text in the curly braces with text from another column in the same row. Unfortunately, it is not as easy as:

df["descr"] = df["descr"].str.replace(r"{*?}", df["replacement"])

~/anaconda3/lib/python3.6/site-packages/pandas/core/strings.py in replace(self, pat, repl, n, case, flags, regex)
   2532     def replace(self, pat, repl, n=-1, case=None, flags=0, regex=True):
   2533         result = str_replace(self._parent, pat, repl, n=n, case=case,
-> 2534                              flags=flags, regex=regex)
   2535         return self._wrap_result(result)
   2536 

~/anaconda3/lib/python3.6/site-packages/pandas/core/strings.py in str_replace(arr, pat, repl, n, case, flags, regex)
    548     # Check whether repl is valid (GH 13438, GH 15055)
    549     if not (is_string_like(repl) or callable(repl)):
--> 550         raise TypeError("repl must be a string or callable")
    551 
    552     is_compiled_re = is_re(pat)

TypeError: repl must be a string or callable

Use a list comprehension with re.sub, especially if performance is important:

import re

df['new'] = [re.sub(r"{.*?}", b, a) for a, b in zip(df['descr'], df['replacement'])]
print(df)
                        descr replacement              new
0  This: {should be replaced}   with this  This: with this
1                This: {data}         aaa        This: aaa
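
One caveat with passing the replacement text straight to re.sub: a string repl has its backslash escapes processed, so a replacement value that contains a backslash can raise an error or be altered. A minimal sketch of a workaround follows, assuming a hypothetical helper name (fill_placeholder) and made-up sample rows that are not from the question. It passes a callable as repl so the value is inserted verbatim:

```python
import re

import pandas as pd

# Sample data is illustrative only; the second replacement contains a backslash.
df = pd.DataFrame({
    "descr": ["This: {should be replaced}", "Path: {path}"],
    "replacement": ["with this", r"C:\data"],
})


def fill_placeholder(text, value):
    """Replace each {...} placeholder in text with value, taken literally."""
    # A callable repl returns the string as-is, so backslashes in value
    # are not interpreted as regex escape sequences.
    return re.sub(r"\{.*?\}", lambda _match: value, text)


df["new"] = [fill_placeholder(a, b) for a, b in zip(df["descr"], df["replacement"])]
print(df["new"].tolist())
```

If the replacement column never contains backslashes, the plain string form shown above behaves the same way.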

Your code uses pandas.Series.str.replace(), which expects the replacement (repl) to be a single string or a callable, but you passed a Series as the second argument.

Series.str.replace(pat, repl, n=-1, case=None, flags=0, regex=True)

Replace occurrences of pattern/regex in the Series/Index with some other string. Equivalent to str.replace() or re.sub().

Parameters:

pat : string or compiled regex

repl : string or callable ...
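
To illustrate the signature quoted above: a string repl is applied identically to every row, which is why it cannot carry a per-row value. The sketch below, using made-up sample data and illustrative variable names (same_for_all, row_wise), contrasts that with one possible row-wise alternative based on DataFrame.apply with axis=1. It is not the only way to do this and is usually slower than a list comprehension on large frames:

```python
import re

import pandas as pd

df = pd.DataFrame({
    "descr": ["This: {should be replaced}", "This: {data}"],
    "replacement": ["with this", "aaa"],
})

# A single string repl: every placeholder gets the same text.
same_for_all = df["descr"].str.replace(r"\{.*?\}", "X", regex=True)

# Row-wise alternative: apply a function to each row and read both columns.
row_wise = df.apply(
    lambda row: re.sub(r"\{.*?\}", lambda _m: row["replacement"], row["descr"]),
    axis=1,
)

print(same_for_all.tolist())
print(row_wise.tolist())
```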

You can fix it by using the pandas.Series.replace() method directly:

import pandas as pd

df = pd.DataFrame({'descr': ['This: {should be replaced}'],
                   'replacement': 'with this'
                  })
>>> df["descr"].replace(r"{.+?}", df["replacement"], regex=True)
0    This: with this

Observation:

I changed your regexp slightly, from {*?} to {.+?}.
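
The difference between the two patterns can be checked with the standard re module alone. The sketch below uses the sample string from the question and a placeholder replacement "X"; the variable names are illustrative. In the asker's pattern, `*?` quantifies the opening brace itself, so only the closing brace ever matches:

```python
import re

text = "This: {should be replaced}"

# "{*?}" reads as: zero or more "{" (lazily), then "}".
# Nothing consumes the text between the braces, so only "}" is matched.
asker_result = re.sub(r"{*?}", "X", text)

# "{.+?}" matches "{", at least one character (lazily), then "}".
# Escaping the braces is optional here but makes the intent explicit.
fixed_result = re.sub(r"\{.+?\}", "X", text)

# ".+?" requires at least one character, so empty braces are left alone,
# whereas ".*?" would replace them too.
empty_plus = re.sub(r"\{.+?\}", "X", "{}")
empty_star = re.sub(r"\{.*?\}", "X", "{}")

print(asker_result)  # This: {should be replacedX
print(fixed_result)  # This: X
print(empty_plus)    # {}
print(empty_star)    # X
```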
