Extra characters removed when using lstrip in pandas

Question

I have a data column like the following:

Input:

  CD

  Component Description_CAP YO
  Component Description_CAPE IO
  Component Description_CLOSE SO
  Component Description_CAT TO
  Component Description_CAPP TTO
  Component Description_CLOSE IUO

I used lstrip, but it also removed the "C" that comes right after "Component Description_", which is wrong:

      df['CD'] = df['CD'].map(lambda x: x.lstrip('Component Description_'))

Expected result:

  CD

  CAP YO
  CAPE IO
  CLOSE SO
  CAT TO
  CAPP TTO
  CLOSE IUO

Actual result I got:

  CD

  AP YO
  APE IO
  LOSE SO
  AT TO
  APP TTO
  LOSE IUO

Answer 1

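The problem is lstrip itself: it treats its argument as a set of characters, not as a prefix, and keeps removing characters from the left as long as each one belongs to that set. Because "C" occurs in "Component Description_", the first letter of "CAP", "CLOSE", "CAT" and so on is stripped as well.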
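To see the character-set behaviour concretely, here is a minimal sketch in plain Python (no pandas needed). The helper `lstrip_as_set` is hypothetical, written only to mirror how `str.lstrip(chars)` is documented to work; it is not a library function. `str.removeprefix` (Python 3.9+) is shown for contrast, because it removes its argument only as a literal prefix.

```python
PREFIX = "Component Description_"

def lstrip_as_set(s: str, chars: str) -> str:
    """Hypothetical re-implementation of s.lstrip(chars), for illustration only."""
    allowed = set(chars)          # order and repetition inside `chars` do not matter
    i = 0
    while i < len(s) and s[i] in allowed:
        i += 1                    # keep skipping while the character is in the set
    return s[i:]

values = [
    "Component Description_CAP YO",
    "Component Description_CLOSE IUO",
]

for v in values:
    stripped = v.lstrip(PREFIX)            # character-set semantics (the bug)
    manual = lstrip_as_set(v, PREFIX)      # same result, spelled out step by step
    prefix_only = v.removeprefix(PREFIX)   # literal-prefix semantics (Python 3.9+)
    print(repr(stripped), repr(manual), repr(prefix_only))
```

With the question's original `map` approach, swapping `lstrip` for `removeprefix` inside the lambda gives the expected values; pandas 1.4 and later also provide `Series.str.removeprefix` for the same purpose.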

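The solution is Series.str.replace with a regular expression in which ^ anchors the match to the start of the string. Since pandas 2.0 the default is regex=False, so regex=True has to be passed for ^ to act as an anchor: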

df['CD'] = df['CD'].str.replace(r'^Component Description_', '', regex=True)
print(df)
          CD
0     CAP YO
1    CAPE IO
2   CLOSE SO
3     CAT TO
4   CAPP TTO
5  CLOSE IUO
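The effect of the `^` anchor can be checked without pandas using Python's standard `re` module, which interprets this pattern the same way. A minimal sketch, assuming a hypothetical second value in which the phrase appears at the end rather than the start: the anchored pattern only matches at position 0, while the unanchored one removes the phrase wherever it occurs. `re.escape` makes the prefix match literally; it is not strictly required for this prefix, but it is a safe habit.

```python
import re

PREFIX = "Component Description_"
anchored = re.compile("^" + re.escape(PREFIX))   # matches only at the start of the string
unanchored = re.compile(re.escape(PREFIX))       # matches anywhere in the string

values = [
    "Component Description_CAPE IO",
    "CAT TO Component Description_",   # hypothetical: phrase not at the start
]

for v in values:
    print(repr(anchored.sub("", v)), repr(unanchored.sub("", v)))
```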

Answer 2

Use str.extract:

For example:

import pandas as pd

df = pd.DataFrame({"CD": ['Component Description_CAP YO', 'Component Description_CAPE IO', 'Component Description_CLOSE SO', 'Component Description_CAT TO', 'Component Description_CAPP TTO', 'Component Description_CLOSE IUO']})
# keep only the text captured after the first underscore; expand=False returns a Series
df["CD"] = df["CD"].str.extract(r"_(.*)$", expand=False)
print(df)

Output:

          CD
0     CAP YO
1    CAPE IO
2   CLOSE SO
3     CAT TO
4   CAPP TTO
5  CLOSE IUO
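`str.extract` returns the first capture group of the pattern for each row. The same pattern can be tried in plain Python with `re.search`, as in the minimal sketch below; `extract_suffix` is a hypothetical helper, and the second and third sample values are made up to probe edge cases. Two properties of `_(.*)$` show up: the match starts at the first underscore, so any later underscores are kept in the result, and a value with no underscore has no match at all, which `re.search` reports as `None` (pandas `str.extract` gives `NaN` in that case).

```python
import re

pattern = re.compile(r"_(.*)$")   # capture everything after the first underscore

def extract_suffix(s: str):
    """Hypothetical helper mimicking str.extract for a single capture group."""
    m = pattern.search(s)
    return m.group(1) if m else None   # None stands in for pandas' NaN

for v in ["Component Description_CAT TO",
          "Component Description_A_B",   # hypothetical: a second underscore
          "CAT TO"]:                     # hypothetical: no underscore at all
    print(repr(extract_suffix(v)))
```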

Question description

2 solutions

Solution 1
1 2019-08-06 12:34:39

Solution 2
1 2019-08-06 12:37:30
