Pandas: delete the rest of each string in a column after a character

Question

I have a data set with over 500 rows where one of the columns has values like this:

df:

         column1

 0    a{'...'}  
 1    b{'...'}
 2    c{'...'}  
 3    d{'...'}

I want to remove everything within and including the {}.

I have been looking at this question, "Pandas delete parts of string after specified character inside a dataframe", and tried the solutions there, but I keep getting errors. (I am aware that StringIO is now io.StringIO.)

I've tried

df.column1 = df.column1.str.split('{')[0]

but I get the error message KeyError: 0, and I don't really understand what that means.

I've also tried:

df.column1 = df.column1.str.split(pat='{')

But this only seems to delete the '{', so I'm left with

      column1

 0    a'...'}   
 1    b'...'}
 2    c'...'}   
 3    d'...'}

Also, I'm not sure if it's important, but the column is of object type. Can anyone tell me what I'm doing wrong and how to fix the issue?

Answer 1

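You can use str.replace with a regular expression (recent pandas versions need regex=True, since the pattern is otherwise treated as a literal string)

df['column1'].str.replace(r"\{.*\}", "", regex=True)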
Out[385]: 
0    a
1    b
2    c
3    d
Name: column1, dtype: object
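
One detail worth knowing about this pattern is that `.*` is greedy, so it removes everything from the first `{` to the last `}` in each string. The sketch below contrasts it with the non-greedy `.*?`; the sample strings are made up for illustration (the question elides what is inside the braces), and `regex=True` is passed explicitly on the assumption that a recent pandas version is in use.

```python
import pandas as pd

# Made-up sample: the second value contains two brace groups.
s = pd.Series(["a{'x'}", "a{x}b{y}"], name="column1")

# Greedy: strips from the first "{" through the last "}".
greedy = s.str.replace(r"\{.*\}", "", regex=True)

# Non-greedy: strips each "{...}" group separately, keeping text in between.
lazy = s.str.replace(r"\{.*?\}", "", regex=True)

print(greedy.tolist())
print(lazy.tolist())
```

For data shaped like the question's, with a single brace group at the end of each value, both variants give the same result.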

Answer 2

You can also use pandas.DataFrame.replace and pass a dictionary that specifies what to do for each column.

Using @Wen's regex pattern

df.replace(dict(column1={r'\{.*\}': ''}), regex=True)

  column1
0       a
1       b
2       c
3       d
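
To make the "various columns" part concrete, here is a minimal sketch with a nested dictionary that applies a different pattern to each column. The second column and its bracket pattern are hypothetical, added only to show the shape of the dictionary.

```python
import pandas as pd

df = pd.DataFrame({
    "column1": ["a{'x'}", "b{'y'}"],
    "column2": ["k[1]", "m[2]"],  # hypothetical second column
})

# Outer keys are column names; inner dicts map regex pattern -> replacement.
result = df.replace(
    {
        "column1": {r"\{.*\}": ""},
        "column2": {r"\[.*\]": ""},
    },
    regex=True,
)
print(result)
```

Note that df.replace returns a new DataFrame, so the result has to be assigned if the change should be kept.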

In the spirit of @pault, you can also use pandas.Series.str.extract

df.column1.str.extract(r'([^{]+)', expand=False)

  column1
0       a
1       b
2       c
3       d

Answer 3

A little late (@Wen's solution is great), but you can use pandas.Series.str.split() as in your original attempt. You were close: you just need to set expand=True, so that the pieces come back as columns and [0] selects the first column.

df["column1"] = df["column1"].str.split("{", expand=True)[0]
#  column1
#0       a
#1       b
#2       c
#3       d
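
As a sketch of what goes wrong without expand=True: str.split then returns a Series of lists, and [0] becomes a label lookup on the index rather than "the first piece of each row". That is one plausible source of the KeyError: 0 in the question, assuming the real index did not contain the label 0. The data and index below are made up to reproduce that situation; `.str[0]` is an alternative that picks the first element of each list.

```python
import pandas as pd

# Hypothetical data; the index deliberately does not contain the label 0.
s = pd.Series(["a{'x'}", "b{'y'}"], index=[10, 11], name="column1")

# Without expand=True this is a Series of lists, e.g. ['a', "'x'}"].
pieces = s.str.split("{", n=1)

try:
    pieces[0]  # looks up the index label 0, which does not exist here
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Element-wise access: first item of each list.
first = pieces.str[0]
print(lookup_failed, first.tolist())
```

Passing n=1 limits the split to the first "{", which is enough when only the leading part is wanted.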

Answer 4

Using .apply

import pandas as pd

df = pd.DataFrame({"a":["a{'...'}", "b{'...'}"]})
# Keep only the text before the first '{' in each value
df["a"] = df["a"].apply(lambda x: x.split('{')[0])
print(df)
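
One caveat with the .apply approach is that the lambda calls a string method on every value, so a missing value in the column would raise an AttributeError. A minimal sketch of a guarded version follows; the missing entry in the sample data is an assumption added for illustration, since the question does not say whether the column has gaps.

```python
import pandas as pd

# Made-up sample with one missing value.
df = pd.DataFrame({"a": ["a{'x'}", None, "c{'z'}"]})

def strip_braces(value):
    # Only strings are split; anything else (None, NaN) passes through unchanged.
    if isinstance(value, str):
        return value.split("{", 1)[0]
    return value

out = df["a"].apply(strip_braces)
print(out)
```

The .str methods shown in the other answers skip missing values on their own, which is one reason to prefer them over .apply here.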

Answer summary (4 answers)

Answer 1: 5 (accepted), 2018-04-13 15:22:37
Answer 2: 3, 2018-04-13 16:24:52
Answer 3: 2, 2018-04-13 15:46:52
Answer 4: 0, 2018-04-13 15:26:28
