Pandas remove all of a string in a column after a character
I have a data set with over 500 rows where one of the columns has values like this:
df:
column1
0 a{'...'}
1 b{'...'}
2 c{'...'}
3 d{'...'}
I want to remove everything within and including the {}.
I have been looking at this question, "Pandas delete parts of string after specified character inside a dataframe", and tried the solutions there, but I keep getting errors (and I am aware that StringIO is now io.StringIO).
I've tried
df.column1 = df.column1.str.split('{')[0]
but I get the error message KeyError: 0 and don't really understand what that means.
I've also tried:
df.column1 = df.column1.str.split(pat='{')
But this only seems to delete the '{', so I'm left with
column1
0 a'...'}
1 b'...'}
2 c'...'}
3 d'...'}
Also, I'm not sure if it's important, but the column is of object type. Can anyone tell me what I'm doing wrong and how to fix the issue?
You can use replace:
# regex=True is needed on pandas 2.0+, where str.replace matches literally by default
df['column1'].str.replace(r"\{.*\}", "", regex=True)
Out[385]:
0 a
1 b
2 c
3 d
Name: column1, dtype: object
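For anyone running this on a recent pandas, a self-contained sketch may help. It assumes the sample frame from the question (rebuilt by hand here) and passes regex=True explicitly, because since pandas 2.0 Series.str.replace treats the pattern as a literal string unless told otherwise. The second pattern, \{.*, is a variation that is not taken from the answer: it drops everything from the first { onward, even when a closing } is missing.

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption about the real data set).
df = pd.DataFrame({"column1": ["a{'...'}", "b{'...'}", "c{'...'}", "d{'...'}"]})

# Remove the braces and everything between them.
braces_removed = df["column1"].str.replace(r"\{.*\}", "", regex=True)

# Variation: remove everything from the first "{" to the end of the string.
after_brace_removed = df["column1"].str.replace(r"\{.*", "", regex=True)

print(braces_removed.tolist())       # ['a', 'b', 'c', 'd']
print(after_brace_removed.tolist())  # ['a', 'b', 'c', 'd']
```

Neither call modifies df; assign the result back to df["column1"] to keep it.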
You can also use pandas.DataFrame.replace and pass a dictionary that specifies what to do for various columns.
Using @Wen's regex pattern:
df.replace(dict(column1={r'\{.*\}': ''}), regex=True)
column1
0 a
1 b
2 c
3 d
In the spirit of @pault, you can also use pandas.Series.str.extract:
df.column1.str.extract(r'([^{]+)', expand=False)
column1
0 a
1 b
2 c
3 d
A little late (@Wen's solution is great), but you can use pandas.Series.str.split() as in your original attempt. You were close; you just need to set expand=True.
df["column1"] = df["column1"].str.split("{", expand=True)[0]
# column1
#0 a
#1 b
#2 c
#3 d
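As for why the original attempt failed, here is a minimal sketch, assuming the sample frame from the question. Without expand=True, str.split returns a Series whose elements are lists, and [0] on that Series looks up the row labelled 0 instead of taking the first piece of every list; if no row carries the label 0, that lookup raises KeyError: 0. The .str[0] accessor indexes into each list element-wise, so it is an alternative to expand=True.

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption about the real data set).
df = pd.DataFrame({"column1": ["a{'...'}", "b{'...'}", "c{'...'}", "d{'...'}"]})

# Each element of this Series is a list such as ['a', "'...'}"].
pieces = df["column1"].str.split("{")

# Label lookup: returns the list stored in the row labelled 0, not a column.
first_row = pieces[0]

# Element-wise indexing: first piece of every list.
df["column1"] = pieces.str[0]
print(df)
```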
Using .apply:
import pandas as pd

df = pd.DataFrame({"a": ["a{'...'}", "b{'...'}"]})
# Keep only the part of each string before the first "{"
df["a"] = df["a"].apply(lambda x: x.split('{')[0])
print(df)
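One caveat with .apply, sketched under the assumption that the column may contain missing values (the question does not say whether it does): the lambda calls x.split on every element, so a NaN, which is a float, raises AttributeError. The .str accessor skips missing values and leaves them missing.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value, for illustration only.
s = pd.Series(["a{'...'}", np.nan, "c{'...'}"])

# The .str accessor propagates the missing value instead of failing.
cleaned = s.str.split("{").str[0]
print(cleaned.tolist())

# The plain-Python lambda fails on the float NaN.
try:
    s.apply(lambda x: x.split("{")[0])
    apply_failed = False
except AttributeError:
    apply_failed = True
print(apply_failed)
```

If the column is guaranteed to hold only strings, both approaches give the same result.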