Remove string from all items of dataframe with Pandas
I have a dataframe df like so:
import pandas as pd

dic = {'A': ['pap', 'cdf\nsdc', 'ert', 'dgx', 'kll\nsrw', 'sdq'],
       'B': [1, 4, 6, 2, 5, 6],
       'C': ['123\n12', '34', '55', '321\n88', '09', '45']}
df = pd.DataFrame(dic)
My goal is to remove from all columns the string formed by \n and whatever precedes it: abc\ndef ---> def
I was able to achieve my goal by using the following lines of code:
for index, row in df.iterrows():
    df['A'][index] = row['A'].split('\n')[-1]
    df['C'][index] = row['C'].split('\n')[-1]
However, I would like a smarter and more compact way to achieve this result. Can you suggest a more elegant way than mine (a one-liner, maybe)?
Note: column B is float!
You can use the vectorised str.split on the columns in question. If you have a more complicated example, you would need to filter the columns of interest based on dtype:
In [135]:
df['A'] = df['A'].str.split('\n').str[-1]
df['C'] = df['C'].str.split('\n').str[-1]
df
Out[135]:
     A  B   C
0  pap  1  12
1  sdc  4  34
2  ert  6  55
3  dgx  2  88
4  srw  5  09
5  sdq  6  45
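As a side note on why only A and C are touched: the .str accessor is only defined for string-like columns, so calling it on a numeric column such as B fails. A minimal sketch, assuming a cut-down two-row version of the frame (the names a_cleaned and b_error are just for illustration):

```python
import pandas as pd

# Cut-down version of the example frame: one string column, one numeric column
df = pd.DataFrame({"A": ["pap", "cdf\nsdc"], "B": [1, 4]})

# String column: split on the newline and keep the last piece
a_cleaned = df["A"].str.split("\n").str[-1].tolist()

# Numeric column: the .str accessor is not available and raises AttributeError
try:
    df["B"].str.split("\n")
    b_error = None
except AttributeError as exc:
    b_error = exc

print(a_cleaned)
print(type(b_error).__name__)
```

This is why the numeric column has to be left out, either by naming the string columns explicitly or by filtering on dtype.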
A dynamic method:
In [142]:
# np.object was removed in NumPy 1.24; the string 'object' selects the same columns
str_cols = df.select_dtypes(include=['object']).columns
str_cols
Out[142]:
Index(['A', 'C'], dtype='object')
In [143]:
for col in str_cols:
    df[col] = df[col].str.split('\n').str[-1]
df
Out[143]:
     A  B   C
0  pap  1  12
1  sdc  4  34
2  ert  6  55
3  dgx  2  88
4  srw  5  09
5  sdq  6  45
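If you need this in more than one place, the dynamic method could be wrapped in a small helper. This is only a sketch: the function name keep_text_after_last_newline is made up here, and it assumes that pandas.api.types.is_string_dtype is an acceptable way to pick the string columns (on older pandas versions it reports True for any object column, so mixed-type object columns would also be processed). It works on a copy and leaves the original frame alone.

```python
import pandas as pd
from pandas.api.types import is_string_dtype


def keep_text_after_last_newline(frame):
    """Return a copy in which every string column keeps only the text after its last newline."""
    out = frame.copy()
    for col in out.columns:
        # Skip numeric columns such as B; only string-like columns support .str
        if is_string_dtype(out[col]):
            out[col] = out[col].str.split("\n").str[-1]
    return out


df = pd.DataFrame({
    "A": ["pap", "cdf\nsdc", "ert", "dgx", "kll\nsrw", "sdq"],
    "B": [1, 4, 6, 2, 5, 6],
    "C": ["123\n12", "34", "55", "321\n88", "09", "45"],
})

cleaned = keep_text_after_last_newline(df)
print(cleaned)
```

Values without a newline pass through unchanged, because splitting them yields a single-element list whose last item is the original string.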