How to remove carriage return in a dataframe
I have a dataframe with columns named id, country_name, location and total_deaths. While cleaning the data, I came across a value in one row that has '\r' attached. Once cleaning is complete, I store the resulting dataframe in a destination.csv file. Because that particular row has \r attached, it always creates a new row in the file.
id 29
location Uttar Pradesh\r
country_name India
total_deaths 20
I want to remove \r. I tried df.replace({'\r': ''}, regex=True), but it isn't working for me.
Is there any other solution? Can somebody help?
In the above process, I am iterating over df to see whether \r is present and, if it is, to replace it. Here row.replace() or row.str.strip() doesn't seem to work, or I could be doing it the wrong way.
I don't want to specify the column name or row number when using replace(), because I can't be certain that only the 'location' column will contain \r. Please find the code below.
import re

count = 0
for row_index, row in df.iterrows():
    if re.search(r"\\r", str(row)):
        print(type(row))  # Return type is pandas.Series
        # replace() returns a new Series; the result is not assigned, so df is unchanged
        row.replace({r'\\r': ''}, regex=True)
        print(row)
        count += 1
Another solution is to use str.strip:
# note: strip() treats its argument as a set of characters to remove from both ends
df['29'] = df['29'].str.strip(r'\\r')
print(df)
id 29
0 location Uttar Pradesh
1 country_name India
2 total_deaths 20
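One caveat about the str.strip approach may be worth a quick check: the argument is a set of characters rather than a suffix, so a value that legitimately ends in "r" can be trimmed too. The sketch below uses made-up sample values (not the asker's data) and assumes the cell holds a literal backslash followed by "r"; an anchored regex is shown as one possible alternative.

```python
import pandas as pd

# Hypothetical sample: the first value ends with a literal backslash + "r",
# the second legitimately ends with the letter "r".
s = pd.Series(["Uttar Pradesh\\r", "Kashmir"])

# str.strip removes any of the listed characters ("\" and "r") from both ends,
# so the final "r" of "Kashmir" is removed as well.
stripped = s.str.strip(r"\\r")

# An anchored regex removes only the exact two-character suffix.
suffix_removed = s.str.replace(r"\\r$", "", regex=True)

print(stripped.tolist())
print(suffix_removed.tolist())
```

If the values never end in "r" or "\", both approaches should give the same result.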
If you want to use replace, add the r prefix and one more \:
print(df.replace({r'\\r': ''}, regex=True))
id 29
0 location Uttar Pradesh
1 country_name India
2 total_deaths 20
In replace you can also name the column in which to replace, like:
print(df)
id 29
0 location Uttar Pradesh\r
1 country_name India
2 total_deaths\r 20
print(df.replace({'29': {r'\\r': ''}}, regex=True))
id 29
0 location Uttar Pradesh
1 country_name India
2 total_deaths\r 20
print(df.replace({r'\\r': ''}, regex=True))
id 29
0 location Uttar Pradesh
1 country_name India
2 total_deaths 20
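Whether the pattern needs one backslash or two depends on what the cells actually contain: a real carriage-return character, or the two printable characters backslash and "r". A minimal sketch, assuming a small made-up frame that contains both forms, shows one way to check and to cover both cases in a single pass over all columns:

```python
import pandas as pd

# Hypothetical frame: one cell holds a real carriage return,
# another holds the literal two-character text backslash + "r".
df = pd.DataFrame({
    "id": ["location", "country_name", "total_deaths\r"],
    "29": ["Uttar Pradesh\\r", "India", "20"],
})

# Locate real carriage returns with a plain (non-regex) substring search.
real_cr_rows = df["id"].str.contains("\r", regex=False).tolist()

# In a raw-string regex, \r matches a carriage return and \\r matches
# a literal backslash followed by "r"; the alternation covers both.
cleaned = df.replace(to_replace=r"\r|\\r", value="", regex=True)

print(real_cr_rows)
print(cleaned)
```

No column name is passed to replace here, so every string cell in the frame is checked.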
EDIT by comment:
import pandas as pd
df = pd.read_csv('data_source_test.csv')
print(df)
id country_name location total_deaths
0 1 India New Delhi 354
1 2 India Tamil Nadu 48
2 3 India Karnataka 0
3 4 India Andra Pradesh 32
4 5 India Assam 679
5 6 India Kerala 128
6 7 India Punjab 0
7 8 India Mumbai, Thane 1
8 9 India Uttar Pradesh\r\n 20
9 10 India Orissa 69
print(df.replace({r'\r\n': ''}, regex=True))
id country_name location total_deaths
0 1 India New Delhi 354
1 2 India Tamil Nadu 48
2 3 India Karnataka 0
3 4 India Andra Pradesh 32
4 5 India Assam 679
5 6 India Kerala 128
6 7 India Punjab 0
7 8 India Mumbai, Thane 1
8 9 India Uttar Pradesh 20
9 10 India Orissa 69
If you need to replace only in the location column:
# regex=True is passed explicitly; newer pandas versions default to a literal match
df['location'] = df.location.str.replace(r'\r\n', '', regex=True)
print(df)
id country_name location total_deaths
0 1 India New Delhi 354
1 2 India Tamil Nadu 48
2 3 India Karnataka 0
3 4 India Andra Pradesh 32
4 5 India Assam 679
5 6 India Kerala 128
6 7 India Punjab 0
7 8 India Mumbai, Thane 1
8 9 India Uttar Pradesh 20
9 10 India Orissa 69
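To tie this back to the original symptom (an extra row in the output CSV), here is a minimal sketch that cleans the column and then writes the frame to an in-memory buffer instead of destination.csv. The three sample rows are copied from the output above; the buffer and the line count are only there for illustration.

```python
import io
import pandas as pd

# Three sample rows, one of which ends with a real "\r\n".
df = pd.DataFrame({
    "id": [8, 9, 10],
    "location": ["Mumbai, Thane", "Uttar Pradesh\r\n", "Orissa"],
    "total_deaths": [1, 20, 69],
})

# Pass regex=True explicitly so r'\r\n' is read as a regex
# (carriage return + newline) and not as four literal characters.
df["location"] = df["location"].str.replace(r"\r\n", "", regex=True)

# Write to an in-memory buffer and count the resulting lines.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
lines = buffer.getvalue().splitlines()

print(len(lines))  # one header line plus one line per row
```

With the trailing "\r\n" removed, each record should occupy exactly one line of the file.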
Use str.replace. You need to escape the backslash so that the pattern matches the literal two-character text \r rather than a carriage-return character:
In [15]:
df['29'] = df['29'].str.replace(r'\\r', '', regex=True)
df
Out[15]:
id 29
0 location Uttar Pradesh
1 country_name India
2 total_deaths 20
The code below removes \t tabs, \n newlines and \r carriage returns, and is great for condensing a datum into one row. The answer was taken from https://gist.github.com/smram/d6ded3c9028272360eb65bcab564a18a
df.replace(to_replace=[r"\\t|\\n|\\r", "\t|\n|\r"], value=["",""], regex=True, inplace=<INPLACE>)
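As a rough illustration of what the two patterns in that call do, the sketch below applies them to a small made-up column and assigns the result to a new variable rather than using inplace. The first pattern targets literal backslash sequences, the second targets the real control characters.

```python
import pandas as pd

# Hypothetical notes column: a real tab, a literal backslash + "n",
# and a real carriage return + newline.
df = pd.DataFrame({"note": ["a\tb", "c\\nd", "e\r\nf"]})

cleaned = df.replace(
    to_replace=[r"\\t|\\n|\\r", "\t|\n|\r"],  # literal sequences, then control characters
    value=["", ""],
    regex=True,
)

print(cleaned["note"].tolist())
```

Replacing with an empty string joins the surrounding text; a single space as the replacement value may read better for free-text fields.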
Somehow, the accepted answer did not work for me. Ultimately, I found the solution by doing it as follows:
df["29"] = df["29"].replace(r'\r', '', regex=True)
The difference is that I use \r instead of \\r.
Just assign the result of df.replace back to df and then print df.
df=df.replace({'\r': ''}, regex=True)
print(df)
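The reason the assignment matters is that replace returns a new object and leaves the original untouched unless the result is stored. A small sketch with a made-up one-column frame:

```python
import pandas as pd

df = pd.DataFrame({"location": ["Uttar Pradesh\r", "Orissa"]})

# The result is discarded here, so df still contains the carriage return.
df.replace({"\r": ""}, regex=True)
before = df["location"].tolist()

# Assigning the result back keeps the cleaned values.
df = df.replace({"\r": ""}, regex=True)
after = df["location"].tolist()

print(before)
print(after)
```

The same applies to calling row.replace() inside an iterrows() loop: the returned Series has to be stored somewhere for the change to persist.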