How to remove carriage return in a dataframe

I have a dataframe with the columns id, country_name, location and total_deaths. While cleaning the data, I came across a value in one row that has '\r' (a carriage return) attached. Once cleaning is complete, I store the resulting dataframe in a destination.csv file. Because this particular row has \r attached, it always creates an extra line (a new row) in the file.

id                               29
location            Uttar Pradesh\r
country_name                  India
total_deaths                     20

I want to remove the \r. I tried df.replace({'\r': ''}, regex=True), but it isn't working for me.

Is there any other solution? Can somebody help?

Edit:

In the process above, I am iterating over df to check whether \r is present, and replacing it if it is. However, row.replace() and row.str.strip() don't seem to work here, or I may be using them the wrong way.

I don't want to specify the column name or row number when calling replace(), because I can't be certain that only the 'location' column will contain \r. Here is the code:

import re

count = 0
for row_index, row in df.iterrows():
    if re.search(r"\\r", str(row)):
        print(type(row))              # type is pandas.Series
        row.replace({r'\\r': ''}, regex=True)
        print(row)
        count += 1
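Two things keep the loop above from changing df: row.replace() returns a new Series, which is discarded here, and the row that iterrows() yields is not guaranteed to be a view of df, so even an assigned result would not reliably write back. Below is a minimal vectorized sketch that needs no column names, assuming the stray characters are real carriage returns ('\r'); the sample frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample data: one value carries a real trailing carriage return.
df = pd.DataFrame({
    "id": [28, 29],
    "location": ["Kerala", "Uttar Pradesh\r"],
    "country_name": ["India", "India"],
    "total_deaths": [128, 20],
})

# Boolean mask of cells containing a carriage return, computed for every
# column at once instead of looping with iterrows().
has_cr = df.apply(lambda col: col.astype(str).str.contains("\r", regex=False))
print(has_cr.any())  # which columns are affected

# replace() returns a new DataFrame, so assign the result back.
# With regex=True the pattern is searched inside every string cell;
# non-string columns such as total_deaths are left as they are.
df = df.replace(r"\r", "", regex=True)
```

The mask step is optional; the single df.replace call already touches every string column without naming any of them.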

Another solution is to use str.strip:

df['29'] = df['29'].str.strip(r'\\r')
print(df)
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
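One caution about str.strip: its argument is a set of characters to remove from both ends, not a substring. r'\\r' is the set {backslash, r}, so a value that happens to end in the letter r is trimmed as well. If the stray character is a real carriage return, str.strip() with no argument removes leading and trailing whitespace, which includes \r and \n. A small sketch with made-up values:

```python
import pandas as pd

s = pd.Series(["Uttar Pradesh\\r",    # literal backslash + 'r'
               "Kashmir",             # legitimately ends in 'r'
               "Uttar Pradesh\r\n"])  # real CR + LF

# strip() treats its argument as a character set: backslash and 'r'.
# The literal suffix goes, but so does the final 'r' of "Kashmir";
# the real CR/LF is not in the set and survives.
print(s.str.strip(r"\\r").tolist())  # ['Uttar Pradesh', 'Kashmi', 'Uttar Pradesh\r\n']

# With no argument, strip() removes whitespace, including real '\r' and '\n'.
print(s.str.strip().tolist())        # ['Uttar Pradesh\\r', 'Kashmir', 'Uttar Pradesh']
```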

If you want to use replace, make the pattern a raw string (prefix r) and add one more backslash, i.e. r'\\r':

print(df.replace({r'\\r': ''}, regex=True))
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
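Whether r'\\r' or r'\r' is the right pattern depends on what is actually stored in the cell, which is why different answers on this page work for different people. With regex=True, r'\\r' matches the two literal characters backslash and r, while r'\r' (or the plain string '\r') matches a real carriage return. Printing a DataFrame renders a real carriage return as \r as well, so the printout alone does not tell the two cases apart; repr() of an individual value does. A sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({"location": ["Uttar Pradesh\\r",   # literal backslash + 'r'
                                "Uttar Pradesh\r"]})  # real carriage return

# repr() of each value reveals which case you have.
for value in df["location"]:
    print(repr(value))

# As a regex, r'\\r' matches a literal backslash followed by 'r'.
literal_fixed = df.replace({r"\\r": ""}, regex=True)
# As a regex, r'\r' matches the carriage-return character itself.
cr_fixed = df.replace({r"\r": ""}, regex=True)

print(literal_fixed["location"].tolist())  # ['Uttar Pradesh', 'Uttar Pradesh\r']
print(cr_fixed["location"].tolist())       # ['Uttar Pradesh\\r', 'Uttar Pradesh']
```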

With replace you can also restrict the replacement to a specific column, like this:

print(df)
               id               29
0        location  Uttar Pradesh\r
1    country_name            India
2  total_deaths\r               20

print(df.replace({'29': {r'\\r': ''}}, regex=True))
               id             29
0        location  Uttar Pradesh
1    country_name          India
2  total_deaths\r             20

print(df.replace({r'\\r': ''}, regex=True))
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20

Edit (based on a comment):

import pandas as pd

df = pd.read_csv('data_source_test.csv')
print(df)
   id country_name           location  total_deaths
0   1        India          New Delhi           354
1   2        India         Tamil Nadu            48
2   3        India          Karnataka             0
3   4        India      Andra Pradesh            32
4   5        India              Assam           679
5   6        India             Kerala           128
6   7        India             Punjab             0
7   8        India      Mumbai, Thane             1
8   9        India  Uttar Pradesh\r\n            20
9  10        India             Orissa            69

print(df.replace({r'\r\n': ''}, regex=True))
   id country_name       location  total_deaths
0   1        India      New Delhi           354
1   2        India     Tamil Nadu            48
2   3        India      Karnataka             0
3   4        India  Andra Pradesh            32
4   5        India          Assam           679
5   6        India         Kerala           128
6   7        India         Punjab             0
7   8        India  Mumbai, Thane             1
8   9        India  Uttar Pradesh            20
9  10        India         Orissa            69
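The pattern r'\r\n' only matches a carriage return immediately followed by a line feed, so a lone \r or a lone \n is left in place. If a file mixes line endings, a character class such as r'[\r\n]+' covers all three cases. A sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({"location": ["Uttar Pradesh\r\n", "Orissa\r", "Assam\n", "Punjab"]})

# Matches only the CR+LF pair; a lone CR or LF survives.
pair_only = df.replace({r"\r\n": ""}, regex=True)

# A character class removes any run of CR and/or LF characters.
any_break = df.replace({r"[\r\n]+": ""}, regex=True)

print(pair_only["location"].tolist())  # ['Uttar Pradesh', 'Orissa\r', 'Assam\n', 'Punjab']
print(any_break["location"].tolist())  # ['Uttar Pradesh', 'Orissa', 'Assam', 'Punjab']
```

Here the breaks are trailing, so replacing them with an empty string is safe; if a line break separates two words inside a value, replacing with a space keeps them apart.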

If you only need to replace in the location column:

df['location'] = df.location.str.replace(r'\r\n', '', regex=True)  # regex defaults to False in pandas >= 2.0
print(df)
   id country_name       location  total_deaths
0   1        India      New Delhi           354
1   2        India     Tamil Nadu            48
2   3        India      Karnataka             0
3   4        India  Andra Pradesh            32
4   5        India          Assam           679
5   6        India         Kerala           128
6   7        India         Punjab             0
7   8        India  Mumbai, Thane             1
8   9        India  Uttar Pradesh            20
9  10        India         Orissa            69

Use str.replace. You need to escape the backslash so that the pattern matches the literal two-character sequence \r (a backslash followed by r) rather than a carriage return:

In [15]:
df['29'] = df['29'].str.replace(r'\\r', '', regex=True)
df

Out[15]:
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20

The code below removes \t (tab), \n (newline) and \r (carriage return), both as real characters and as literal escape sequences, which is handy for collapsing each value onto a single line. The answer was taken from https://gist.github.com/smram/d6ded3c9028272360eb65bcab564a18a

df.replace(to_replace=[r"\\t|\\n|\\r", "\t|\n|\r"], value=["",""], regex=True, inplace=<INPLACE>)
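The two entries in to_replace cover the two cases discussed on this page: r"\\t|\\n|\\r" matches the literal escape sequences (a backslash followed by t, n or r), and "\t|\n|\r" matches the actual tab, newline and carriage-return characters. A short check with made-up values, assigning the result back instead of passing inplace:

```python
import pandas as pd

df = pd.DataFrame({"location": ["Uttar\\tPradesh\\r",    # literal escape sequences
                                "Uttar\tPradesh\r\n"]})  # real control characters

# Each pattern in the list is paired with the value at the same position.
df = df.replace(to_replace=[r"\\t|\\n|\\r", "\t|\n|\r"], value=["", ""], regex=True)

print(df["location"].tolist())  # ['UttarPradesh', 'UttarPradesh']
```

Replacing with "" joins words that were separated only by a tab or line break; passing " " as the value avoids that.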

Somehow, the accepted answer did not work for me. In the end, I found the solution by doing it as follows:

df["29"] = df["29"].replace(r'\r', '', regex=True)

The difference is that I use \r instead of \\r: as a regex, r'\r' matches a real carriage-return character, while r'\\r' matches a literal backslash followed by r.

Just assign the result of df.replace back to df, then print df:

df=df.replace({'\r': ''}, regex=True) 
print(df)
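The underlying point is that DataFrame.replace returns a new DataFrame by default and leaves the original unchanged; you either assign the result back, as above, or pass inplace=True, which modifies the frame and returns None. A quick check with made-up values:

```python
import pandas as pd

df = pd.DataFrame({"location": ["Uttar Pradesh\r", "Orissa"]})

# Without assignment the cleaned copy is discarded and df is unchanged.
df.replace({"\r": ""}, regex=True)
print(df["location"].tolist())  # ['Uttar Pradesh\r', 'Orissa']

# Option 1: keep the returned DataFrame.
cleaned = df.replace({"\r": ""}, regex=True)

# Option 2: modify df in place; the call itself returns None.
result = df.replace({"\r": ""}, regex=True, inplace=True)
print(result is None, df.equals(cleaned))  # True True
```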

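Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM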