Pandas python: replace empty lines with a string

Question

I have a CSV file which at some point becomes like this:

  57926,57927,"79961', 'dsfdfdf'",fdfdfdfd,0.40997048,5 x fdfdfdfd,
57927,57928,"fb0ec52878b165aa14ae302e6064aa636f9ca11aa11f5', 'fdfd'",fdfdfd,1.64948454,20 fdfdfdfd,"



                         US 



                "
57928,57929,"f55bf599dba600550de724a0bec11166b2c470f98aa06', 'fdfdf'",fdfdfd,0.81300813,10 fdfdfdfd,"



                         US 







                "
57929,57930,"82e6b', 'reetrtrt'",trtretrtr,0.79783365,fdfdfdf,"



                         NL

I want to get rid of these empty lines. So far I have tried the following script:

df = pd.read_csv("scedon_etoimo.csv")

df = df.replace(r'\\n',' ', regex=True)

and

df=df.replace(r'\r\r\r\r\n\t\t\t\t\t\t', '',regex=True)

as this is the error I am getting. So far I haven't managed to clean my file and do what I want to do. I am not sure if I am using the correct approach. I am using pandas to process my dataset. Any help?

Answer 1

I would first open and preprocess the file's data, and only then pass it to pandas:

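import io
import pandas as pd

# Keep only non-blank lines, with surrounding whitespace stripped
lines = []
with open('file.csv') as f:
    for line in f:
        if line.strip():
            lines.append(line.strip())

# Parse the cleaned text as CSV
df = pd.read_csv(io.StringIO("\n".join(lines)))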
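To make the effect of this preprocessing easier to see, here is a minimal sketch that runs the same idea on an in-memory string instead of a file. The sample text and the column names (`id`, `next_id`, `name`, `price`, `country`) are made up for illustration; the question does not show the real header. Note that the quoted field still contains the line breaks around the value after the blank lines are dropped, so the sketch trims that column as an extra step.

```python
import io

import pandas as pd

# Hypothetical sample mimicking the question's file: the last field is quoted
# and spans several blank-padded lines. Column names are assumptions.
raw = (
    "id,next_id,name,price,country\n"
    "57926,57927,fdfdfdfd,0.40997048,\n"
    '57927,57928,fdfdfd,1.64948454,"\n'
    "\n"
    "\n"
    "             US \n"
    "\n"
    '        "\n'
)

# Drop whitespace-only lines and trim the rest, as in the answer above.
lines = [line.strip() for line in io.StringIO(raw) if line.strip()]
df = pd.read_csv(io.StringIO("\n".join(lines)))

# The quoted field still holds the newlines around "US", so trim it as well.
df["country"] = df["country"].str.strip()
print(df)
```

Stripping every line is safe here because no field in the sample depends on leading or trailing spaces; if your data does, you may prefer to only skip the blank lines.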

Answer 2

Based on the file snippet you provided, here is how you can replace those empty lines, which pandas stores as NaN, with a blank string:

import numpy as np
import pandas as pd

df = pd.read_csv("scedon_etoimo.csv")
# Replace every NaN with an empty string (no regex needed for NaN)
df = df.replace(np.nan, "")

This lets you do everything on the pandas DataFrame itself without reading through your file(s) more than once. That said, I would also recommend preprocessing your data before loading it, as that is often a much safer way to handle data in non-uniform layouts.
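A small sketch of what this replacement does, using a made-up two-row sample (the column names are assumptions, not taken from the question's file). It also shows `fillna("")`, which should give the same result for this case. Keep in mind that this only touches cells that are truly missing; a cell that contains newlines or spaces inside quotes is read as a string, not as NaN, so it is left unchanged.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical two-row sample: the trailing comma leaves the last cell empty.
raw = "id,name,country\n1,foo,\n2,bar,NL\n"

df = pd.read_csv(io.StringIO(raw))
n_missing = int(df["country"].isna().sum())  # the empty cell is read as NaN

replaced = df.replace(np.nan, "")  # the approach from this answer
filled = df.fillna("")             # a built-in shortcut for the same thing

print(n_missing, replaced["country"].tolist())
```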

Answer 3

Try:

df.replace(to_replace=r'[\n\r\t]', value='', regex=True, inplace=True)

This instruction replaces each \n, \r and Tab with nothing. Because of the inplace argument, there is no need to assign the result to df again.

Alternative: use to_replace=r'\s' to also eliminate spaces, maybe in selected columns only.
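Here is a rough sketch of both variants on a small hand-made frame (the column names `item` and `country` are hypothetical). It also includes the pattern from the question, r'\\n', to show why that attempt has no effect: in a regex it matches a literal backslash followed by the letter n, not a newline character.

```python
import pandas as pd

# Hypothetical frame with the kind of padded cell shown in the question.
df = pd.DataFrame({
    "item": ["5 x fdfdfdfd", "20 fdfdfdfd"],
    "country": ["\n\n\t\t US \r\n", "\n NL"],
})

# r"\\n" matches a literal backslash followed by "n", so nothing changes here.
unchanged = df.replace(r"\\n", " ", regex=True)

# Remove newlines, carriage returns and tabs in every string cell.
cleaned = df.replace(to_replace=r"[\n\r\t]", value="", regex=True)

# Remove all remaining whitespace, but only in the selected column.
cleaned["country"] = cleaned["country"].replace(r"\s", "", regex=True)
print(cleaned)
```

Limiting r'\s' to one column keeps the spaces in text columns such as `item` intact.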

3 answers

Answer 1
2 2018-09-26 19:31:34

Answer 2
0 2018-09-26 19:46:49

Answer 3
0 2018-09-26 19:52:43