How do I remove a NaN column?

Question

My text file has 3 columns, for example:

1  2  3
2  4  6
3  6  9

I want to arrange it like this:

wave  shape  freq
  1     2      3
  2     4      6
  3     6      9

I am using the following script:

import glob
import pandas as pd


import_file = glob.glob('data.txt')
for files in import_file:
    # Read the tab-separated file into a plain NumPy array
    intial_data = pd.read_csv(files, header=None, delimiter="\t").values
    # Label the three expected columns
    table = pd.DataFrame(intial_data, columns=['wave', 'shape', 'freq'])
    print(table)

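It gives me

Error: Shape of passed values is (4, 150), indices imply (3, 150)

I replaced the line that builds the table with one that adds another column label, x: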

table = pd.DataFrame (intial_data, columns = ['wave' , 'shape', 'freq','x'])

In short, it works and gives me this result:

       wave     shape    freq     x
 0    1.0000   2.0000   3.0000   NaN
 1    2.0000   4.0000   6.0000   NaN
 2    3.0000   6.0000   9.0000   NaN

I don't understand where the NaN column comes from, and I need to remove it from my result.

Any suggestions, please?

Answer 1

Don't accept the suggestion to drop the NaNs. That is an XY problem: it treats the symptom, not the cause.

Instead, use

intial_data = pd.read_csv('data.txt', header=None, delim_whitespace=True)

or, equivalently (delim_whitespace is deprecated as of pandas 2.2):

intial_data = pd.read_csv('data.txt', header=None, sep=r'\s+')

The error occurs because each line of your data.txt file ends with an extra tab. pandas interprets it as an extra column filled with NaN.

So, even though you see

1  2  3
2  4  6
3  6  9

what you most likely have is

1\t2\t3\t
2\t4\t6\t
3\t6\t9\t

The trailing \t is what adds the extra column.
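
The explanation above can be checked with a small sketch. The sample string below is a hypothetical stand-in for data.txt, assuming every line really does end with a tab; it is read from memory with io.StringIO instead of from disk.

```python
import io
import pandas as pd

# Hypothetical sample mirroring the question's data, with a trailing tab on every line.
raw = "1\t2\t3\t\n2\t4\t6\t\n3\t6\t9\t\n"

# Splitting on "\t" alone treats the empty field after the last tab as a fourth column.
by_tab = pd.read_csv(io.StringIO(raw), header=None, delimiter="\t")
print(by_tab.shape)            # four columns, the last one all NaN

# Splitting on runs of whitespace ignores the trailing tab, so three names are enough.
by_ws = pd.read_csv(io.StringIO(raw), header=None, sep=r"\s+",
                    names=["wave", "shape", "freq"])
print(by_ws)
```

To check a real file, printing repr() of its first line should show whether a trailing \t is present.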

Answer 2

df.dropna(axis=1, how='all')

Output:

   wave  shape  freq
0   1.0    2.0   3.0
1   2.0    4.0   6.0
2   3.0    6.0   9.0
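
As a minimal sketch of what this call does, assuming a hypothetical frame that mirrors the table in the question (the name cleaned is introduced here for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical frame reproducing the question's table, including the all-NaN column "x".
df = pd.DataFrame({
    "wave": [1.0, 2.0, 3.0],
    "shape": [2.0, 4.0, 6.0],
    "freq": [3.0, 6.0, 9.0],
    "x": [np.nan, np.nan, np.nan],
})

# axis=1 targets columns; how="all" drops a column only if every value in it is NaN.
cleaned = df.dropna(axis=1, how="all")
print(cleaned.columns.tolist())
```

dropna returns a new frame and leaves df unchanged, so the result has to be assigned to a variable.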

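Alternatively, you can read only the first 3 columns of the file with the usecols parameter of pd.read_csv. With the following code you get the table variable directly, without reading intial_data first: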

table = pd.read_csv(files,
                    header=None,
                    delimiter="\t",
                    usecols=range(3),
                    names=['wave', 'shape', 'freq'])
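
A sketch of how this could fit into the glob loop from the question. The temporary directory, the sample file contents and the tables dictionary are assumptions made for illustration; usecols is written as an explicit list, which selects the same columns as range(3).

```python
import glob
import os
import tempfile
import pandas as pd

# Hypothetical stand-in for data.txt: tab-separated, with a trailing tab on each line.
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "data.txt"), "w") as fh:
    fh.write("1\t2\t3\t\n2\t4\t6\t\n3\t6\t9\t\n")

tables = {}
for path in glob.glob(os.path.join(tmp_dir, "*.txt")):
    # usecols keeps only the first three fields, so the empty fourth field is never loaded.
    tables[os.path.basename(path)] = pd.read_csv(
        path, header=None, delimiter="\t",
        usecols=[0, 1, 2], names=["wave", "shape", "freq"],
    )

print(tables["data.txt"])
```

Because the empty column is never parsed, the three remaining columns are not coerced together with a NaN column through .values, as happens in the question's script.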

2 solutions

Solution 1
2 2019-03-10 12:37:57

Solution 2
0 accepted 2019-03-10 12:31:47
