I have missing data in my pandas DataFrame. How do I tell Python not to include it in a new DataFrame?

Question

I have a text file, mart_export.txt, full of two different types of keys. It looks like this:

Gene stable ID  RefSeq match transcript
ENSG00000243959 
ENSG00000206698 
ENSG00000265684 
ENSG00000251990 
ENSG00000241552 
ENSG00000050767 NM_173465.4

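As you can see, most rows have nothing in the right-hand column, but I am trying to build a new pandas DataFrame using only the rows that have values in both columns. Here is my script so far: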

import pandas as pd

#Put the biomart export in a pandas dataframe
mart = pd.read_csv("mart_export.txt", delimiter="\t")

#Create new list of records with Gene Stable Id and RefSeq numbers
d = {'Gene Stable ID': [], 'RefSeq ID': []}
for i in mart:
    if mart['RefSeq match transcript'] != NaN:
        d['Gene Stable ID'].append(mart['Gene stable ID'])
        d['RefSeq ID'].append(mart['RefSeq match transcript'])

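In Spyder, the blank values in the second column are displayed as NaN, but when I try to use that value in my code, Python raises an error saying that NaN is not defined. How do I tell Python what a blank looks like?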
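A minimal sketch of what is going on, assuming the file is tab-separated as the read_csv call implies. The inline sample string is a hypothetical stand-in for mart_export.txt. pandas reads an empty field as a floating-point NaN (the same value as numpy.nan), but NaN compares unequal to everything, itself included, so even with NaN imported from numpy a != comparison cannot detect it.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for mart_export.txt: two tab-separated columns,
# with the second field left empty on most rows.
sample = (
    "Gene stable ID\tRefSeq match transcript\n"
    "ENSG00000243959\t\n"
    "ENSG00000206698\t\n"
    "ENSG00000050767\tNM_173465.4\n"
)
mart = pd.read_csv(io.StringIO(sample), delimiter="\t")

first = mart.loc[0, "RefSeq match transcript"]
print(type(first), first)  # the blank field was read as a float NaN
print(first != np.nan)     # True: NaN is unequal to everything, even NaN
print(pd.isna(first))      # True: the reliable way to test for a missing value
```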

Answer 1

You can drop rows or columns that contain missing values with the pandas DataFrame method dropna().

In your case, it would be:

mart.dropna(axis="rows", inplace=True)

You can also drop columns that contain NaNs, specify the how argument, and so on; check the dropna documentation for details.
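A small sketch of the variants mentioned above, on a hypothetical three-row DataFrame that mirrors the question's columns (the data is made up; subset and how are real dropna parameters). It uses the return value instead of inplace=True, which leaves the original mart untouched and gives you the new DataFrame the question asks for.

```python
import numpy as np
import pandas as pd

# Small hypothetical DataFrame mirroring the two columns of mart_export.txt.
mart = pd.DataFrame({
    "Gene stable ID": ["ENSG00000243959", "ENSG00000206698", "ENSG00000050767"],
    "RefSeq match transcript": [np.nan, np.nan, "NM_173465.4"],
})

# Drop rows that have NaN in any column (axis=0 is the same as axis="rows").
complete = mart.dropna(axis=0)

# Only look at the RefSeq column when deciding which rows to drop.
with_refseq = mart.dropna(subset=["RefSeq match transcript"])

# how="all" drops a row only if every column is NaN, so nothing is dropped here.
any_data = mart.dropna(how="all")

print(len(complete), len(with_refseq), len(any_data))  # 1 1 3
```

subset is the safer choice if the real file ever gains extra columns that may also be blank.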

Answer 2

To detect NaN, you can use pd.isna or pd.isnull.
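For example (a minimal sketch with made-up values): both functions accept a scalar or a Series, and treat None as well as NaN as missing.

```python
import numpy as np
import pandas as pd

# pd.isna and its alias pd.isnull recognize several missing-value markers.
print(pd.isna(np.nan), pd.isna(None), pd.isna("NM_173465.4"))  # True True False

# Applied to a Series, they return a boolean Series of the same length.
refseq = pd.Series([np.nan, None, "NM_173465.4"])
mask = pd.isna(refseq)
print(mask.tolist())                     # [True, True, False]
print(pd.isnull(refseq).equals(mask))    # True
```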

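However, mart is a DataFrame, so mart['RefSeq match transcript'] is a column (a Series).

mart['RefSeq match transcript'] == something returns a boolean Series, not a single True or False.

Therefore the condition if mart['RefSeq match transcript'] == something will always raise an error (ValueError: The truth value of a Series is ambiguous), no matter what value you compare against.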
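A minimal reproduction with a hypothetical two-element Series: the comparison itself works element-wise, and the error only appears when Python has to turn the resulting Series into a single bool for the if statement.

```python
import numpy as np
import pandas as pd

column = pd.Series([np.nan, "NM_173465.4"], name="RefSeq match transcript")

comparison = column == "NM_173465.4"  # element-wise: a boolean Series, not one bool
print(comparison.tolist())            # [False, True]

error_message = None
try:
    # `if` needs a single True/False, so pandas refuses to guess.
    if column == "NM_173465.4":
        pass
except ValueError as exc:
    error_message = str(exc)
print("ValueError:", error_message)
```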

You either need dropna, as the other answer shows, or you can filter out the NaN rows like this:

mart_noNaN = mart[~mart['RefSeq match transcript'].isna()]

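Note the ~ negation in front of mart: it inverts the boolean mask, so only the rows where the value is not NaN are kept.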
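Putting it together, a sketch on a hypothetical two-row DataFrame: Series.notna() is the complement of isna(), so it gives the same mask without the ~. The renamed columns reuse the keys of the question's dict d; that renaming step is an assumption about what the asker wanted, not part of the answer.

```python
import numpy as np
import pandas as pd

mart = pd.DataFrame({
    "Gene stable ID": ["ENSG00000243959", "ENSG00000050767"],
    "RefSeq match transcript": [np.nan, "NM_173465.4"],
})

# notna() is the complement of isna(), so this equals ~mart[...].isna().
has_refseq = mart["RefSeq match transcript"].notna()

# Keep matching rows and rename to the (assumed) column names from the question.
d = (
    mart.loc[has_refseq]
    .rename(columns={"Gene stable ID": "Gene Stable ID",
                     "RefSeq match transcript": "RefSeq ID"})
    .reset_index(drop=True)
)
print(d)
```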

Answer 1: score 2, accepted, 2019-04-15 16:17:18

Answer 2: score 0, 2019-04-15 17:00:54
