Read multiple .csv files and extract (into a new .csv file) all rows corresponding to non-empty cells in a specific column

Question

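I have multiple .csv files (about 250). Each of them has exactly the same columns, and all of them have plenty of empty cells in many columns. I am interested in extracting only the rows that correspond to non-empty cells in one specific column (named 20201-2.0). I believe this would work best with pandas.

So far I have the following steps, which should work once completed: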

import pandas as pd
import glob

path = './'
column = ['20201-2.0']

all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename,?,?)
    li.append(df)

Is there a way to extract only the rows that correspond to non-empty cells in the '20201-2.0' column of df?

Or is there another way?

George

Answer 1

# Pass subset as a list; older pandas versions do not accept a bare string here
df = pd.read_csv('myfile.csv').dropna(subset=['20201-2.0'])
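
To cover the whole task in the question (many files in, one filtered file out), the one-liner above can be applied per file and the results concatenated. This is only a minimal sketch: the function name `extract_non_empty_rows`, its parameters, and the directory layout are assumptions made for illustration, not something from the original post.

```python
import glob
import os

import pandas as pd


def extract_non_empty_rows(input_dir, column, output_path):
    """Hypothetical helper: keep rows where `column` is not null, across all CSVs."""
    frames = []
    # sorted() only makes the row order reproducible between runs
    for filename in sorted(glob.glob(os.path.join(input_dir, "*.csv"))):
        df = pd.read_csv(filename)
        # Drop every row whose value in `column` is NaN
        frames.append(df.dropna(subset=[column]))

    if not frames:
        # No input files found: return an empty frame and write nothing
        return pd.DataFrame()

    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(output_path, index=False)
    return combined
```

It could be called as `extract_non_empty_rows("./data", "20201-2.0", "./filtered.csv")`, where both paths are placeholders. Writing the output outside the input directory avoids picking it up as an input on a second run.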

Answer 2

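If the cells are truly "empty", rather than holding a whitespace string (" ") or a zero, then they will contain a "NaN" (a true null value). You should be able to select the non-null rows with...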

df = li[li['20201-2.0'].notnull()]

A more complete example...

import pandas as pd
import numpy as np

# Create the dataframe "li" with a bunch of random numbers
li = pd.DataFrame(np.random.randn(5,4), columns= ['Col1', 'Col2','20201-2.0', 'Col4'])
# Make one specific cell below the "20201-2.0" column a null (NaN) value
# (.loc avoids chained assignment; np.nan is the spelling that works in NumPy 2.x)
li.loc[2, '20201-2.0'] = np.nan
print(li) # See what you're working with

# Select for all rows, in all columns where the column "20201-2.0" is not a null
# This will return a full dataframe, with all the rows and columns - excluding any row(s) where the cell below "20201-2.0" was null
df = li[li['20201-2.0'].notnull()]
print(df)
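
As noted above, `notnull()` only catches true NaN values. If some cells hold an empty or whitespace-only string instead, a check along the following lines may help. This is a sketch under that assumption; the helper name `non_blank_rows` is made up for illustration.

```python
import pandas as pd


def non_blank_rows(df, column):
    """Hypothetical helper: keep rows where `column` is neither NaN nor blank text."""
    values = df[column]
    # Treat "" and whitespace-only strings as blank; numbers are never blank text
    is_blank_text = values.astype(str).str.strip() == ""
    return df[values.notna() & ~is_blank_text]
```

With its default settings `pd.read_csv` already turns completely empty fields into NaN, so the extra check mainly matters for fields that contain only spaces.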
