Skip specific rows with an "NaN" value while reading a CSV file in Python
I have CSV files which I read from a Windows folder:
files = glob.glob(r"LBT210*.csv")
dfs = [pd.read_csv(f, sep=";", engine='c') for f in files]
df2 = pd.concat(dfs,ignore_index=True)
However, the output looks like:
columnA columnB columnC
1 1 0
2 0 A
NaN NaN 1
3 B D
...
How can I skip reading the rows which contain a 'NaN' (missing value) in columnB, so that I can save some memory and speed up processing? In other words, I don't want to read those rows at all. I want to somehow adjust:
dfs = [pd.read_csv(f, sep=";", engine='c') for f in files]
According to the selected answer to this question, there isn't a way to filter rows before the file is read into memory. Since that answer is from over 10 years ago, I also rechecked the read_csv options, and it doesn't look like anything there helps either.
Inspired by that other Stack Overflow question and its selected answer, you can do something like this to reduce memory consumption:
iter_csv = pd.read_csv(f, sep=";", engine='c', iterator=True, chunksize=1000)
df = pd.concat([chunk[~chunk['columnB'].isna()] for chunk in iter_csv])
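Here is a minimal, self-contained sketch of the chunked approach. It uses a small in-memory CSV (via `io.StringIO`) mirroring the sample output above, which is an assumption standing in for the actual `LBT210*.csv` files; with real files you would pass the path instead:

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for one of the LBT210*.csv files.
csv_data = "columnA;columnB;columnC\n1;1;0\n2;0;A\n;;1\n3;B;D\n"

# Read in chunks and keep only rows where columnB is not NaN, so the
# full unfiltered file never has to sit in memory all at once.
iter_csv = pd.read_csv(io.StringIO(csv_data), sep=";", engine="c",
                       iterator=True, chunksize=2)
df = pd.concat(chunk[~chunk["columnB"].isna()] for chunk in iter_csv)

print(df)
print(len(df))  # 3 rows remain after dropping the row with NaN in columnB
```

Note that the rows are still parsed chunk by chunk; the saving is that rejected rows are discarded immediately rather than accumulating in one large DataFrame. Tune `chunksize` to trade peak memory against per-chunk overhead.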