
Skip specific rows with an "NaN" value while reading a csv file in python

I have CSV files which I read via a glob query from a Windows folder:

import glob
import pandas as pd

files = glob.glob(r"LBT210*.csv")
dfs = [pd.read_csv(f, sep=";", engine='c') for f in files]
df2 = pd.concat(dfs, ignore_index=True)

However, the output looks like:

columnA  columnB  columnC
1        1        0
2        0        A
NaN      NaN      1
3        B        D
...

How can I skip reading the rows that contain a 'NaN' (missing value) in columnB, so that I can save some memory and speed up processing? I don't want to read those rows at all, so I want to adjust:

dfs = [pd.read_csv(f, sep=";", engine='c') for f in files]

somehow, so that those rows never end up in the DataFrames.

According to the selected answer to this question, there isn't a way to filter the rows before the file is read into memory. Since that answer is over 10 years old, I also rechecked the read_csv options, and it doesn't look like anything else would help.
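So the filtering has to happen after the data has been parsed. As a baseline, a minimal sketch (assuming the column is literally named columnB, as in the sample) is to drop the unwanted rows right after each file is read, before concatenating, so the combined DataFrame never holds them:

import glob
import pandas as pd

files = glob.glob(r"LBT210*.csv")
# Read each file, then drop rows where columnB is NaN before concatenating
dfs = [pd.read_csv(f, sep=";", engine='c').dropna(subset=['columnB']) for f in files]
df2 = pd.concat(dfs, ignore_index=True)

This still parses every row of every file, but the filtered rows are discarded per file instead of being kept in the final concatenated result.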

Inspired by that other Stack Overflow question and its selected answer, you can do something like this to reduce memory consumption:

# Read the file in chunks of 1000 rows and keep only rows where columnB is not NaN
iter_csv = pd.read_csv(f, sep=";", engine='c', iterator=True, chunksize=1000)
df = pd.concat([chunk[~chunk['columnB'].isna()] for chunk in iter_csv])
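Applied to the original list of files, this might look like the sketch below (read_filtered is a hypothetical helper name, and the same columnB name and semicolon separator are assumed). The chunksize is a trade-off: larger chunks parse faster but hold more unfiltered rows in memory at once.

import glob
import pandas as pd

def read_filtered(path):
    # Stream the file in chunks and keep only rows with a value in columnB
    chunks = pd.read_csv(path, sep=";", engine='c', iterator=True, chunksize=1000)
    return pd.concat([chunk[~chunk['columnB'].isna()] for chunk in chunks])

files = glob.glob(r"LBT210*.csv")
df2 = pd.concat([read_filtered(f) for f in files], ignore_index=True)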
