Skip specific rows with an "NaN" value while reading a CSV file in Python
I have CSV files which I read from a Windows folder:
files = glob.glob(r"LBT210*.csv")
dfs = [pd.read_csv(f, sep=";", engine='c') for f in files]
df2 = pd.concat(dfs,ignore_index=True)
However, the output looks like:
columnA columnB columnC
1 1 0
2 0 A
NaN NaN 1
3 B D
...
How can I skip reading the rows which contain a 'NaN' (missing value) in columnB, so that I can save some memory and speed up processing? In other words, I don't want to read those rows at all. I want to somehow adjust:
dfs = [pd.read_csv(f, sep=";", engine='c') for f in files]
According to the selected answer to this question, there isn't a way to filter rows before the file is read into memory. Since that answer is from over 10 years ago, I also rechecked the read_csv options, and it doesn't look like anything there helps either.
Inspired by that other Stack Overflow question and its selected answer, you can do something like this to reduce memory consumption:
iter_csv = pd.read_csv(f, sep=";", engine='c', iterator=True, chunksize=1000)
df = pd.concat([chunk[~chunk['columnB'].isna()] for chunk in iter_csv])
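Here is a minimal, self-contained sketch of the chunked approach. It uses a small in-memory CSV (via `io.StringIO`) mirroring the sample output above, which is an assumption standing in for the actual `LBT210*.csv` files; with real files you would pass the path instead:

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for one of the LBT210*.csv files.
csv_data = "columnA;columnB;columnC\n1;1;0\n2;0;A\n;;1\n3;B;D\n"

# Read in chunks and keep only rows where columnB is not NaN, so the
# full unfiltered file never has to sit in memory all at once.
iter_csv = pd.read_csv(io.StringIO(csv_data), sep=";", engine="c",
                       iterator=True, chunksize=2)
df = pd.concat(chunk[~chunk["columnB"].isna()] for chunk in iter_csv)

print(df)
print(len(df))  # 3 rows remain after dropping the row with NaN in columnB
```

Note that the rows are still parsed chunk by chunk; the saving is that rejected rows are discarded immediately rather than accumulating in one large DataFrame. Tune `chunksize` to trade peak memory against per-chunk overhead.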