How can I optimize this code to cope with a larger dataset?
I have a set of images of the same size, all in the same directory. I want to insert them into a dataframe, with the rows being the names of the images and the columns being the pixels.
from matplotlib import image
import numpy as np
import pandas as pd
import glob

# build the column names: one "file" column plus one column per pixel
columns = ["file"]
for i in range(150528):
    columns.append("pixel" + str(i))
df = pd.DataFrame(columns=columns)

i = 0
for file in glob.glob('/home/nuno/resizepics/*.jpg'):
    imgarr = image.imread(file)
    imgarr = imgarr.flatten()
    df.loc[i, "file"] = file
    # copy the flattened pixel values into the row, one cell at a time
    for j in range(len(imgarr)):
        df.iloc[i, j + 1] = imgarr[j]
    i += 1
# print(df)
df.to_csv('pixels.csv')
If "killed" means it raises an error you can try using exeptions (try, except, else)
and make it try again from the spot it stopped.如果“killed”意味着它会引发错误,您可以尝试使用exeptions (try, except, else)
并让它从停止的地方重试。 You can also try to delay it a bit with time
module because it works with large data.您也可以尝试使用time
模块稍微延迟它,因为它适用于大数据。
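A minimal sketch of that idea, assuming the same directory as in the question; the retry loop, the one-second delay, and the cap of three attempts per file are illustrative choices, not part of the original code:

import glob
import time

from matplotlib import image

files = sorted(glob.glob('/home/nuno/resizepics/*.jpg'))

rows = []
i = 0
retries = 0
while i < len(files):
    try:
        # attempt to read and flatten the next image
        imgarr = image.imread(files[i]).flatten()
    except Exception:
        # on failure, wait a moment and retry from the same spot,
        # giving up on that file after a few attempts
        retries += 1
        if retries > 3:
            i += 1
            retries = 0
            continue
        time.sleep(1.0)
    else:
        rows.append((files[i], imgarr))
        i += 1
        retries = 0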