Saving/reading color image pixel data in Python

I'm trying to write an algorithm that saves each filename and the 3-channel np.array loaded from that file to a CSV (or similar file type), and can then read the CSV back in and reproduce the color image.

The format of my CSV should look like this:

  Filename RGB
0 foo.png  np.array      # the shape is 100*100*3
1 bar.png  np.array
2 ...      ...

As it stands, I'm iterating through each file saved in a directory and appending to lists that later get stored in a pandas.DataFrame:

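import os

import cv2
import pandas

df1 = pandas.DataFrame()
df2 = pandas.DataFrame()
directory = r'C:/my Directory'
fileList = os.listdir(directory)
filenameList = []
RGBList = []
for eachFile in fileList:
    filenameList.append(eachFile)
    # os.path.join keeps the separator between the directory and the filename;
    # flag 1 (cv2.IMREAD_COLOR) loads the file as a 3-channel image.
    image = cv2.imread(os.path.join(directory, eachFile), 1)
    # tostring() is a deprecated alias of tobytes(): one raw byte per 8-bit value.
    RGBList.append(image.tostring())
df1["Filenames"] = filenameList
df2["RGB"] = RGBList
df1.to_csv('df1.csv')
df2.to_csv('df2.csv')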

df1 functions as desired. I think df2 functions as intended: a print statement reveals the correct len of 30,000 for each row. However, when I read the CSV back in using pandas.read_csv('df2.csv') and print the len of the first row, I get 110541. I intend to use np.fromstring() and np.reshape() to rebuild the flattened np.array generated by np.tostring(), but I get the error:

ValueError: string size must be a multiple of element size

...because the number of elements is mismatched.

My question is:

  1. Why is the len so much larger when I read in the CSV?
  2. Is there a more efficient way to write 3-channel color image pixel data to a CSV that can easily be read back in?

If you write a single byte for each 8-bit pixel, you will get a line with 1 byte per pixel. So, if your image is 80 pixels wide, you will get 80 bytes per line.

If you write a CSV in human-readable ASCII, you will need more space. Imagine the first pixel is 186. You will write a 1, an 8, a 6 and a comma, i.e. 4 bytes for the first pixel instead of a single byte in binary, and so on.

That means your file will be around 3-4x bigger, i.e. 110k instead of 30k, which is what you are seeing.
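The size difference can be checked with a rough sketch like the one below. It assumes a synthetic 100x100x3 uint8 array (random values standing in for a real image, which is not part of the original question) and compares the raw byte count with the length of the same values written as comma-separated decimal text.

```python
import numpy as np

# Hypothetical stand-in for one 100x100x3 image: random values, not real data.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(100, 100, 3), dtype=np.uint8)

# Binary: exactly one byte per 8-bit value.
raw = image.tobytes()

# Text: each value becomes 1 to 3 decimal digits plus a separating comma.
text = ",".join(str(v) for v in image.ravel().tolist())

ratio = len(text) / len(raw)
print(len(raw))   # 30000
print(len(text), round(ratio, 2))
```

For uniformly random values the text form should come out somewhere between 3x and 4x the binary size; the exact figure depends on how many values need one, two or three digits.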


There is no "better way" to write a CSV. The problem is that it is a fundamentally inefficient format designed for humans rather than computers. Why did you choose CSV? If it has to be legible for humans, you have no choice.
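If a human-readable CSV really is required, one possible approach (a sketch, not necessarily what the asker had in mind) is to flatten each image to one row of integers with np.savetxt() and reshape after np.loadtxt(). The file name pixels.csv and the random test images are assumptions made for illustration, and the sketch assumes every image has the same 100x100x3 shape.

```python
import os
import tempfile

import numpy as np

# Hypothetical data: two random 100x100x3 images standing in for real files.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(2, 100, 100, 3), dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pixels.csv")
    # One image per row: 30,000 comma-separated integers on each line.
    np.savetxt(path, images.reshape(len(images), -1), fmt="%d", delimiter=",")
    csv_size = os.path.getsize(path)
    # Read the integers back; the result has shape (2, 30000).
    flat = np.loadtxt(path, delimiter=",", dtype=int)

# Restore the original dtype and the (N, height, width, channels) layout.
restored = flat.astype(np.uint8).reshape(-1, 100, 100, 3)
print(np.array_equal(restored, images), csv_size, images.nbytes)
```

The shape is not stored in the CSV, so the reader has to know it in advance, and the file is still several times larger than the raw pixel data.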

If it can be illegible to humans but readily legible to computers, choose a different format, such as np.save() and np.load(), as you wisely have done already ;-)
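A minimal sketch of the np.save() / np.load() route follows. It assumes all images share the 100x100x3 shape from the question so they can be stacked into one array. The file names images.npy and filenames.npy and the random test data are hypothetical, chosen only for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical data: filenames from the question's table plus random pixels.
filenames = np.array(["foo.png", "bar.png"])
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(2, 100, 100, 3), dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp:
    images_path = os.path.join(tmp, "images.npy")
    names_path = os.path.join(tmp, "filenames.npy")
    # .npy files record dtype and shape, so no reshape is needed on load.
    np.save(images_path, images)
    np.save(names_path, filenames)
    loaded_images = np.load(images_path)
    loaded_names = np.load(names_path)

# Pair each filename with its image again.
by_name = dict(zip(loaded_names.tolist(), loaded_images))
print(by_name["foo.png"].shape)  # (100, 100, 3)
```

Keeping the filenames in a separate array in the same order as the image stack preserves the filename-to-pixels mapping that the CSV layout was meant to provide.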
