Bioformats-Python error: 'ascii' codec can't encode character u'\xb5' when using OMEXML()

Question

I am trying to use bioformats in Python to read in a microscopy image (.lsm, .czi, .lif, you name it), print out the metadata, and display the image. ome = bf.OMEXML(md) gives me an error (below). I think it is complaining about the information stored in md: it doesn't like that the contents of md are not all ASCII. How do I overcome this problem? This is what I wrote:

import Tkinter as Tk, tkFileDialog
import os
import javabridge as jv
import bioformats as bf
import matplotlib.pyplot as plt
import numpy as np

jv.start_vm(class_path=bf.JARS, max_heap_size='12G')

User selects file to work with

# hiding root allows the file dialog GUI to be shown without any other GUI elements
root = Tk.Tk()
root.withdraw()
file_full_path = tkFileDialog.askopenfilename()
filepath, filename = os.path.split(file_full_path)
os.chdir(os.path.dirname(file_full_path))

print('opening:  %s' %filename)
reader = bf.ImageReader(file_full_path)
md = bf.get_omexml_metadata(file_full_path)
ome = bf.OMEXML(md)

Put image in numpy array

raw_data = []
# iome is only defined further down, so look up the first image's Pixels directly
for z in range(ome.image(0).Pixels.get_SizeZ()):
    raw_image = reader.read(z=z, series=0, rescale=False)
    raw_data.append(raw_image)
raw_data = np.array(raw_data)

Show wanted metadata

iome = ome.image(0) # e.g. first image
print(iome.get_Name())
print(iome.Pixels.get_SizeX())
print(iome.Pixels.get_SizeY())

Here's the error I get:

---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-22-a22c1dbbdd1e> in <module>()
     11 reader = bf.ImageReader(file_full_path)
     12 md = bf.get_omexml_metadata(file_full_path)
---> 13 ome = bf.OMEXML(md)

/anaconda/envs/env2_bioformats/lib/python2.7/site-packages/bioformats/omexml.pyc in __init__(self, xml)
    318         if isinstance(xml, str):
    319             xml = xml.encode("utf-8")
--> 320         self.dom = ElementTree.ElementTree(ElementTree.fromstring(xml))
    321 
    322         # determine OME namespaces

<string> in XML(text)

UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in position 1623: ordinal not in range(128)

Here's a representative test image in a proprietary microscopy format.

Answer 1

Thank you for adding the sample image. That helped tremendously!

Let's first remove all the unnecessary Tkinter code until we get to a Minimal, Complete and Verifiable Example that allows us to reproduce your error message.

import javabridge as jv
import bioformats as bf

jv.start_vm(class_path=bf.JARS, max_heap_size='12G')

file_full_path = '/path/to/Cell1.lsm'

md = bf.get_omexml_metadata(file_full_path)

ome = bf.OMEXML(md)

jv.kill_vm()

We first get some warning messages about 3i SlideBook SlideBook6Reader library not found, but we can apparently ignore those.

Your error message reads UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in position 1623: ordinal not in range(128), so let's see what we can find around position 1623.

If you add print md after md = bf.get_omexml_metadata(file_full_path), the whole XML with metadata is printed out. Let's zoom in:

>>> print md[1604:1627]
PhysicalSizeXUnit="µm"

So the µ character is the culprit: it can't be encoded with the 'ascii' codec.
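To make the failure concrete, here is a minimal sketch that does not need bioformats at all. The snippet string is a made-up stand-in for the slice of md shown above, and find_non_ascii is a hypothetical helper of my own, not part of any library.

```python
# -*- coding: utf-8 -*-
# Stand-in for the metadata slice above (assumption: typed by hand, not read from a file).
snippet = u'PhysicalSizeXUnit="\xb5m"'


def find_non_ascii(text):
    """Return (index, character) pairs for every character outside the ASCII range."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


offenders = find_non_ascii(snippet)

# Encoding with the 'ascii' codec fails on the micro sign ...
try:
    snippet.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# ... while UTF-8 can represent it (as the two bytes 0xC2 0xB5).
utf8_bytes = snippet.encode('utf-8')
```

Here offenders holds a single entry at index 19, which lines up with the traceback: the slice starts at 1604, and 1604 + 19 = 1623.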

Looking back at the traceback:

/anaconda/envs/env2_bioformats/lib/python2.7/site-packages/bioformats/omexml.pyc in __init__(self, xml)
    318         if isinstance(xml, str):
    319             xml = xml.encode("utf-8")
--> 320         self.dom = ElementTree.ElementTree(ElementTree.fromstring(xml))
    321 
    322         # determine OME namespaces

We see that in the lines just before the error occurs, the xml is encoded to utf-8, which should solve our problem. So why doesn't it happen?

If we add print type(md) we get back <type 'unicode'> and not <type 'str'> as the code expected. The isinstance(xml, str) check is therefore False, the encode step is skipped, and the unicode string is passed on unencoded. So this is a bug in omexml.py!

To solve this, do the following (you might need to be root):

Go to /anaconda/envs/env2_bioformats/lib/python2.7/site-packages/bioformats/
Remove omexml.pyc
In omexml.py, change line 318 from if isinstance(xml, str): to if isinstance(xml, basestring):

basestring is the superclass of str and unicode. It is used to test whether an object is an instance of either str or unicode.
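The effect of that one-line change can be sketched outside the library. This is only an illustration of the check, not the actual omexml.py code: parse_xml and TEXT_TYPES are names I made up, and the XML string is a toy example rather than real OME-XML.

```python
import xml.etree.ElementTree as ElementTree

try:
    TEXT_TYPES = (basestring,)  # Python 2: matches both str and unicode
except NameError:
    TEXT_TYPES = (str,)         # Python 3: str is already the text type


def parse_xml(xml):
    """Encode text to UTF-8 bytes before parsing, whatever its text type."""
    if isinstance(xml, TEXT_TYPES):
        xml = xml.encode("utf-8")
    return ElementTree.fromstring(xml)


# A unicode string containing the micro sign now takes the encode branch.
root = parse_xml(u'<Pixels PhysicalSizeXUnit="\xb5m"/>')
unit = root.get("PhysicalSizeXUnit")
```

With the broader check, the unicode input is encoded before it reaches ElementTree.fromstring, so the µ survives parsing and comes back intact in unit.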

I wanted to file a bug for this, but it seems there is already an open issue.
