
Open/edit utf8 FITS header in Python (pyfits)

I have to deal with some FITS files that contain UTF-8 text in their headers. This means that basically none of the functions in the pyfits package work. Also, .decode does not work because the FITS header is a class, not a list. Does someone know how to decode the header so I can process the data? The actual content is not that important, so something like ignoring the offending letters is fine. My current code looks like this:

from astropy.io import fits  # assumed import; with standalone PyFITS: import pyfits as fits

hdulist = fits.open('Jupiter.FIT')
hdu = hdulist[0].header
hdu.decode('ascii', errors='ignore')

And I get: AttributeError: 'Header' object has no attribute 'decode'

Calls like:

print(hdu)

raise:

ValueError: FITS header values must contain standard printable ASCII characters; "'Uni G\xf6ttingen, Institut f\xfcr Astrophysik'" contains characters/bytes that do not represent printable characters in ASCII.

I thought about overwriting the entry so I don't need to care about it. However, I can't even retrieve which entry contains the bad characters, and I would like a batch solution as I have a few hundred files.

As anatoly techtonik pointed out, non-ASCII characters in FITS headers are outright invalid and make for invalid FITS files. That said, it would be nice if astropy.io.fits could at least read the invalid entries. Support for that is currently broken and needs a champion to fix it, but nobody has stepped up because it's an infrequent enough problem: most people encounter it in one or two files, fix those files, and move on. I would love for someone to tackle the problem, though.

In the meantime, since you know exactly what string this file is hiccupping on, I would just open the file in raw binary mode and replace the string. If the FITS file is very large, you could read it a block at a time and do the replacement on those blocks. FITS files (especially headers) are written in 2880-byte blocks, and each header card is 80 bytes, so exactly 36 cards fit in a block. That means the string can never straddle a block boundary, and you don't have to do any parsing of the header format beyond that. Just make sure that the string you replace it with is no longer than the original string, and that if it's shorter it is right-padded with spaces, because FITS headers are a fixed-width format and anything that changes the length of a header will corrupt the entire file. For this particular case, then, I would try something like this:

bad_str = 'Uni Göttingen, Institut für Astrophysik'.encode('latin1')
good_str = 'Uni Gottingen, Institut fur Astrophysik'.encode('ascii')
# In this case I already know the replacement is the same length, so I'm not worried about it.
# A more general solution would require fixing the header parser to deal with non-ASCII bytes
# in some consistent manner. I'm also looking for the full string instead of the individual
# characters so that I don't corrupt binary data in the non-header blocks.
in_filename = 'Jupiter.FIT'
out_filename = 'Jupiter-fixed.fits'

with open(in_filename, 'rb') as inf, open(out_filename, 'wb') as outf:
    while True:
        block = inf.read(2880)
        if not block:
            break
        block = block.replace(bad_str, good_str)
        outf.write(block)

This is ugly, and for a very large file it might be slow, but it's a start. I can think of better solutions, but they are harder to understand and probably not worth taking the time on if you just have a handful of files to fix.
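The length rule above (the replacement must not be longer than the original, and must be right-padded with spaces if shorter) can be captured in a small helper. This is only a sketch; `padded_replacement` is a hypothetical name, not part of PyFITS or astropy. It matters here because if the file really stores the name as UTF-8, each accented letter takes two bytes, so the ASCII replacement is shorter than the original.

```python
def padded_replacement(bad: bytes, good: bytes) -> bytes:
    """Return `good` right-padded with spaces to exactly len(bad) bytes.

    Hypothetical helper: raises ValueError if `good` is longer than `bad`,
    because a longer replacement would shift every following byte of the
    fixed-width FITS header.
    """
    if len(good) > len(bad):
        raise ValueError("replacement is longer than the original string")
    return good.ljust(len(bad), b" ")


# Assumption for illustration: the header stores the text as UTF-8,
# so the two accented letters occupy two bytes each.
bad_str = 'Uni Göttingen, Institut für Astrophysik'.encode('utf-8')
good_str = padded_replacement(bad_str, b'Uni Gottingen, Institut fur Astrophysik')
```

The padded `good_str` can then be passed to `block.replace(bad_str, good_str)` in the loop above; the trailing spaces end up inside the quoted string value, where they are harmless.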

Once that's done, please give the originator of the file a stern talking-to: they should not be publishing corrupt FITS files.
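For a batch of several hundred files where the offending strings are not known in advance, one possible variation on the block-wise approach is to replace every non-printable-ASCII byte in the primary header with `?` and copy the rest of the file untouched. The following is a rough sketch, not a tested tool: the function names are made up for illustration, it only handles the primary header (extension headers would need real parsing to locate), and it assumes the header has a well-formed END card.

```python
import shutil

BLOCK = 2880  # FITS block size in bytes
CARD = 80     # FITS header card size in bytes


def sanitize_block(block: bytes) -> bytes:
    """Replace every byte outside printable ASCII (0x20-0x7E) with b'?'.

    The replacement is byte-for-byte, so the block length never changes.
    """
    return bytes(b if 0x20 <= b <= 0x7E else ord("?") for b in block)


def has_end_card(block: bytes) -> bool:
    """Return True if one of the 80-byte cards in this block is the END card."""
    return any(block[i:i + 8] == b"END     " for i in range(0, len(block), CARD))


def sanitize_primary_header(in_filename, out_filename):
    """Copy a FITS file, sanitizing only the blocks of the primary header."""
    with open(in_filename, "rb") as inf, open(out_filename, "wb") as outf:
        while True:
            block = inf.read(BLOCK)
            if not block:
                break
            outf.write(sanitize_block(block))
            if has_end_card(block):
                break  # the primary header ends with this block
        # Everything after the primary header is copied verbatim.
        shutil.copyfileobj(inf, outf)
```

Looping this over the output of `glob.glob('*.FIT')` would cover the batch case. Because each byte is replaced one-for-one, the fixed-width layout is preserved whether the original text was Latin-1 or UTF-8.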

Looks like PyFITS just doesn't support it (yet?)

From https://github.com/astropy/astropy/issues/3497:

FITS predates Unicode and has never been updated to support anything beyond the ASCII printable characters for data. It is impossible to encode non-ASCII characters in FITS headers.
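Since only printable ASCII is valid, the entries that break the parser can be located without PyFITS at all by scanning the raw 80-byte header cards. A minimal sketch, assuming a well-formed END card and looking only at the primary header (`find_non_ascii_cards` is a hypothetical name, not a library function):

```python
def find_non_ascii_cards(filename):
    """List primary-header cards containing bytes outside printable ASCII.

    Returns a list of (card_index, keyword, raw_card_bytes) tuples.
    """
    bad = []
    index = 0
    with open(filename, "rb") as f:
        while True:
            block = f.read(2880)  # FITS block size
            if not block:
                break
            for i in range(0, len(block), 80):  # 80-byte header cards
                card = block[i:i + 80]
                if card[:8] == b"END     ":
                    return bad  # end of the primary header
                if any(b < 0x20 or b > 0x7E for b in card):
                    keyword = card[:8].decode("ascii", errors="replace").strip()
                    bad.append((index, keyword, card))
                index += 1
    return bad
```

Printing the keyword of each returned tuple for every file in a directory shows which header entries need fixing before any replacement is attempted.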
