
How to read in metadata (with tags) from a text file using Python

The data at the start of the text file has this format:

&SRS
<MetaDataAtStart>
multiple=True
Wavelength (Angstrom)=0.97587
mode=assessment
background=True
issid=py11n2g
noisy=True
</MetaDataAtStart>
&END
Two Theta(deg)  Counts(sec^-1)
10.0    41.0
10.1    39.0
10.2    38.0
10.3    38.0

What method can I use to extract the wavelength value from the metadata? Would the CSV dictionary reader (csv.DictReader) work?

The simplest solution would be to read the header of the file line by line:

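with open("data.txt", "r") as f:
    for line in f:
        # Reaching the closing tag means the header held no wavelength.
        if "</MetaDataAtStart>" in line:
            print("Wavelength data was not found")
            break
        if "Wavelength" in line:
            # Keep the text after the first "=" and drop the trailing newline.
            print(line.split("=", 1)[1].strip())
            break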

Output:

0.97587

Edit: the same thing with a regular expression:

import re

regex = re.compile(r'Wavelength \(Angstrom\)=([0-9]+\.?[0-9]*)')
with open("data.txt", "r") as f:
    for line in f:
        result = regex.search(line)
        if result:
            # Print the captured number and stop scanning the file.
            print(result.group(1))
            break

Output:

0.97587
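
Both snippets above look only for the wavelength. If the other header fields are needed too, the same line-by-line scan can collect every key=value pair into a dictionary. The following is a minimal sketch, not a definitive implementation: the function name read_metadata is made up for illustration, and an in-memory io.StringIO stands in for open("data.txt") so the example runs on its own.

```python
import io


def read_metadata(lines):
    """Collect key=value pairs found between the MetaDataAtStart tags.

    `lines` is any iterable of text lines, e.g. an open file object.
    (Hypothetical helper, named here only for illustration.)
    """
    metadata = {}
    inside = False
    for line in lines:
        line = line.strip()
        if line == "<MetaDataAtStart>":
            inside = True
        elif line == "</MetaDataAtStart>":
            break  # end of the header; the numeric data is never read
        elif inside and "=" in line:
            # Split on the first "=" only, in case a value contains "=".
            key, value = line.split("=", 1)
            metadata[key.strip()] = value.strip()
    return metadata


# Stand-in for open("data.txt"); the content mirrors the sample in the question.
sample = io.StringIO(
    "&SRS\n"
    "<MetaDataAtStart>\n"
    "multiple=True\n"
    "Wavelength (Angstrom)=0.97587\n"
    "mode=assessment\n"
    "</MetaDataAtStart>\n"
    "&END\n"
    "10.0    41.0\n"
)

metadata = read_metadata(sample)
# Values are kept as strings; convert the ones you need.
wavelength = float(metadata["Wavelength (Angstrom)"])
print(wavelength)
```

With a real file, pass the open file object instead of the sample: read_metadata(f) inside a with open("data.txt") as f: block.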

BeautifulSoup with lxml can do this. Once you find the tag with findAll(), you can extract its text. At that point Python can easily split() on \n and then on =. Let me know if you want a code sample and I'll provide one.
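
The steps described here (find the tag, split its text on \n, then split each line on =) can be sketched as follows. Note the substitution: this sketch uses re.search from the standard library to isolate the tag contents instead of BeautifulSoup, so it is not the answerer's own code, and the text variable is a shortened copy of the sample file assumed for illustration.

```python
import re

# Shortened copy of the sample file (an assumption for illustration);
# in practice this would come from open("data.txt").read().
text = """&SRS
<MetaDataAtStart>
multiple=True
Wavelength (Angstrom)=0.97587
noisy=True
</MetaDataAtStart>
&END
10.0    41.0
"""

# Grab everything between the opening and closing tags.
# re.DOTALL lets "." match newlines, so the whole block is captured.
match = re.search(r"<MetaDataAtStart>(.*?)</MetaDataAtStart>", text, re.DOTALL)
block = match.group(1) if match else ""

# Split on "\n", then split each line that contains "=" on the first "=".
pairs = dict(line.split("=", 1) for line in block.split("\n") if "=" in line)

print(pairs["Wavelength (Angstrom)"])
```

Reading the whole file into memory is fine for a short header like this one; for very large data files the line-by-line approach avoids loading the numeric columns at all.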
