How to read metadata (with tags) from a text file using Python

Question

The data at the start of the text file has this format:

&SRS
<MetaDataAtStart>
multiple=True
Wavelength (Angstrom)=0.97587
mode=assessment
background=True
issid=py11n2g
noisy=True
</MetaDataAtStart>
&END
Two Theta(deg)  Counts(sec^-1)
10.0    41.0
10.1    39.0
10.2    38.0
10.3    38.0

What method can I use to extract the metadata value of the wavelength? Would the CSV dictionary reader (csv.DictReader) work?

Answer 1

The simplest solution is to read the header of the file:

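# Scan the header line by line and stop at the closing tag.
with open("data.txt", "r") as f:
    for line in f:
        if "</MetaDataAtStart>" in line:
            print("Wavelength data was not found")
            break
        if "Wavelength" in line:
            # Everything after "=" is the value; strip the trailing newline.
            print(line.split("=")[1].strip())
            break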

Output:

0.97587

Edit:

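import re

# Capture the number that follows "Wavelength (Angstrom)=".
regex = re.compile(r'Wavelength \(Angstrom\)=([0-9]+\.?[0-9]*)')
with open("data.txt", "r") as f:
    for line in f:
        result = regex.search(line)
        if result:
            print(result.group(1))
            break  # stop at the first match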

Output:

0.97587
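
If more than one header value is needed, the same line-by-line scan can collect every key=value pair into a dictionary. The following is only a sketch under assumptions: the function name read_metadata is hypothetical, and io.StringIO stands in for the real file so the example runs on its own.

```python
import io

def read_metadata(lines):
    """Collect key=value pairs found between the MetaDataAtStart tags."""
    metadata = {}
    inside = False
    for line in lines:
        line = line.strip()
        if line == "<MetaDataAtStart>":
            inside = True
        elif line == "</MetaDataAtStart>":
            break  # header finished; the data rows are not read
        elif inside and "=" in line:
            # partition splits on the first "=" only
            key, _, value = line.partition("=")
            metadata[key.strip()] = value.strip()
    return metadata

# Header copied from the question; StringIO replaces open("data.txt").
sample = io.StringIO(
    "&SRS\n"
    "<MetaDataAtStart>\n"
    "multiple=True\n"
    "Wavelength (Angstrom)=0.97587\n"
    "mode=assessment\n"
    "background=True\n"
    "issid=py11n2g\n"
    "noisy=True\n"
    "</MetaDataAtStart>\n"
    "&END\n"
    "Two Theta(deg)  Counts(sec^-1)\n"
    "10.0    41.0\n"
)
metadata = read_metadata(sample)
wavelength = float(metadata["Wavelength (Angstrom)"])
print(wavelength)
```

With a real file, pass the open file object instead of the StringIO sample, since both yield lines when iterated.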

Answer 2

BeautifulSoup with lxml can do this. Once you find the tag with findAll(), you can extract its contents. From there, Python can easily split() on \n and then on =. Let me know if you want a code sample and I'll provide one.
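
The split step described above might look roughly like this. To keep the sketch free of third-party dependencies it isolates the tagged block with the standard re module instead of BeautifulSoup, which is a substitution and not what this answer proposes; the function name metadata_from_text and the shortened sample text are also assumptions.

```python
import re

def metadata_from_text(text):
    """Isolate the tagged block, then split on newlines and on '='."""
    match = re.search(
        r"<MetaDataAtStart>(.*?)</MetaDataAtStart>", text, re.DOTALL
    )
    if match is None:
        return {}
    pairs = {}
    for row in match.group(1).strip().split("\n"):
        if "=" in row:
            key, value = row.split("=", 1)  # split on the first "=" only
            pairs[key.strip()] = value.strip()
    return pairs

# Shortened version of the header from the question.
text = (
    "&SRS\n"
    "<MetaDataAtStart>\n"
    "multiple=True\n"
    "Wavelength (Angstrom)=0.97587\n"
    "mode=assessment\n"
    "</MetaDataAtStart>\n"
    "&END\n"
    "10.0    41.0\n"
)
pairs = metadata_from_text(text)
print(pairs["Wavelength (Angstrom)"])
```

The values come back as strings, so the wavelength would still need a float() conversion before any arithmetic.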
