How do I extract a specific set of values from a file in Python?

Question

I'm stuck on the logic here... I have to extract some values from a text file that looks like this:

AAA
+-------------+------------------+
|          ID |            count |
+-------------+------------------+
|           3 |             1445 |
|           4 |              105 |
|           9 |              160 |
|          10 |               30 |
+-------------+------------------+
BBB
+-------------+------------------+
|          ID |            count |
+-------------+------------------+
|           3 |             1445 |
|           4 |              105 |
|           9 |              160 |
|          10 |               30 |
+-------------+------------------+
CCC
+-------------+------------------+
|          ID |            count |
+-------------+------------------+
|           3 |             1445 |
|           4 |              105 |
|           9 |              160 |
|          10 |               30 |
+-------------+------------------+

I can't work out how to extract only the values under BBB and append them to a list, something like this:

import sys

f = open(sys.argv[1], "r")
text = f.readlines()
B_Values = []
for i in text:
    if i.startswith("BBB"):  # (example)
        B_Values.append("only values of BBB")
    if i.startswith("CCC"):
        break

print(B_Values)

which should result in:

['|           3 |             1445 |','|           4 |              105 |','|           9 |              160 |','|          10 |               30 |']

Answer 1

import sys

d = {}
with open(sys.argv[1]) as f:
    for line in f:
        if line[0].isalpha(): # is first character in the line a letter?
            curr = d.setdefault(line.strip(), [])
        elif any(ch.isdigit() for ch in line): # is there any digit in the line?
            curr.append(line.strip())

For this file, d is now:

{'AAA': ['|           3 |             1445 |',
         '|           4 |              105 |',
         '|           9 |              160 |',
         '|          10 |               30 |'],
 'BBB': ['|           3 |             1445 |',
         '|           4 |              105 |',
         '|           9 |              160 |',
         '|          10 |               30 |'],
 'CCC': ['|           3 |             1445 |',
         '|           4 |              105 |',
         '|           9 |              160 |',
         '|          10 |               30 |']}

Your B_Values is d['BBB'].
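
To try the same idea without a file on disk, here is a minimal self-contained sketch. The function name parse_sections and the shortened SAMPLE text are assumptions made for illustration, and io.StringIO stands in for the opened file. It also guards against a data row appearing before any label, which the snippet above does not need for this input.

```python
import io

# Shortened sample in the same layout as the file in the question (assumed data).
SAMPLE = """\
AAA
+-------------+------------------+
|          ID |            count |
+-------------+------------------+
|           3 |             1445 |
|           4 |              105 |
+-------------+------------------+
BBB
+-------------+------------------+
|          ID |            count |
+-------------+------------------+
|           9 |              160 |
|          10 |               30 |
+-------------+------------------+
"""


def parse_sections(lines):
    """Group the data rows of each table under its label (hypothetical helper)."""
    sections = {}
    current = None
    for line in lines:
        if line[:1].isalpha():  # a label line such as "AAA"
            current = sections.setdefault(line.strip(), [])
        elif current is not None and any(ch.isdigit() for ch in line):
            current.append(line.strip())  # a data row: it contains a digit
    return sections


d = parse_sections(io.StringIO(SAMPLE))
print(d["BBB"])
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the StringIO.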

Answer 2

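You can use a bstarted state flag to track when the B group starts. After scanning the B group, delete the three header lines and the one footer line.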

# `text` is the list of lines returned by f.readlines() in the question
B_Values = []
bstarted = False
for i in text:
    if i.startswith("BBB"):
        bstarted = True
    elif i.startswith("CCC"):
        bstarted = False
        break
    elif bstarted:
        B_Values.append(i.strip())  # strip the trailing newline

del B_Values[:3]   # get rid of the header
del B_Values[-1]   # get rid of the footer
print(B_Values)
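
A variation on the flag approach, as a sketch only: instead of deleting header and footer lines by position afterwards, it keeps just the lines that look like data rows, and it stops at whatever label comes next rather than at a hard-coded "CCC". The name extract_section and the narrow sample table are assumptions for illustration.

```python
def extract_section(lines, name):
    """Return the data rows listed under the label `name` (hypothetical helper)."""
    rows = []
    started = False
    for line in lines:
        stripped = line.strip()
        if stripped == name:
            started = True  # the wanted section begins here
        elif started and stripped[:1].isalpha():
            break  # the next label ends the section
        elif started and stripped.startswith("|") and any(c.isdigit() for c in stripped):
            rows.append(stripped)  # keep data rows, skip borders and the header
    return rows


# Narrower table than in the question, same layout (assumed data).
sample = [
    "AAA\n",
    "+----+------+\n",
    "| ID | count |\n",
    "+----+------+\n",
    "| 1 | 10 |\n",
    "+----+------+\n",
    "BBB\n",
    "+----+------+\n",
    "| ID | count |\n",
    "+----+------+\n",
    "| 3 | 1445 |\n",
    "| 4 | 105 |\n",
    "+----+------+\n",
    "CCC\n",
    "+----+------+\n",
    "| ID | count |\n",
    "+----+------+\n",
    "| 9 | 160 |\n",
    "+----+------+\n",
]

B_Values = extract_section(sample, "BBB")
print(B_Values)
```

This relies on data rows containing digits and the header row not containing any, which holds for the file in the question but would need adjusting for other headers.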

Answer 3

You should avoid iterating over lines that have already been read. Call readline whenever you want the next line, and check what it is:

import sys

f = open(sys.argv[1], "r")
B_Values = []
i = f.readline()
while i != "":
    if i.startswith("BBB"): #(Example)
        for temp in range(3):
            f.readline() #Skip the 3 lines of table headers
        i = f.readline()
        while i.strip() != "+-------------+------------------+" and i != "":
            #While we've not reached the table footer
            B_Values.append(i.strip())
            i = f.readline()
        break
    i = f.readline()

#Although not necessary, you'd better put a close function there, too.
f.close()

print(B_Values)

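Edit: @eumiro's approach is more flexible than mine, because it reads all the values from all sections. Although an isalpha test could be added to my example to read all the values as well, his approach is still easier to read.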
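
For reference, a sketch of what the readline-based loop might look like with the isalpha test added so that it collects every section. The name read_all_sections and the compact sample text are assumptions, and the code relies on each label being followed by exactly three header lines, as in the file from the question.

```python
import io


def read_all_sections(f):
    """Read every section from a file-like object via readline (hypothetical helper)."""
    sections = {}
    line = f.readline()
    while line != "":
        if line[0].isalpha():  # a label line starts a new section
            rows = sections.setdefault(line.strip(), [])
            for _ in range(3):
                f.readline()  # skip top border, column header, header border
            line = f.readline()
            # collect rows until the footer border or the end of the file
            while line != "" and not line.startswith("+"):
                rows.append(line.strip())
                line = f.readline()
        line = f.readline()
    return sections


# Compact sample in the same layout as the question's file (assumed data).
sample_text = (
    "AAA\n+--+\n| ID | count |\n+--+\n| 3 | 1445 |\n+--+\n"
    "BBB\n+--+\n| ID | count |\n+--+\n| 9 | 160 |\n| 10 | 30 |\n+--+\n"
)

result = read_all_sections(io.StringIO(sample_text))
print(result["BBB"])
```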


3 answers

Answer 1
3 (accepted) 2011-12-02 07:41:45

Answer 2
0 2011-12-02 07:46:22

Answer 3
0 2011-12-02 07:46:49
