

Read and write data from text file to numpy column in python

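I have been struggling to get the following text file format to work. My overall goal is to extract the values of one of the variable names across the whole text file. For example, I want all the values from the [b] lines and the [d] lines, put them into ordinary numpy arrays, and then run calculations on them.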


[a] 1424457484310
[b] 5313402937
[c] 873348378938
[d] 882992596992
[e] 14957596088
243 62 184 145 250 180 106 208 248 87 186 137 127 204 18 142 37 67 36 72 48     204 255 30 243 78 44 121 112 139 76 71 131 50 118 10 42 8 67 4 98 110 37 5 208   104 56 55 225 56 0 102 0 21 0 156 0 174 255 171 0 42 0 233 0 50 0 254 0 245 255   110 
[a] 1424457484310
[b] 5313402937
[c] 873348378938
[d] 882992596992
[e] 14957596088
243 62 184 145 250 180 106 208 248 87 186 137 127 204 18 142 37 67 36 72 48   204 255 30 243 78 44 121 112 139 76 71 131 50 118 10 42 8 67 4 98 110 37 5 208 104 56 55 225 56 0 102 0 21 0 156 0 174 255 171 0 42 0 233 0 50 0 254 0 245 255 110 



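import numpy as np
from easygui import fileopenbox  # assumption: fileopenbox is the easygui dialog

# The original call is cut off after the title argument; any further arguments are unknown.
filename_load = fileopenbox(msg=None, title='Load Data File')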

col1_data = np.genfromtxt(filename_load, skip_header=1, dtype=None,
                          usecols=(0,), usemask=True, invalid_raise=False)

col2_data = np.genfromtxt(filename_load, skip_header=1, dtype=None,
                          usecols=(1,), usemask=True, invalid_raise=False)


arr_index = np.where(col1_data == '[b]')
new_array = col2_data[arr_index]






from collections import OrderedDict
import re

# Sample input. The [SECTION*b] and [END SECTION*] marker lines and the second
# section's name are assumed from the regular expressions below.
ss = """[SECTION1a]
[a] 1424457484310
[b] 5313402937
[c] 873348378938
[d] 882992596992
[e] 14957596088
[SECTION1b]
243 62 184 145 250 180 106 208 248 87 186 137 127 204 18 142 37 67 36 72 48     204 255 30 243 78 44 121 112 139 76 71 131 50 118 10 42 8 67 4 98 110 37 5 208   104 56 55 225 56 0 102 0 21 0 156 0 174 255 171 0 42 0 233 0 50 0 254 0 245 255   110
[END SECTION1]
[SECTION2a]
[a] 1424457484310
[b] 5313402937
[c] 873348378938
[d] 882992596992
[e] 14957596088
[SECTION2b]
243 62 184 145 250 180 106 208 248 87 186 137 127 204 18 142 37 67 36 72 48   204 255 30 243 78 44 121 112 139 76 71 131 50 118 10 42 8 67 4 98 110 37 5 208 104 56 55 225 56 0 102 0 21 0 156 0 174 255 171 0 42 0 233 0 50 0 254 0 245 255 110
[END SECTION2]
"""

# regular expressions for matching SECTIONs
p1 = re.compile(r"^\[SECTION[0-9]+a\]")
p2 = re.compile(r"^\[SECTION[0-9]+b\]")
p3 = re.compile(r"^\[END SECTION[0-9]+\]")

def parse(ss):
    """Make a hierarchical dict from a string."""
    ll, l_cnt = ss.splitlines(), 0
    d = OrderedDict()
    while l_cnt < len(ll):  # iterate through lines
        l = ll[l_cnt].strip()
        if p1.match(l):  # new sub-dict for [SECTION*a]
            dd, nn = OrderedDict(), l[1:-1]
            l_cnt += 1
            # read "[key] value" lines until the next marker or the end of input
            while (l_cnt < len(ll) and
                   p2.match(ll[l_cnt].strip()) is None and
                   p3.match(ll[l_cnt].strip()) is None):
                ww = ll[l_cnt].split()
                dd[ww[0][1:-1]] = int(ww[1])
                l_cnt += 1
            d[nn] = dd
        elif p2.match(l):  # array of ints for [SECTION*b]
            d[l[1:-1]] = [int(w) for w in ll[l_cnt+1].split()]
            l_cnt += 2
        else:  # [END SECTION*] or any unrecognized line: skip it
            l_cnt += 1
    return d

dd = parse(ss)

Note that you can get more robust code if you use an existing parsing tool (for example, Parsley).

To retrieve '[c]' from all sections, do the following:

print("All entries for [c]: ", end="")
cc = [d['c'] for s,d in dd.items() if s.endswith('a')]
print(", ".join(["{}".format(c) for c in cc]))    
# Gives: All entries for [c]: 873348378938, 873348378938
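
Since the question ultimately asks for numpy arrays, the same comprehension can be handed straight to np.array. The following is only a minimal sketch: the small `parsed` dictionary is a hand-written stand-in for what parse() returns (its section names are assumed to follow the SECTION<n>a / SECTION<n>b pattern), and the helper name `column` is hypothetical.

```python
import numpy as np

# Hand-written stand-in for the dict returned by parse(); section names are assumed.
parsed = {
    "SECTION1a": {"a": 1424457484310, "b": 5313402937, "d": 882992596992},
    "SECTION1b": [243, 62, 184],
    "SECTION2a": {"a": 1424457484310, "b": 5313402937, "d": 882992596992},
    "SECTION2b": [243, 62, 184],
}

def column(parsed, key):
    """Collect one key from every [SECTION*a] sub-dict into an int64 array."""
    values = [sub[key] for name, sub in parsed.items() if name.endswith("a")]
    return np.array(values, dtype=np.int64)

b = column(parsed, "b")
d = column(parsed, "d")
print(d - b)  # element-wise arithmetic on whole columns
```

The explicit int64 dtype is a deliberate choice here: the values in the sample exceed the 32-bit integer range, so relying on a platform-dependent default integer type could overflow.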


def print_recdicts(d, tbw=0):
    """Print the hierarchical dict."""
    for k, v in d.items():
        if isinstance(v, OrderedDict):
            print(" "*tbw + "* {}:".format(k))
            print_recdicts(v, tbw+2)
        else:
            print(" "*tbw + "* {}: {}".format(k, v))

print_recdicts(dd)

# Gives:
# * SECTION1a:
#   * a: 1424457484310
#   * b: 5313402937
# ...

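The following should do it. It uses a running store (tally) to handle missing values, and writes the accumulated state out whenever it hits an end marker.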

import re
import numpy as np

filename = "yourfilenamehere.txt"

# [e] 14957596088
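match_line_re = re.compile(r"^\[([a-z])\]\s+(\d+)")

# One list per variable of interest (keys taken from the lookups at the end)
result = {
    'b': [],
    'd': [],
}

tally_empty = dict(zip(result.keys(), [np.nan] * len(result)))

tally = dict(tally_empty)  # copy, so the empty template is never modified
with open(filename, 'r') as f:
    for line in f:
        if line.startswith('[END SECTION'):
            # Write accumulated data to the lists
            for k, v in tally.items():
                result[k].append(v)

            tally = dict(tally_empty)  # start the next section with a fresh tally
        else:
            # Map the items using regex
            m = match_line_re.search(line)
            if m:
                k, v = m.group(1), m.group(2)
                if k in tally:
                    tally[k] = int(v)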

b = np.array(result['b'])
d = np.array(result['d'])
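
To see how the tally copes with a missing key, the same idea can be run on an in-memory sample instead of a file. This is a sketch under stated assumptions: the sample text (including its [END SECTION n] lines and the absent [d] in the second section) is made up for illustration, and `extract` is a hypothetical helper name.

```python
import io
import re
import numpy as np

line_re = re.compile(r"^\[([a-z])\]\s+(\d+)")

# Made-up sample: the second section has no [d] line.
sample = """[a] 1
[b] 10
[d] 30
[END SECTION1]
[a] 2
[b] 11
[END SECTION2]
"""

def extract(lines, keys):
    """Return {key: float array}, one entry per section, NaN where a key is missing."""
    result = {k: [] for k in keys}
    tally = dict.fromkeys(keys, np.nan)
    for line in lines:
        if line.startswith("[END SECTION"):
            for k in keys:
                result[k].append(tally[k])
            tally = dict.fromkeys(keys, np.nan)  # fresh tally for the next section
            continue
        m = line_re.match(line)
        if m and m.group(1) in tally:
            tally[m.group(1)] = float(m.group(2))
    return {k: np.array(v) for k, v in result.items()}

cols = extract(io.StringIO(sample), ("b", "d"))
print(cols["b"])
print(np.nanmean(cols["d"]))  # NaN-aware reduction ignores the missing entry
```

Storing the values as floats keeps NaN available as the missing-value marker; that is exact for integers up to 2**53, which covers the magnitudes shown in the question, but very large identifiers would need a different sentinel.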



