
Best way to get just strings, integers, and/or floats from a data file in Python?

For example, my input is:
zxxxxyzzxyxyxyzxzzxzzzyzzxxxzxxyyyzxyxzyxyxyzyyyyzzyyyyzzxzxzyzzzzyxzxxxyxxxxyyzyyzyyyxzzzzyzxyzzyyy
--------
x y z
--------
A B
--------
    A   B
A   0.634   0.366   
B   0.387   0.613   
--------
    x   y   z
A   0.532   0.226   0.241   
B   0.457   0.192   0.351


Output:
AAAAAAAAAAAAAABBBBBBBBBBBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBBBBBBBBAAAAAAAAAAAAAAAAAAAAABBBBBBBBBBAAA

Right now I'm using this:

import sys, re

data = []
for line in sys.stdin.readlines():
    data.append(''.join(line.strip().split()))

cleanup = []
for i in range(len(data)):
    cleanup.append(re.sub(r"\S+", " ", data[i]))

print(data)

and my output looks like this:

['zxxxxyzzxyxyxyzxzzxzzzyzzxxxzxxyyyzxyxzyxyxyzyyyyzzyyyyzzxzxzyzzzzyxzxxxyxxxxyyzyyzyyyxzzzzyzxyzzyyy', '--------', 'xyz', '--------', 'AB', '--------', 'AB', 'A0.6340.366', 'B0.3870.613', '--------', 'xyz', 'A0.5320.2260.241', 'B0.4570.1920.351']

But what I want my data list to look like is:

print(data)
['zxxxxyzzxyxyxyzxzzxzzzyzzxxxzxxyyyzxyxzyxyxyzyyyyzzyyyyzzxzxzyzzzzyxzxxxyxxxxyyzyyzyyyxzzzzyzxyzzyyy', 'x', 'y', 'z', 'A', 'B', '0.634', '0.366', '0.387', '0.613', '0.532', '0.226', '0.241', '0.457', '0.192', '0.351']

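You are almost there. You just need to stop joining the split() result back together. Instead, append each element returned by split() to the data list. The regex in the loop also removes every character that is not a letter, digit, whitespace, or dot, so the -------- separator lines become empty and contribute no tokens.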

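import re
import sys

data = []
for line in sys.stdin:
    # keep only letters, digits, whitespace and dots
    # (this turns the "--------" separator lines into empty strings)
    cleaned = re.sub(r"[^a-zA-Z\d\s.]", "", line)
    # split() with no argument splits on any run of whitespace
    data.extend(cleaned.split())

print(data)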
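The loop above keeps every token, so it also picks up the repeated column headers (A B, x y z) and the row labels of the two tables. If you want exactly the list shown in the question, one option is to split the input into blocks on the dash-only separator lines and, from the fourth block on (the tables), keep only tokens that parse as numbers. This is a minimal sketch, assuming the separators contain only dashes and the blocks come in the same order as in the sample input; the helper names is_number and tokens_from are illustrative, not part of the original answer:

```python
def is_number(token):
    """Return True if token parses as a float (integers included)."""
    try:
        float(token)
    except ValueError:
        return False
    return True


def tokens_from(text):
    # Group non-empty lines into blocks, splitting on lines made only of dashes.
    blocks, current = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and set(stripped) == {"-"}:
            blocks.append(current)
            current = []
        elif stripped:
            current.append(stripped)
    blocks.append(current)

    data = []
    for index, block in enumerate(blocks):
        for line in block:
            words = line.split()
            if index < 3:
                # sequence, symbol list and state list: keep every token
                data.extend(words)
            else:
                # tables: drop header rows and row labels, keep the numbers
                data.extend(w for w in words if is_number(w))
    return data
```

Usage: print(tokens_from(sys.stdin.read())). The tokens stay strings to match the desired output; call float() on the numeric ones if you need actual numbers.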

Alternatively, you could parse the input into a nested dictionary of floats, like this:

import sys

# the whole input as one string (the original answer assumed a variable raw)
raw = sys.stdin.read()
rawLines = raw.split("\n")

data = {}
# NOTE: the hard-coded line indices below depend on the exact layout of the
# input file (e.g. whether it starts with a blank line); adjust them to match
data["seq"] = rawLines[1]

# first table: rows A/B, columns A/B
data["mat1"] = {}
for k, label in [(8, "A"), (9, "B")]:
    temp = rawLines[k].split()  # split() handles tabs and runs of spaces
    data["mat1"][label] = {"A": float(temp[1]), "B": float(temp[2])}

# second table: rows A/B, columns X/Y/Z
data["mat2"] = {}
for k, label in [(14, "A"), (15, "B")]:
    temp = rawLines[k].split()
    data["mat2"][label] = {"X": float(temp[1]), "Y": float(temp[2]), "Z": float(temp[3])}
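Hard-coded line numbers break as soon as the layout shifts by a line (the sample input in the question, for instance, has the sequence on its very first line). A layout-independent variant could locate the tables by the dash-only separator lines instead. This is a sketch, assuming each table block starts with a header row of column names followed by rows of the form "label value value ...", and that the blocks appear in the same order as in the sample; parse_tables is a hypothetical name, and the column keys keep the labels exactly as written in the file (lowercase x/y/z) rather than X/Y/Z:

```python
def parse_tables(text):
    # Split into blocks of non-empty lines on the dash-only separator lines.
    blocks, current = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and set(stripped) == {"-"}:
            blocks.append(current)
            current = []
        elif stripped:
            current.append(stripped)
    blocks.append(current)

    def read_matrix(block):
        # First line holds the column names; each later line is "row v1 v2 ...".
        columns = block[0].split()
        matrix = {}
        for row in block[1:]:
            label, *values = row.split()
            matrix[label] = {col: float(v) for col, v in zip(columns, values)}
        return matrix

    return {
        "seq": blocks[0][0],
        "mat1": read_matrix(blocks[3]),
        "mat2": read_matrix(blocks[4]),
    }
```

With the question's input, parse_tables(sys.stdin.read())["mat1"]["A"] would give {"A": 0.634, "B": 0.366}.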
