Python: Parsing a specific column (from scratch, no "import csv") in a tab-separated file

I've written some code that parses a string into [number, letter] pairs, like this:

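s = '30M3I5X'
l = []
num = ""
for c in s:
    if c in '0123456789':
        num = num + c              # collect the digits of the current number
        print(num)                 # debug output: digits collected so far
    else:
        l.append([int(num), c])    # a letter ends the current number
        num = ""                   # reset for the next number

print(l)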

I.e.,

'30M3I5X'

becomes

[[30, 'M'], [3, 'I'], [5, 'X']]

That part works just fine. What I'm struggling with now is how to get the values from the first column of a tab-separated-value file to become my new 's'. I.e., for a file that looks like:

# File Example #
30M3I45M2I20M   I:AAC-I:TC
50M3X35M2I20M   X:TCC-I:AG

A loop would somehow take only the first column of each line, producing

[[30, 'M'],[3, 'I'],[45, 'M'],[2, 'I'],[20, 'M']]
[[50, 'M'],[3, 'X'],[35, 'M'],[2, 'I'],[20, 'M']]

without having to use

import csv

or any other module.

Thanks so much!

Just open the file and iterate through the records?

def fx(s):
    l = []
    num = ""
    for c in s:
        if c in '0123456789':
            num = num + c
        else:
            l.append([int(num), c])
            num = ""
    return l

# fp is the path to your tab-separated file
with open(fp) as f:
    for record in f:
        s = record.split('\t')[0]  # keep only the first column
        l = fx(s)
        # process l here ...

The following code would serve your purpose:

rows = ['30M3I45M2I20M   I:AAC-I:TC', '30M3I45M2I20M   I:AAC-I:TC']

for row in rows:
    words = row.split('  ')
    print(words[0])
    l = []
    num = ""
    for c in words[0]:
        if c in '0123456789':
            num = num + c
        else:
            l.append([int(num), c])
            num = ""  # reset, otherwise digits from earlier numbers pile up

    print(l)

Change the separator passed to row.split() to '\t', or to any other separator, as needed.
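To see how the choice of separator changes the result, here is a small sketch. The sample line is assumed to contain a real tab character between the two columns, as in the question's file, and the variable names are only illustrative.

```python
# A sample record as it would be read from the file: two columns
# separated by a tab, with a trailing newline (assumed layout).
row = '30M3I45M2I20M\tI:AAC-I:TC\n'

# Splitting on a space finds no separator, so the whole line comes back.
by_space = row.split(' ')

# Splitting on the tab gives both columns; the newline stays on the last one.
by_tab = row.split('\t')

# split() with no argument splits on any run of whitespace and drops the newline.
by_whitespace = row.split()

print(by_space)       # ['30M3I45M2I20M\tI:AAC-I:TC\n']
print(by_tab)         # ['30M3I45M2I20M', 'I:AAC-I:TC\n']
print(by_whitespace)  # ['30M3I45M2I20M', 'I:AAC-I:TC']
```

In all but the first case, index 0 of the resulting list is the first column.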

Something like this should do what you're looking for.

filename = r'\path\to\your\file.txt'
with open(filename,'r') as input:
    for row in input:
        elements = row.split()
        # processing goes here  

elements[0] contains the string that is the first column of data in the file.

Edit:

To end up with a list of the lists of processed data:

result = []
filename = r'\path\to\your\file.txt'
with open(filename,'r') as input:
    for row in input:
        elements = row.split()
        # processing goes here
        result.append(l) # l is the result of your processing
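Putting the pieces together, a minimal sketch might look like the following. The function names `parse_column` and `parse_first_column` are made up for illustration, and the lines are held in a list so the example runs without a file on disk. An open file object could be passed in the same way, since it also yields one line per iteration. Skipping blank lines and lines starting with `#` is an assumption based on the header line in the question's sample file.

```python
def parse_column(s):
    """Turn a string such as '30M3I' into [[30, 'M'], [3, 'I']]."""
    pairs = []
    num = ""
    for c in s:
        if c in '0123456789':
            num = num + c                 # keep collecting digits
        else:
            pairs.append([int(num), c])   # a letter closes the current number
            num = ""                      # start over for the next number
    return pairs


def parse_first_column(lines):
    """Parse the first column of every data line (hypothetical helper)."""
    result = []
    for row in lines:
        elements = row.split()
        # Assumption: blank lines and '#' header lines carry no data.
        if not elements or elements[0].startswith('#'):
            continue
        result.append(parse_column(elements[0]))
    return result


# Stand-in for the file contents; tabs separate the two columns.
sample = [
    '# File Example #\n',
    '30M3I45M2I20M\tI:AAC-I:TC\n',
    '50M3X35M2I20M\tX:TCC-I:AG\n',
]
for parsed in parse_first_column(sample):
    print(parsed)
```

With a real file, the same helper could be called as `parse_first_column(f)` inside a `with open(filename) as f:` block.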

So this is what ended up working for me. I took bits and pieces from everyone, thank you all!

Note: I know it's a bit verbose, but since I'm new, it helps me keep track of everything :)

# Defining the parser function

def col1parser(col1):
    l = []
    num = ""
    for c in col1:
        if c in '0123456789':
            num = num + c
        else:
            l.append([int(num), c])
            num = ""
    print(l)
    return l  # returned so that l below is the parsed list, not None


# Open file, run function on column 1
filename = r'filepath.txt'
with open(filename,'r') as input:
    for row in input:
        elements = row.split()
        col1 = elements[0]
        l = col1parser(col1)
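One thing this loop does not handle is malformed input: a letter with no number in front of it makes `int("")` raise a `ValueError` with an unhelpful message, and digits at the very end of the string are silently dropped. A hedged sketch of a stricter variant follows. The name `col1parser_strict` and the error messages are hypothetical, not part of the solution above.

```python
def col1parser_strict(col1):
    """Like col1parser, but reports malformed input (hypothetical variant)."""
    l = []
    num = ""
    for c in col1:
        if c in '0123456789':
            num = num + c
        else:
            if num == "":
                # A letter arrived before any digit was collected.
                raise ValueError("no count before %r in %r" % (c, col1))
            l.append([int(num), c])
            num = ""
    if num != "":
        # Digits at the end were never closed by a letter.
        raise ValueError("trailing digits %r in %r" % (num, col1))
    return l


print(col1parser_strict('30M3I5X'))

# Both of these inputs are rejected with a descriptive message.
for bad in ('M30', '30M5'):
    try:
        col1parser_strict(bad)
    except ValueError as err:
        print('rejected:', err)
```

Whether to raise, skip the line, or log a warning depends on how clean the input file is expected to be.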
