How can I split a text file into different columns using line.split()?

I want to be able to split my text file into different columns.

The data in my text file look like this:

023004         1997/11/14 15:00    2.971          
023004         1997/11/14 18:00    3.175          
023004         1997/11/14 21:00    3.300          
023004         1997/11/15 00:00                   AR
023004         1997/11/15 03:00                   AR

But when I try to split the columns, I get this:

['023002', '2008/11/20', '23:15', '1.076']
['023002', '2008/11/20', '23:30', '1.083']
['023002', '2008/11/20', '23:45', '1.089']
['023002', '2008/11/21', '00:00', 'AR']
['023002', '2008/11/21', '00:15', 'AR']
['023002', '2008/11/21', '00:30', 'AR']

AR ends up in the same column as my data. I don't know how to specify that 'AR', when present, belongs in a new column. I don't want to use pandas. I need this so that I can convert my strings to floats.

It looks like you're splitting on whitespace, but that doesn't work here because some rows have nothing in the value column. split() treats the whole run of spaces as a single separator, so AR lands in the 4th column rather than the 5th.
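To see the effect in isolation, here is a minimal sketch using two lines copied from the question (trailing spaces omitted). The variable names are only for illustration.

```python
# Two sample lines from the question: one with a value, one with only the AR flag.
value_line = "023004         1997/11/14 21:00    3.300"
flag_line = "023004         1997/11/15 00:00                   AR"

# split() with no arguments treats any run of whitespace as one separator,
# so the empty value column leaves no trace in the result.
value_fields = value_line.split()
flag_fields = flag_line.split()

print(value_fields)  # ['023004', '1997/11/14', '21:00', '3.300']
print(flag_fields)   # ['023004', '1997/11/15', '00:00', 'AR']
```

Both lists have four items, which is why the value and AR appear to share a column.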

I think the simplest way to handle this is to split the rows as you do now and collect them in a list. Then, wherever the 4th item is 'AR', insert an empty string in front of it so that AR moves to the 5th column.

data = [['023002', '2008/11/20', '23:15', '1.076'],
['023002', '2008/11/20', '23:30', '1.083'],
['023002', '2008/11/20', '23:45', '1.089'],
['023002', '2008/11/21', '00:00', 'AR'],
['023002', '2008/11/21', '00:15', 'AR'],
['023002', '2008/11/21', '00:30', 'AR']]

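# If the 4th item is the AR flag, the value column was empty in the file:
# insert an empty string so that AR moves to the 5th column.
for row in data:
    if row[3] == "AR":
        row.insert(3, "")

for row in data:
    print(row)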

>> 
['023002', '2008/11/20', '23:15', '1.076']
['023002', '2008/11/20', '23:30', '1.083']
['023002', '2008/11/20', '23:45', '1.089']
['023002', '2008/11/21', '00:00', '', 'AR']
['023002', '2008/11/21', '00:15', '', 'AR']
['023002', '2008/11/21', '00:30', '', 'AR']
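The same fix can be applied while the lines are being read, and rows without AR can be padded so that every row ends up with five items. The sketch below is one possible way to do it. `split_row` is a hypothetical helper, the column names in its docstring are my own labels, and it assumes the flag is always the literal 'AR'. Two sample lines from the question stand in for the file.

```python
def split_row(line):
    """Split one line into five columns: id, date, time, value, flag.

    Hypothetical helper; assumes the flag is always the literal 'AR'.
    """
    fields = line.split()
    # A 4th item of 'AR' means the value column was empty in the file.
    if len(fields) == 4 and fields[3] == "AR":
        fields.insert(3, "")
    # Pad rows that have a value but no flag so every row has five items.
    fields += [""] * (5 - len(fields))
    return fields


# Sample lines from the question stand in for the lines of the file.
lines = [
    "023004         1997/11/14 21:00    3.300          \n",
    "023004         1997/11/15 00:00                   AR\n",
]
rows = [split_row(line) for line in lines]
for row in rows:
    print(row)
```

With a real file, `lines` could be replaced by the open file object, since iterating over it yields one line at a time.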

You could do this with a regular expression as well:

import re

data = []
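# This regular expression captures each column as a separate group.
# The value and AR groups are optional; a group that takes no part
# in the match comes back as None.
cols = re.compile(r"(\d+)\s{,9}(\S+)\s(\S+)\s{,4}(\d+\.\d+)?\s*(AR)?")

# yourfile is a placeholder for the path to your text file
with open(yourfile) as fh: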
    for line in fh:
        col = cols.match(line.strip('\n'))
        # if there's no match, skip the line
        if not col:
            continue
        data.append([x if x is not None else '' for x in col.groups()])


>>
[['023004', '1997/11/14', '15:00', '2.971', ''],
['023004', '1997/11/14', '18:00', '3.175', ''],
['023004', '1997/11/14', '21:00', '3.300', ''],
['023004', '1997/11/15', '00:00', '', 'AR'],
['023004', '1997/11/15', '03:00', '', 'AR']]
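Once every row has five columns, the value column can be converted to floats, which is what the question ultimately asks for. This is a small sketch, assuming an empty string marks a missing value. `to_float` is a hypothetical helper, and using None for missing values is only one possible choice.

```python
# Two rows in the five-column layout shown above.
rows = [
    ['023004', '1997/11/14', '15:00', '2.971', ''],
    ['023004', '1997/11/15', '00:00', '', 'AR'],
]


def to_float(text):
    """Return float(text), or None when the column is empty (hypothetical helper)."""
    return float(text) if text else None


# Column index 3 holds the measured value.
values = [to_float(row[3]) for row in rows]
print(values)  # [2.971, None]
```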
