
Python: Replacing white spaces in a text file with a comma

I'm fairly rusty with Python and wanted to know whether there is a better or more efficient way of writing this script.

The script's purpose is to take a .txt log and replace ' ' with ',' to create a .csv. This makes the logs a little easier to read.

Any suggestions or advice would be appreciated. Thanks.

import sys
import os
import datetime

t = datetime.datetime.now() # current local time (datetime.now() is not UTC)
ts = t.strftime("%Y_%m_%d_ %H_%M_%S") # timestamp for the output file name


if len(sys.argv) != 2: # expect exactly one argument: the log file name
    print ("usage: programname.py logname.ext")
    sys.exit(1)

logSys = sys.argv[1] 
newLogSys = sys.argv[1] + "_" + ts +".csv"

log = open(logSys,"r")
nL = open(newLogSys, "w")


# Read from log and write each converted line to nL
for lineI in log.readlines():
    rec=lineI.rstrip()
    if rec.startswith("#"):
        lineI=rec.replace(':',',').strip() 
        nL.write(lineI + "\n")
    else:
        lineO=rec.replace(' ',',').strip()
        nL.write(lineO + "\n") 

## closes both open files; End script
nL.close()
log.close()
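If the timestamp in the file name is meant to be UTC, note that datetime.datetime.now() returns local time. A minimal sketch, assuming UTC is what's wanted; output_name is a hypothetical helper, and its format string drops the space after the day so the file name contains no blank:

```python
from datetime import datetime, timezone

def output_name(log_name, when=None):
    # Hypothetical helper: build "<log_name>_<timestamp>.csv".
    # datetime.now(timezone.utc) is an aware datetime in UTC;
    # plain datetime.now() would give local time instead.
    if when is None:
        when = datetime.now(timezone.utc)
    # same fields as the script's format string, minus the stray space
    return "%s_%s.csv" % (log_name, when.strftime("%Y_%m_%d_%H_%M_%S"))

print(output_name("access.log", datetime(2012, 2, 7, 0, 48, 55)))
# access.log_2012_02_07_00_48_55.csv
print(output_name("access.log"))  # uses the current UTC time
```

Passing when explicitly makes the name reproducible, which helps when testing; leaving it out gives the current UTC time.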

=====Sample log========
#Date: 2008-04-18 15:41:16
#Fields: date time time-taken c-ip cs-username cs-auth-group x-exception-id sc-filter-result cs-categories cs(Referer) sc-status s-action cs-method rs(Content-Type) cs-uri-scheme cs-host cs-uri-port cs-uri-path cs-uri-query cs-uri-extension cs(User-Agent) s-ip sc-bytes cs-bytes x-virus-id
2012-02-02 16:19:01 14 xxx.xxx.xxx.xxx user domain\group dns_unresolved_hostname DENIED "Games" -  404 TCP_ERR_MISS POST - http updaterservice.wildtangent.com 80 /appupdate/appcheckin.wss - wss "Mozilla/4.0 (compatible; MSIE 8.0; Win32)" xxx.xxx.xxx.xxx 824 697 -
  1. Don't use readlines to iterate. A plain for lineI in log: iterates over all lines without reading the entire file into memory.
  2. You're using rstrip to take the newline off each line, only to add it back in afterwards.
  3. The purpose of the strip on the lines is unclear, especially since you've already replaced all spaces with commas.
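The first two points can be seen in a small sketch. convert is a hypothetical function name, and io.StringIO stands in for the two real files so the example runs on its own:

```python
import io

def convert(log, out):
    # Iterating over the file object yields one line at a time, so the
    # whole log never has to sit in memory (unlike readlines()).
    for line in log:
        # each line still ends with its '\n', so there is nothing to
        # rstrip() and add back again
        sep = ':' if line.startswith('#') else ' '
        out.write(line.replace(sep, ','))

# io.StringIO objects stand in for the input log and the output .csv
src = io.StringIO('#Date: 2008-04-18 15:41:16\n2012-02-02 16:19:01 14\n')
dst = io.StringIO()
convert(src, dst)
print(dst.getvalue(), end='')
# #Date, 2008-04-18 15,41,16
# 2012-02-02,16:19:01,14
```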

I would shorten your code to:

import sys
from time import strftime

if len(sys.argv) != 2: # expect exactly one argument: the log file name
    print("usage: programname.py logname.ext")
    sys.exit(1)

logSys    = sys.argv[1]
newLogSys = "%s_%s.csv" % (logSys, strftime("%Y_%m_%d_ %H_%M_%S"))

# text mode with newline='' passes '\n' or '\r\n' endings through unchanged
with open(logSys, newline='') as log, open(newLogSys, 'w', newline='') as nL:
    # ':' becomes ',' on '#' directive lines, ' ' becomes ',' everywhere else
    nL.writelines(lineI.replace(':' if lineI.startswith('#') else ' ', ',')
                  for lineI in log)
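On Python 3, the 'rb'/'wb' modes would hand the generator bytes, and bytes.replace() with str arguments raises TypeError. A sketch of one way around it, assuming the point of binary mode was to keep '\r\n' endings untouched: open both files in text mode with newline='', which turns off newline translation on both read and write. convert_file is a hypothetical name:

```python
import os
import tempfile

def convert_file(src, dst):
    # Hypothetical helper: the same one-line transformation, in Python 3
    # text mode; newline='' disables newline translation on both sides.
    with open(src, newline='') as log, open(dst, 'w', newline='') as nL:
        nL.writelines(line.replace(':' if line.startswith('#') else ' ', ',')
                      for line in log)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'in.log')
    dst = os.path.join(tmp, 'out.csv')
    with open(src, 'wb') as f:
        f.write(b'#Date: 2008-04-18\r\n2012-02-02 16:19:01 14\r\n')
    convert_file(src, dst)
    with open(dst, 'rb') as f:
        data = f.read()

print(data)  # b'#Date, 2008-04-18\r\n2012-02-02,16:19:01,14\r\n'
```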

Edit:

I still don't understand what you mean by the addition of another line, that is to say an extra '\n', to the lines that don't begin with '#'.

I ran the following code with your sample and didn't observe anything like what you describe. Sorry, but I can't propose a solution for a problem I can't reproduce.

from time import strftime

ss = ('--||  ||:|||:||--||| \r\n'
      '#10 23:30 abcdef : \r\n'
      '802 12:25 xyz  :  \r\n'
      '\r\n'
      '#:35 11:18+14:39 sunny vale : sunny sea\r\n'
      '  651454451 drh:hdb 54:1\r\n'
      '    \r\n'
      ': 541514 oi:npvert654165:8\r\n'
      '#5415:v541564zervt\r\n'
      '#     ::    \r\n'
      '#::: :::\r\n'
      ' E\r\n')

def smartdispl(com, smth):
    # Print a centered title bar, then every line of smth through repr()
    # so that '\r' and '\n' stay visible, then a closing bar.
    print('\n%s\n%s\n%s' %
          ('{0:{fill}{align}70}'.format(' %s ' % com, fill='=', align='^'),
           '\n'.join(repr(el) for el in smth.splitlines(True)),
           '{0:{fill}{align}70}'.format('', fill='=', align='^')))

logSys = 'poiu.txt'

# newline='' writes and reads the '\r\n' endings unchanged
with open(logSys, 'w', newline='') as f:
    f.write(ss)

with open(logSys, newline='') as f:
    smartdispl('content of the file ' + logSys, f.read())

newLogSys = "%s_%s.csv" % (logSys, strftime("%Y_%m_%d_ %H_%M_%S"))

with open(logSys, newline='') as log, open(newLogSys, 'w', newline='') as nL:
    nL.writelines(lineI.replace(':' if lineI.startswith('#') else ' ', ',')
                  for lineI in log)

with open(newLogSys, newline='') as f:
    smartdispl('content of the file ' + newLogSys, f.read())

Result:

==================== content of the file poiu.txt ====================
'--||  ||:|||:||--||| \r\n'
'#10 23:30 abcdef : \r\n'
'802 12:25 xyz  :  \r\n'
'\r\n'
'#:35 11:18+14:39 sunny vale : sunny sea\r\n'
'  651454451 drh:hdb 54:1\r\n'
'    \r\n'
': 541514 oi:npvert654165:8\r\n'
'#5415:v541564zervt\r\n'
'#     ::    \r\n'
'#::: :::\r\n'
' E\r\n'
======================================================================

======= content of the file poiu.txt_2012_02_07_ 00_48_55.csv ========
'--||,,||:|||:||--|||,\r\n'
'#10 23,30 abcdef , \r\n'
'802,12:25,xyz,,:,,\r\n'
'\r\n'
'#,35 11,18+14,39 sunny vale , sunny sea\r\n'
',,651454451,drh:hdb,54:1\r\n'
',,,,\r\n'
':,541514,oi:npvert654165:8\r\n'
'#5415,v541564zervt\r\n'
'#     ,,    \r\n'
'#,,, ,,,\r\n'
',E\r\n'
======================================================================

Using the suggestions from @larsmans and removing code duplication from the write section:

# Read from file log and write to nLog file
for line in log:
    if line.startswith("#"): 
        line = line.replace(':',',')
    else: 
        line = line.replace(' ',',')
    nL.write(line) 

If you want succinctness, try this version:

for line in log:
    # rstrip('\n') first, otherwise '#' lines end up with two newlines
    if line[0] == '#': line = ','.join(line.rstrip('\n').split(':'))
    else: line = ','.join(line.split())
    nL.write(line + '\n')
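Neither the replace() versions nor the split() version know about the double quotes in the sample log, so a value like "Mozilla/4.0 (compatible; MSIE 8.0; Win32)" gets spread over several CSV columns; the two approaches also disagree on runs of spaces (the sample contains "-  404", with two spaces). A minimal sketch, assuming data fields are separated by single spaces and quoted with double quotes as in the sample, hands the splitting to the standard csv module instead; log_to_csv is a hypothetical name:

```python
import csv
import io

def log_to_csv(lines, out):
    # Hypothetical helper, not taken from any of the answers above.
    writer = csv.writer(out, lineterminator='\n')
    for raw in lines:
        text = raw.rstrip('\r\n')
        if text.startswith('#'):
            # directive lines: same ':' -> ',' treatment as in the answers
            out.write(text.replace(':', ',') + '\n')
        else:
            # csv.reader splits on every single space (a double space yields
            # an empty column, as with replace()) but keeps a double-quoted
            # field together in one column and drops the surrounding quotes
            writer.writerows(csv.reader([text], delimiter=' '))

line = '-  404 "Mozilla/4.0 (compatible; MSIE 8.0; Win32)" 824\n'
print(line.replace(' ', ','), end='')  # -,,404,"Mozilla/4.0,(compatible;,MSIE,8.0;,Win32)",824
print(','.join(line.split()))          # -,404,"Mozilla/4.0,(compatible;,MSIE,8.0;,Win32)",824

out = io.StringIO()
log_to_csv(['#Date: 2008-04-18 15:41:16\n', line], out)
print(out.getvalue(), end='')
# #Date, 2008-04-18 15,41,16
# -,,404,Mozilla/4.0 (compatible; MSIE 8.0; Win32),824
```

Since the User-Agent value contains no comma, csv.writer writes it unquoted; a value that did contain a comma would be wrapped in double quotes automatically.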
