Removing spaces from a string within a CSV in Python

I have a CSV that is output by a program. The delimiter is a space. One "cell" of the CSV is manually input by a user; the rest is automatically generated. The issue is that the user may put a space within the string they input manually. If I were to import this into Excel, the columns would be off. I'm trying to write a program in Python that will replace the spaces within the user input with underscores.

So I want to go from this

 600 2 light rain event 2015-01-12 17:48:07

to this

 600 2 gmk_light_rain_event 2015-01-12 17:48:07

Is there any way to code this in Python?

Use the replace method of the str class:

"light rain event".replace(' ', '_')

It would be better to replace the spaces closer to when the data is entered. But if you have already collected the data, you need a rule to identify that field among the others.

>>> s = "600 2 light rain event 2015-01-12 17:48:07"
>>> parts = s.split(" ")

Rule: leave the first 2 and the last 2 fields alone, and replace " " with "_" in the remainder.

>>> parts[:2] + ["_".join(parts[2:-2])] + parts[-2:]
['600', '2', 'light_rain_event', '2015-01-12', '17:48:07']

Join the parts of the resulting list:

>>> " ".join(parts[:2] + ["_".join(parts[2:-2])] + parts[-2:])
'600 2 light_rain_event 2015-01-12 17:48:07'

And you can add the "gmk" tag like this

>>> " ".join(parts[:2] + ["gmk_"+"_".join(parts[2:-2])] + parts[-2:])
'600 2 gmk_light_rain_event 2015-01-12 17:48:07'
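The rule above can be wrapped in a small function so that it can be applied to every line of the file. This is only a sketch: the name join_user_field and its parameters are made up for illustration, and it assumes the user-entered field is the only one that can contain spaces.

```python
def join_user_field(line, head=2, tail=2, sep="_", prefix="gmk_"):
    """Collapse the free-text field of a space-delimited line into one token.

    Assumes `head` fixed fields come before the user text and `tail`
    fixed fields come after it (hypothetical parameter names).
    """
    parts = line.split()          # split on any run of whitespace
    end = len(parts) - tail       # index where the trailing fields start
    if end <= head:
        # No user text present: return the fields unchanged.
        return " ".join(parts)
    middle = prefix + sep.join(parts[head:end])
    return " ".join(parts[:head] + [middle] + parts[end:])


lines = [
    " 600 2 light rain event 2015-01-12 17:48:07",
    " 601 3 snow 2015-01-13 09:00:00",
]
for fixed in map(join_user_field, lines):
    print(fixed)
```

Calling line.split() with no argument also drops leading whitespace and collapses repeated spaces, which split(" ") does not do.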

You can use a regex:

>>> import re
>>> s="light rain event"
>>> re.sub(r'\s+', '_', s)
'light_rain_event'
>>> 'gmk_'+re.sub(r'\s+', '_', s)
'gmk_light_rain_event'
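The snippet above only handles the user text once it has been isolated. As a rough sketch, a regex can also pick that text out of the full line, assuming the layout from the question (two leading fields, the free text, then a date and a time). The names LINE_RE and fix_line are hypothetical.

```python
import re

# Assumed layout: field, field, free text, date, time.
LINE_RE = re.compile(r"^\s*(\S+)\s+(\S+)\s+(.+?)\s+(\S+)\s+(\S+)\s*$")


def fix_line(line, prefix="gmk_"):
    match = LINE_RE.match(line)
    if match is None:
        # Leave lines that do not fit the assumed layout untouched.
        return line
    first, second, text, date_part, time_part = match.groups()
    text = prefix + re.sub(r"\s+", "_", text)
    return " ".join([first, second, text, date_part, time_part])


print(fix_line(" 600 2 light rain event 2015-01-12 17:48:07"))
```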

You need to split the line based on the number of fields before and after the user input, since I'm guessing the input itself can contain any number of spaces.

# Line read from the CSV
line = "600 2 light rain event 2015-01-12 17:48:07"

# Kept as variables in case either part needs changing
space_between_words = "_"
prefix = "gmk"

# Split by spaces
separated_line = line.split(" ")

# The first two and last two fields stay as they are;
# the middle part gets the prefix and underscores
start_bit = " ".join(separated_line[:2])
middle_bit = space_between_words.join([prefix] + separated_line[2:-2])
end_bit = " ".join(separated_line[-2:])

# print() with a single argument works in both Python 2 and Python 3
print("{0} {1} {2}".format(start_bit, middle_bit, end_bit))
# Result: 600 2 gmk_light_rain_event 2015-01-12 17:48:07

I added a bit where you can easily change the underscore and 'gmk' if needed, although looking at the answer above I can see John did it pretty much the same way :)
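Another way to split "based on the number of fields before and after" is to use the maxsplit argument of str.split and str.rsplit, which peels a fixed number of fields off each end and leaves the user text in one piece. This is a sketch only; underscore_middle and its parameters are illustrative names, not part of the answers above.

```python
def underscore_middle(line, lead=2, trail=2, sep="_", prefix="gmk"):
    line = line.strip()
    # Peel `lead` fields off the left; the remainder stays in one string.
    head = line.split(" ", lead)
    if len(head) <= lead:
        return line               # too few fields, nothing to fix
    *start, rest = head
    # Peel `trail` fields off the right of the remainder.
    tail = rest.rsplit(" ", trail)
    if len(tail) <= trail:
        return line               # no user text between the fixed fields
    middle, *end = tail
    # Collapse any run of spaces inside the user text.
    middle = sep.join([prefix] + middle.split())
    return " ".join(start + [middle] + end)


print(underscore_middle("600 2 light rain event 2015-01-12 17:48:07"))
```

Because the user text is never split on its own spaces until the last step, repeated spaces inside it collapse to a single underscore.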
