I have a CSV file that is output by a program. The delimiter is a space. One "cell" in each row is typed in manually by a user; the rest is generated automatically. The problem is that the user's input may itself contain spaces, so if I opened the file in Excel the columns would be misaligned. I'm trying to write a Python program that replaces the spaces inside the user input with underscores.
So I want to go from this
600 2 light rain event 2015-01-12 17:48:07
to this
600 2 gmk_light_rain_event 2015-01-12 17:48:07
Is there any way to code this in Python?
Use the replace method of the str class:
"light rain event".replace(' ', '_')
It would be better to replace the spaces closer to the point where the data is entered. But if the data has already been collected, you need a rule to identify that field among the others:
>>> s = "600 2 light rain event 2015-01-12 17:48:07"
>>> parts = s.split(" ")
Rule: Leave the first and last 2 fields alone. Replace the " " with "_" in the remainder
>>> parts[:2] + ["_".join(parts[2:-2])] + parts[-2:]
['600', '2', 'light_rain_event', '2015-01-12', '17:48:07']
Join the parts of the resulting list:
>>> " ".join(parts[:2] + ["_".join(parts[2:-2])] + parts[-2:])
'600 2 light_rain_event 2015-01-12 17:48:07'
And you can add the "gmk" tag like this:
>>> " ".join(parts[:2] + ["gmk_"+"_".join(parts[2:-2])] + parts[-2:])
'600 2 gmk_light_rain_event 2015-01-12 17:48:07'
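The rule above can be wrapped in a small function and applied to every line of the file. This is only a sketch: the name `fix_line` and its `head`, `tail`, and `prefix` parameters are made up for illustration, and it assumes the first two and last two fields never contain spaces. Note that it uses `split()` without an argument, which also collapses repeated spaces and drops a trailing newline, unlike `split(" ")` above.

```python
def fix_line(line, prefix="gmk", head=2, tail=2):
    """Join the free-text fields of one space-delimited line with underscores.

    Assumption: the first `head` and last `tail` fields never contain spaces.
    """
    parts = line.split()
    end = len(parts) - tail
    middle = parts[head:end]
    if not middle:
        # Too few fields to contain any user input; leave the line unchanged.
        return line
    words = [prefix] + middle if prefix else middle
    return " ".join(parts[:head] + ["_".join(words)] + parts[end:])


lines = [
    "600 2 light rain event 2015-01-12 17:48:07",
    "601 3 fog 2015-01-12 18:02:11",
]
fixed = [fix_line(line) for line in lines]
```

Running the function twice on the same line would add the prefix twice, so it is meant to be applied once to the raw output.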
You can use a regex:
>>> import re
>>> s="light rain event"
>>> re.sub(r'\s+', '_', s)
'light_rain_event'
>>> 'gmk_'+re.sub(r'\s+', '_', s)
'gmk_light_rain_event'
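The substitution above only works on the user's text once it has been isolated. A regex can also pick that text out of the whole line. The following is a sketch under the assumption that every line has exactly two space-free fields before the user input and two after it; the names `LINE_RE` and `fix_line_re` are hypothetical.

```python
import re

# Two fields, then the free text, then two fields, each separated by one space.
LINE_RE = re.compile(r"^(\S+ \S+) (.+) (\S+ \S+)$")


def fix_line_re(line, prefix="gmk_"):
    """Underscore the middle field of a line; return the line as-is if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return line
    head, middle, tail = m.groups()
    return "{} {}{} {}".format(head, prefix, re.sub(r"\s+", "_", middle), tail)


result = fix_line_re("600 2 light rain event 2015-01-12 17:48:07")
```

Lines that do not fit the pattern are returned unchanged, so malformed rows can be spotted afterwards rather than silently mangled.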
You need to split the line and count fields from each end, since I'm guessing the user input can contain any number of spaces.
#Line read from CSV
line = "600 2 light rain event 2015-01-12 17:48:07"
#Just in case any parts need changing
spaceBetweenWords = "_"
prefix = "gmk"
#Split by spaces
separatedLine = line.split( " " )
#Get the middle part that needs underscores
startBit = " ".join( separatedLine[:2] )
middleBit = spaceBetweenWords.join( [prefix] + separatedLine[2:-2] )
endBit = " ".join( separatedLine[-2:] )
print("{0} {1} {2}".format( startBit, middleBit, endBit ))
# Result: 600 2 gmk_light_rain_event 2015-01-12 17:48:07
I added a bit where you can easily change the underscore and 'gmk' if needed, although looking up I can see John pretty much did it the same way :)
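The same "count fields from each end" idea can also be written with the `maxsplit` argument of `str.split` and `str.rsplit`, which peels a fixed number of fields off each side and leaves the user input in one piece. This is a minimal sketch, not a drop-in replacement: the function name `underscore_middle` is made up, and it assumes each line has at least five fields and no trailing newline (it raises `ValueError` on shorter lines).

```python
def underscore_middle(line, prefix="gmk", sep="_"):
    """Replace spaces in the middle field, keeping two fields on each side."""
    # Peel two fields off the left; `rest` still contains the user input.
    first, second, rest = line.split(" ", 2)
    # Peel two fields off the right; what remains is the user input.
    middle, date, time = rest.rsplit(" ", 2)
    middle = sep.join([prefix] + middle.split())
    return " ".join([first, second, middle, date, time])


converted = underscore_middle("600 2 light rain event 2015-01-12 17:48:07")
```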