I have a very very simple program that parses a csv file that has rows of text records whose columns are separated by a single tab character.
I understand split() by default splits on whitespace, so explicitly specifying a whitespace pattern isn't needed, but my question is: why won't an explicitly specified whitespace pattern work? Or are '\s' and r'\s' not the right pattern/regex? I searched on Stack Overflow and found a mention of string split() being an older method, which I don't really understand since I am very new to Python. Does string split() not support regex?
Here is my code:
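```python
#!/usr/bin/env python
import os
import re
import sys

f = open(sys.argv[1])
for line in f:
    field = line.split()        # split with no argument
    field2 = line.split('\s')   # split with an explicit "whitespace" pattern
    print(field[1], field2[1])
f.close()                       # the parentheses are needed to actually close the file
```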
I tried doing line.split(r'\s') and that doesn't work either, but line.split('\t') works.
Because \t really represents a tab character in a string (just as \n is the newline character; see the list of valid escape sequences in the Python documentation), but \s is a special regular expression character class for whitespace.
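A quick way to see the difference is to inspect the strings themselves. This is a minimal sketch (the variable names are only for illustration); it uses the raw string r"\s" because writing "\s" in a normal literal yields the same two characters but can trigger an invalid escape sequence warning in recent Python 3 versions.

```python
tab = "\t"            # a real escape sequence: a single tab character
backslash_s = r"\s"   # raw string: a backslash followed by the letter s

print(len(tab), repr(tab))                  # 1 '\t'
print(len(backslash_s), repr(backslash_s))  # 2 '\\s'
print(backslash_s == "\\s")                 # True: the same two characters
print(ord(tab))                             # 9, the ASCII code for tab
```

So str.split('\t') looks for one tab character, while str.split('\s') looks for a literal backslash followed by s, which an ordinary text record almost never contains.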
str.split does not accept regular expressions. If you want to split with regular expressions, you have to use re.split.
Demonstration:
```python
>>> import re
>>> s = r"This\sis a weird\sstring"
>>> s.split(r"\s")  # treated literally
['This', 'is a weird', 'string']
>>> re.split(r"\s", s)  # treated as a regex
['This\\sis', 'a', 'weird\\sstring']
```
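Applied to the tab-separated records from the question, the literal separator "\t" is what str.split needs, and re.split(r"\t", ...) gives the same result. The sample line below is made up for illustration; it also shows that split() with no argument breaks up fields that themselves contain spaces.

```python
import re

# Hypothetical tab-separated record; the second field contains a space.
line = "42\tJohn Smith\tBoston\n"

record = line.rstrip("\n")           # drop the trailing newline before splitting
by_tab = record.split("\t")          # literal tab separator
by_regex = re.split(r"\t", record)   # the same split, expressed as a regex
by_whitespace = record.split()       # splits on ANY whitespace run, spaces included

print(by_tab)         # ['42', 'John Smith', 'Boston']
print(by_regex)       # ['42', 'John Smith', 'Boston']
print(by_whitespace)  # ['42', 'John', 'Smith', 'Boston']
```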
str.split() takes a string as its argument and splits on exactly that string. That's all. \t is an ASCII tab character, while \s is simply a backslash followed by s in this case.
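Because the argument is matched literally, any string works as a separator, including multi-character ones, and characters that are special in regexes (like ".") have no special meaning. The sketch below also shows the difference between split() with no argument and split(" "): only the former collapses runs of whitespace.

```python
print("a--b--c".split("--"))  # ['a', 'b', 'c']: multi-character literal separator
print("a.b.c".split("."))     # ['a', 'b', 'c']: "." is just a dot, not "any character"
print("a b".split("\\s"))     # ['a b']: no literal backslash-s in the string, so no split

spaced = "a  b"               # two spaces between a and b
print(spaced.split())         # ['a', 'b']: no argument collapses whitespace runs
print(spaced.split(" "))      # ['a', '', 'b']: a single space is matched literally
```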
For a regex split, you want to import re and use re.split().
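When switching to re.split, the exact pattern matters. r"\s" matches exactly one whitespace character, so consecutive spaces produce empty strings, while r"\s+" treats a whole run of whitespace as one separator. A minimal sketch; maxsplit is a real re.split parameter that caps the number of splits.

```python
import re

text = "a  b\tc"  # two spaces, then a tab

print(re.split(r"\s", text))               # ['a', '', 'b', 'c']
print(re.split(r"\s+", text))              # ['a', 'b', 'c']
print(re.split(r"\s+", text, maxsplit=1))  # ['a', 'b\tc']

# re.split(r"\s+") is close to str.split(), but not identical at the edges:
print(re.split(r"\s+", " a b "))           # ['', 'a', 'b', '']
print(" a b ".split())                     # ['a', 'b']
```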
The str.split() method does not take a regular expression parameter. Try re.split():
```python
>>> import re
>>> re.split(r"\s+", "a b")
['a', 'b']
```
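Putting it together, the program from the question could look roughly like this in Python 3. This is a sketch, not the asker's code: it assumes the file really is tab-separated as described, still takes the file name from sys.argv[1], and uses a with block so the file is always closed. The helper name split_record and the script name split_fields.py are made up for illustration.

```python
#!/usr/bin/env python3
import re
import sys


def split_record(line):
    """Split one tab-separated line three ways (hypothetical helper)."""
    record = line.rstrip("\n")
    return (
        record.split("\t"),        # literal tab: keeps spaces inside fields
        re.split(r"\t", record),   # the same split via a regex
        re.split(r"\s+", record),  # any whitespace run (may leave '' at the ends)
    )


def main(path):
    with open(path) as f:  # the file is closed automatically
        for line in f:
            by_tab, by_regex, by_whitespace = split_record(line)
            if len(by_tab) > 1:  # skip lines with no tab at all
                print(by_tab[1], by_regex[1], by_whitespace[1])


if __name__ == "__main__":
    if len(sys.argv) > 1:
        main(sys.argv[1])
    else:
        print("usage: split_fields.py FILE", file=sys.stderr)
```

Checking len(by_tab) avoids the IndexError the original loop hits on lines that do not contain the separator.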