why doesn't line.split('\s') do the same as line.split()?

Question

I have a very simple program that parses a CSV file whose rows are text records with columns separated by a single tab character.

I understand that split() splits on whitespace by default, so explicitly specifying a whitespace pattern isn't needed. My question is: why doesn't an explicitly specified whitespace pattern work? Or is '\s' or r'\s' not the right pattern/regex? I searched Stack Overflow and found mentions of string split() being an older method, which I don't really understand since I am very new to Python. Does string split() not support regex?

Here is my code:

#!/usr/bin/env python
import os
import re
import sys

f = open(sys.argv[1])
for line in f:
    field = line.split()        # default: split on whitespace
    field2 = line.split('\s')   # attempt at an explicit whitespace pattern
    print(field[1], field2[1])
f.close()

I tried line.split(r'\s') and that doesn't work either, but line.split('\t') works.

Answer 1

Because \t really does represent a tab character in a string (just as \n is the newline character; both are among Python's valid string escape sequences), whereas \s is not a string escape at all: it is a regular-expression character class that matches whitespace.

str.split does not accept regular expressions. If you want to split with regular expressions, you have to use re.split.

Demonstration:

>>> import re
>>> s = r"This\sis a weird\sstring"
>>> s.split(r"\s")                     # treated literally
['This', 'is a weird', 'string']
>>> re.split(r"\s", s)                 # regex
['This\\sis', 'a', 'weird\\sstring']
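Applied to a tab-separated line like the ones in the question, the difference can be sketched as follows. The sample record is made up for illustration; it is not taken from the asker's file.

```python
import re

# Hypothetical tab-separated record; the second field contains a space.
line = "alice\tNew York\t42\n"

# No argument: split on any run of whitespace, so "New York" is broken apart.
by_whitespace = line.split()

# Literal tab: split only on tab characters (trailing newline removed first).
by_tab = line.rstrip("\n").split("\t")

# Regex: re.split understands \s and splits on runs of whitespace.
by_regex = re.split(r"\s+", line.strip())

print(by_whitespace)  # ['alice', 'New', 'York', '42']
print(by_tab)         # ['alice', 'New York', '42']
print(by_regex)       # ['alice', 'New', 'York', '42']
```

If fields may contain spaces, splitting on the literal tab is probably the safer choice for this kind of file.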

Answer 2

str.split() takes a string as its argument and splits on that exact string. That's all. \t is an ASCII tab character, while \s is simply a backslash followed by s in this case.

For a regex split, you want to import re and use re.split().
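A quick way to see the difference between the two escapes is to check the length of each string. This is only an illustrative sketch; the backslash is written as "\\s" here because newer Python versions warn about unrecognized escapes such as '\s', although both spellings produce the same two-character string.

```python
tab = "\t"           # recognized escape: one tab character
backslash_s = "\\s"  # a backslash followed by the letter s

tab_len = len(tab)
backslash_s_len = len(backslash_s)
print(tab_len)          # 1
print(backslash_s_len)  # 2

# str.split searches for the literal two-character sequence, which a
# tab-separated line does not contain, so nothing is split.
line = "a\tb\tc"
literal_result = line.split(backslash_s)
tab_result = line.split(tab)
print(literal_result)  # ['a\tb\tc']
print(tab_result)      # ['a', 'b', 'c']
```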

Answer 3

The str.split() method does not take a regular expression as its separator. Try re.split():

>>> import re
>>> re.split(r"\s+", "a  b")
['a', 'b']
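Note that re.split(r"\s+", ...) is not an exact drop-in for split() with no arguments when the text has leading or trailing whitespace, which lines read from a file usually do (the trailing newline). A small sketch with a made-up input:

```python
import re

text = "  a  b \n"

plain = text.split()                           # leading/trailing whitespace ignored
regex_raw = re.split(r"\s+", text)             # empty strings at both ends
regex_stripped = re.split(r"\s+", text.strip())

print(plain)           # ['a', 'b']
print(regex_raw)       # ['', 'a', 'b', '']
print(regex_stripped)  # ['a', 'b']
```

Stripping the line first, as in the last call, is one way to make the two approaches agree.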

3 answers

solution1
8 ACCEPTED 2011-03-03 19:51:04

solution2
1 2011-03-03 19:51:48

solution3
1 2011-03-03 19:52:03