Why doesn't line.split('\s') do the same as line.split()?

I have a very simple program that parses a CSV file whose rows are text records with columns separated by a single tab character.

I understand that split() splits on whitespace by default, so explicitly specifying a whitespace pattern isn't needed, but my question is: why doesn't an explicitly specified whitespace pattern work? Or is '\s' or r'\s' not the right pattern/regex? I searched on Stack Overflow and found mentions of string split() being an older method, which I don't really understand since I am very new to Python. Does string split() not support regex?

Here is my code:

#!/usr/bin/env python
import os
import re
import sys

f = open(sys.argv[1])
for line in f:
    field = line.split()
    field2 = line.split('\s')
    print field[1], field2[1]
f.close()

I tried line.split(r'\s') and that doesn't work either, but line.split('\t') works.

line.split('\t') works because \t really represents a tab character in a string (just as \n is the newline character; both are among Python's valid escape sequences), whereas \s is a special regular expression character class for whitespace.

str.split does not accept regular expressions. If you want to split on a regular expression, you have to use re.split.

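Demonstration (raw strings keep the backslash literal, and the variable is named s so that it does not shadow the built-in str):

>>> import re
>>> s = r"This\sis a weird\sstring"
>>> s.split(r"\s")                     # treated literally
['This', 'is a weird', 'string']
>>> re.split(r"\s", s)                 # regex
['This\\sis', 'a', 'weird\\sstring']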
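The same comparison can be run on a tab-separated record like the ones in the question. This is only a sketch: the sample line and the variable names are made up for illustration, and the code is written for Python 3.

```python
import re

# Hypothetical tab-separated record, similar to one line of the asker's file.
line = "alice\t30\tLondon\n"

# No argument: split on runs of any whitespace, ignoring leading/trailing whitespace.
default_fields = line.split()

# '\t' is an escape sequence for one real tab character, so this splits on tabs.
tab_fields = line.split('\t')

# r'\s' is the two literal characters backslash and 's', which never occur in
# the line, so nothing is split and the result has a single element.
literal_fields = line.split(r'\s')

# re.split treats \s as the regex whitespace class (tab, newline, space, ...).
regex_fields = re.split(r'\s', line)

print(default_fields)   # ['alice', '30', 'London']
print(tab_fields)       # ['alice', '30', 'London\n']
print(literal_fields)   # ['alice\t30\tLondon\n']
print(regex_fields)     # ['alice', '30', 'London', '']
```

Because the literal split returns a one-element list, indexing it with [1], as the question's code does with field2[1], raises an IndexError.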

string.split() takes a string as its argument and splits on that exact string. That's all. \t is an ASCII tab character, while \s is simply a backslash followed by s in this case.
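A quick way to check this is to inspect the two strings directly. The variable names below are illustrative only, and the second string is written as a raw string so that the backslash is unambiguous.

```python
tab = '\t'        # escape sequence: a single character, ASCII code 9
not_ws = r'\s'    # no such escape: two characters, a backslash and 's'

tab_length = len(tab)
not_ws_length = len(not_ws)

print(tab_length, ord(tab))              # 1 9
print(not_ws_length)                     # 2
print(not_ws == '\\s')                   # True
print(tab.isspace(), not_ws.isspace())   # True False
```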

For a regex split, you want to import re and use re.split().

The string.split() function does not take a regular expression parameter. Try re.split():

>>> import re
>>> re.split(r"\s+", "a  b")
['a', 'b']
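One caveat when swapping between these calls on tab-separated data: they disagree about empty columns and about trailing whitespace. The sketch below uses made-up sample strings to show the difference. Which behaviour is wanted depends on the file, so treat it as an illustration rather than a recommendation.

```python
import re

# Hypothetical record whose middle column is empty (two tabs in a row).
record = "a\t\tc"

collapsed_plain = record.split()              # whitespace runs collapse
collapsed_regex = re.split(r"\s+", record)    # \s+ also collapses the run
kept_plain = record.split("\t")               # one split per tab
kept_regex = re.split(r"\t", record)          # same, via a regex

print(collapsed_plain)   # ['a', 'c']
print(collapsed_regex)   # ['a', 'c']
print(kept_plain)        # ['a', '', 'c']
print(kept_regex)        # ['a', '', 'c']

# Unlike split() with no argument, re.split does not strip the ends:
# a trailing newline produces a final empty string.
with_newline = re.split(r"\s+", "a b\n")
print(with_newline)      # ['a', 'b', '']
```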
