How to go about separating data in a .csv file?

Question

I have a .csv file that includes a long line of data. The data looks something along the lines of:

Name,Gender,Age John Smith,M,23 Ashley Jones,F,18 James Smith Johns,M,20

My end goal is to separate all of the data so I can put them into rows. My intended result would be:

['Name','Gender','Age','John Smith','M','23','Ashley Jones','F','18','James Smith Johns','M','20']

However, using either of these:

line = line.split(",")
line = line.split(" ")

will not work, because splitting only on the comma or only on the space leaves values like:

'Age John Smith' or 'Age','John','Smith'

Is there any way to work around this?

Answer 1

Split on the comma first, then iterate over the resulting list and split each item once at the first run of whitespace. If that split yields two parts, return both of them separately; otherwise return the single item. Splitting only once keeps multi-word names such as John Smith intact.

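import csv

def solve(row):
    for item in row:
        # Split once at the first run of whitespace, so a name that
        # follows the previous record's age (e.g. '23 Ashley Jones')
        # is separated from it but keeps its own internal spaces.
        spl = item.split(None, 1)
        if len(spl) > 1:
            yield spl[0]
            yield spl[1]
        else:
            yield spl[0]

with open('abc1', newline='') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        print(list(solve(row)))

['Name', 'Gender', 'Age', 'John Smith', 'M', '23', 'Ashley Jones', 'F', '18', 'James Smith Johns', 'M', '20']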
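
Since the stated goal is to end up with rows, the flat list can then be regrouped into records. A minimal sketch, assuming every record has exactly three fields; the helper name to_rows is hypothetical and not part of the answer above:

```python
def to_rows(flat, width=3):
    """Group a flat list of fields into rows of `width` items each."""
    if len(flat) % width:
        raise ValueError("field count is not a multiple of the row width")
    return [flat[i:i + width] for i in range(0, len(flat), width)]

flat = ['Name', 'Gender', 'Age', 'John Smith', 'M', '23',
        'Ashley Jones', 'F', '18', 'James Smith Johns', 'M', '20']

rows = to_rows(flat)
for row in rows:
    print(row)
```

The first row is the header, so rows[1:] holds the actual people.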

Answer 2

Here's a solution using a regular expression:

import re
re.compile(r"([^,]+),([^,]+),(\d+|Age)(?:\s+|$)").findall("Name,Gender,Age John Smith,M,23 Ashley Jones,F,18 James Smith Johns,M,20")

The result for this will be:

[('Name', 'Gender', 'Age'), ('John Smith', 'M', '23'), ('Ashley Jones', 'F', '18'), ('James Smith Johns', 'M', '20')]
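
The matched tuples can also be turned into one dictionary per person by treating the first tuple as the header. A minimal sketch: the pattern used here ends in (?:\s+|$) so that a record at the very end of the string is matched too, and the names PATTERN, header, body and people are illustrative only:

```python
import re

data = "Name,Gender,Age John Smith,M,23 Ashley Jones,F,18 James Smith Johns,M,20"

# Three comma-separated fields; the last one is a number or the literal
# header "Age", followed by whitespace or the end of the string.
PATTERN = re.compile(r"([^,]+),([^,]+),(\d+|Age)(?:\s+|$)")

records = PATTERN.findall(data)
header, body = records[0], records[1:]

# Pair each record's fields with the header names.
people = [dict(zip(header, fields)) for fields in body]
print(people)
```

Note that this relies on the last column being numeric, which is what lets the pattern tell where one record stops and the next name starts.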

Answer 3

There are nice re solutions, but I just wanted to add this non-regex solution:

>>> s = "Name,Gender,Age John Smith,M,23 Ashley Jones,F,18 James Smith Johns,M,20"
>>> sum((item.split(None, 1) for item in s.split(',')), list())
['Name', 'Gender', 'Age', 'John Smith', 'M', '23', 'Ashley Jones', 'F', '18', 'James Smith Johns', 'M', '20']

Instead of sum, you can also use itertools.chain, although in the end it does not turn out any shorter.

>>> import itertools
>>> list(itertools.chain(*[item.split(None, 1) for item in s.split(',')]))

or better

>>> list(itertools.chain.from_iterable(item.split(None, 1) for item in s.split(',')))
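
As a quick check that the three variants agree, here is a small sketch that runs them side by side on the sample line with the header included (the variable names are illustrative). One practical difference: sum on lists copies the accumulated list at every step, so chain.from_iterable tends to be the safer choice for long inputs.

```python
import itertools

s = "Name,Gender,Age John Smith,M,23 Ashley Jones,F,18 James Smith Johns,M,20"

# Each comma-separated piece is split at most once at whitespace.
pieces = [item.split(None, 1) for item in s.split(',')]

via_sum = sum(pieces, [])
via_chain = list(itertools.chain(*pieces))
via_from_iterable = list(itertools.chain.from_iterable(pieces))

print(via_sum == via_chain == via_from_iterable)
print(via_from_iterable)
```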

Answer 4

A regular expression way. :-)

>>> import re
>>> s = "John Smith,M,23 Ashley Jones,F,18 James Smith Johns,M,20"  # Note: no title here.
>>> [m.groups() for m in re.finditer(r"([^,\s][^,]*),([^,]),(\d+)", s)]
[('John Smith', 'M', '23'), ('Ashley Jones', 'F', '18'), ('James Smith Johns', 'M', '20')]

Note that I have removed the title (first line); you'll need to modify the regexp accordingly, or strip the title from the input string.
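
One way to modify the input string is to peel the title off before matching. A sketch, assuming the title is the first whitespace-delimited token and contains no spaces itself; the pattern and the variable names below are illustrative, not the only way to write it:

```python
import re

line = "Name,Gender,Age John Smith,M,23 Ashley Jones,F,18 James Smith Johns,M,20"

# The title has no internal spaces, so the first run of whitespace
# separates it from the records.
header_text, records_text = line.split(None, 1)
header = header_text.split(",")

# Name (may contain spaces, no commas), one-character gender, numeric age.
record_re = re.compile(r"([^,\s][^,]*),([^,]),(\d+)")
records = record_re.findall(records_text)

print(header)
print(records)
```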

Answer 5

From the example, it looks like line = line.split(",") would be enough. Maybe I'm missing something?
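
Running that on the sample line from the question shows what it leaves behind: the last field of one record and the first field of the next are separated only by a space, so they stay fused. A quick check:

```python
line = "Name,Gender,Age John Smith,M,23 Ashley Jones,F,18 James Smith Johns,M,20"

parts = line.split(",")

print(len(parts))  # 9 pieces rather than the 12 fields wanted
print(parts[2])    # Age John Smith
```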

5 answers

solution1
4 ACCEPTED 2013-11-25 17:30:01

solution2
3 2013-11-25 17:37:10

solution3
1 2013-11-25 18:04:02

solution4
0 2013-11-25 17:40:47

solution5
-1 2013-11-25 17:26:50
