
Split a column of a csv file

As a beginner in Python, what I'm trying to achieve sounds very easy, but I'm unable to get Python to work as desired.

I have a csv file with headers like this:

Area    Facility
AAA     car, train, bus
BBB     car
CCC     car, bus, tram
DDD     bicycle
EEE     car, bus, train, tram, walk
FFF     train, tram, plane, helicopter

I am trying to split the 'Facility' column into separate words and then run some queries (e.g. unique facilities). My desired output is a list of the values from column 2: train, tram, plane, walk, etc.

I am able to split the csv into the two columns, but if I iterate any further it breaks the values down into single letters.

import csv

fOpen1=open('C:\data.csv')

Facilities=csv.reader(fOpen1)
unique=[]

for row in Facilities:
    for facility in row[1]:
        if row[13] not in unique:
            unique.append(row[13])

I looked around and noticed people using splitlines, but had no luck with it either.

Any suggestions or ideas?

Thank you!

Here is the documentation for split:

Docstring: S.split(sep=None, maxsplit=-1) -> list of strings

Return a list of the words in S, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done. If sep is not specified or is None, any whitespace string is a separator and empty strings are removed from the result.

Basically, if you call split with no argument, it splits on whitespace (which is what separates the columns in your dataset). You can split on any other character by passing that character to split, e.g.:

print("car, train, bus".split(','))
['car', ' train', ' bus']
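
Note the leading spaces in that output: ' train' and 'train' are different strings, which matters when checking for duplicates. A minimal sketch of one way to handle this, stripping each piece after splitting (the variable name facilities is only for illustration):

```python
# Split on commas, then strip surrounding whitespace from each piece.
facilities = [item.strip() for item in "car, train, bus".split(',')]
print(facilities)  # ['car', 'train', 'bus']
```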

A csv file separates columns with commas. Since there is no comma between the first column and the second column in your file, csv.reader returns each line like this:

['Area Facility']

['AAA car', ' train', ' bus']

['BBB car']

['CCC car', ' bus', ' tram']

['DDD bicycle']

['EEE car', ' bus', ' train', ' tram', ' walk']

['FFF train', ' tram', ' plane', ' helicopter']
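
To check this behaviour without touching the real file, here is a small sketch that feeds a few sample lines to csv.reader from memory. The sample text is an assumption based on the layout shown in the question, with runs of spaces between the two columns:

```python
import csv
import io

# In-memory stand-in for the file; the layout is assumed from the question.
sample = io.StringIO(
    "Area    Facility\n"
    "AAA     car, train, bus\n"
    "BBB     car\n"
)

# csv.reader only splits on commas, so the area code stays glued
# to the first facility in element 0 of each row.
rows = list(csv.reader(sample))
for row in rows:
    print(row)
```

Each row comes back as a list whose first element still contains both the area code and the first facility.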

Thus, you can call split on the first element of each list to get the first facility. The other facilities are stored in the rest of the list, each with a leading space. Your goal can be achieved as follows:

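import csv

unique = []

# Raw string so the backslash in the Windows path is not read as an escape.
with open(r'C:\data.csv', newline='') as fOpen1:
    Facilities = csv.reader(fOpen1)
    next(Facilities)  # skip the header row
    for row in Facilities:
        if not row:
            continue  # ignore blank lines
        # row[0] looks like 'AAA car': split once on whitespace, drop the area code
        first_facility = row[0].split(None, 1)[1]
        if first_facility not in unique:
            unique.append(first_facility)
        for rest_facility in row[1:]:
            rest_facility = rest_facility.strip()  # drop the space after each comma
            if rest_facility not in unique:
                unique.append(rest_facility)

print(unique)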
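
As an alternative, since the file is not really comma-separated between its two columns, the csv module can be skipped altogether. The following is only a sketch under that assumption, not the approach from the answer above; unique_facilities and sample_lines are hypothetical names, and the sample data is copied from the question:

```python
def unique_facilities(lines):
    """Return facilities in first-seen order, without duplicates."""
    unique = []
    for line in lines[1:]:  # skip the header line
        # Split once on whitespace: ['AAA', 'car, train, bus']
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue  # blank line, or an area with no facilities
        for item in parts[1].split(','):
            name = item.strip()
            if name and name not in unique:
                unique.append(name)
    return unique


sample_lines = [
    "Area    Facility",
    "AAA     car, train, bus",
    "BBB     car",
    "CCC     car, bus, tram",
    "DDD     bicycle",
    "EEE     car, bus, train, tram, walk",
    "FFF     train, tram, plane, helicopter",
]
print(unique_facilities(sample_lines))
# ['car', 'train', 'bus', 'tram', 'bicycle', 'walk', 'plane', 'helicopter']
```

For a real file, the lines could be read with open(...).read().splitlines() and passed to the same function.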

