
How can I read specific columns from a CSV file?

    import csv

    a = ['Business', 'Food/Clothes', 'Fun', 'Politics', 'Starting_with_Apolog', ['NNP', 'MD', 'NN', 'NNP'], ['NNP', 'NN', 'NNP'], ['PDT', 'MD', 'NN', 'NNP'], ['PRP$', 'MD', 'NN', 'NNP'], ['UH', 'MD', 'NN', 'NNP'], ['WP$', 'MD', 'NN', 'NNP'], 'end__with_ly', 'end_with_al', 'end_with_ful', 'end_with_ible', 'end_with_ic', 'end_with_ive', 'end_with_less', 'end_with_ous', 'sorry_word', 'Gender']

    def read_columns(path, a):
        results = []
        with open(path) as f:
            reader = csv.reader(f)
            headers = None
            for row in reader:
                if headers is None:
                    # First row: work out which columns to keep
                    headers = []
                    for i, col in enumerate(row):
                        if col in a:
                            # Store the index of each column of interest
                            headers.append(i)
                    print(headers)
                else:
                    results.append([row[i] for i in headers])
        return results

    results = read_columns("file.csv", a)

The code above is meant to read the columns named in list a from file.csv and collect their values in results. However, the indexing code only finds the following columns:

** Fun 63
** Food/Clothes 64
** Politics 70
** Business 73
** end_with_al 75
** end_with_ful 76
** end_with_ible 77
** end_with_ic 78
** end_with_ive 79
** end_with_less 80
** end__with_ly 81
** end_with_ous 82
** sorry_word 83
** Starting_with_Apolog 84
** Gender 1487

The code does not index the names inside the nested lists. How can I make it search those as well? Note: file.csv has 1487 columns, and a holds a subset of its column names.

Why not just remove the list inside the list?

Example

'Starting_with_Apolog', ['NNP', 'MD', 'NN', 'NNP']

change to:

'Starting_with_Apolog', 'NNP', 'MD', 'NN', 'NNP'

It's a simple hack, but it may be the simplest way to go about it.

EDIT

OK, so since you want to keep the list-within-a-list structure, I believe you will have to give up some performance. The next-easiest approach I can think of is to flatten a into a new list first:

    a = ['Business', 'Food/Clothes', 'Fun', 'Politics', 'Starting_with_Apolog', ['NNP', 'MD', 'NN', 'NNP'], ['NNP', 'NN', 'NNP'], ['PDT', 'MD', 'NN', 'NNP'], ['PRP$', 'MD', 'NN', 'NNP'], ['UH', 'MD', 'NN', 'NNP'], ['WP$', 'MD', 'NN', 'NNP'], 'end__with_ly', 'end_with_al', 'end_with_ful', 'end_with_ible', 'end_with_ic', 'end_with_ive', 'end_with_less', 'end_with_ous', 'sorry_word', 'Gender']
    newa = []
    for element in a:
        if isinstance(element, list):
            # Copy each name out of the sublist
            for el in element:
                newa.append(el)
        else:
            newa.append(element)
    a = newa
    # Now use "a" or "newa" in the rest of your code.

Otherwise your if col in a: check is going to get a whole lot more complicated...
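For comparison, here is a minimal sketch of what that more complicated check could look like if a is left nested. The helper name in_nested and the shortened list are made up for illustration, and it assumes the nesting is only one level deep, as in the question:

```python
def in_nested(col, wanted):
    """Return True if col is a top-level item of wanted or sits inside one of its sublists."""
    for item in wanted:
        if isinstance(item, list):
            # Look one level down into the sublist
            if col in item:
                return True
        elif col == item:
            return True
    return False


# Shortened, hypothetical version of the list from the question
a = ['Fun', ['NNP', 'MD', 'NN'], 'Gender']

print(in_nested('Fun', a))  # True
print(in_nested('NNP', a))  # True
print(in_nested('XYZ', a))  # False
```

This walks every sublist for every header column, which is where the lost performance comes from; flattening once up front avoids that.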

Hope this helps!

Your problem is that the membership test (col in a) only compares col against the top-level elements of a; it does not automatically look inside the sublists of a.

>>> 'Fun' in a
    True
>>> 'NNP' in a
    False

but

>>> 'NNP' in a[5] #a[5] is the list ['NNP', 'MD', 'NN', 'NNP']
    True
