
Reading a CSV file and extracting the desired amount of data using Python

I have extracted data from a CSV file, starting from specific rows and columns, using this code:

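import csv

def csvread(csvpath, filtered_dict):
    # filtered_dict maps each field name to [(start_row, column_index)]
    with open(csvpath, 'rb') as f:  # 'rb' is the Python 2 mode; on Python 3 use open(csvpath, newline='')
        rdr = csv.reader(f)
        columns = [{key: row[pos[0][1]] for key, pos in filtered_dict.items()} for row in rdr]
    # finally trim to desired row startpoints:
    data = {key: [col[key] for col in columns[pos[0][0]:]] for key, pos in filtered_dict.items()}
    return zip(*data.values())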

filtered_dict looks like this:

{'Date': [(21, 5)], 'Rate': [(21, 4)], 'Item': [(21, 2)]}

But it extracts data all the way to the end of the CSV file, so trailing rows that are not part of the table end up in the result and get in the way of processing. For example:

[('Dates', 'Rates', 'Items'),
 ('2013/03/07', '$114', 'Tissot'),
 ('2013/03/07', '$140', 'Adidas'),
 ('2013/03/07', '$344', 'Nike'),
 ('', '', ''),
 ('', '', ''),
 ('','The rate for EVERY item is FIXED', 'No RETURN or EXCHANGE!')]

What I want is for the function to stop as soon as it finds a row in which all 3 fields are empty, so that the result looks like this:

[('Dates', 'Rates', 'Items'),
 ('2013/03/07', '$114', 'Tissot'),
 ('2013/03/07', '$140', 'Adidas'),
 ('2013/03/07', '$344', 'Nike')]

Thanks in advance for the help.

You can check if all the elements of a list of strings have zero length by testing if they have zero length when they are all joined together. This seems to give you what you want:

di = [('Dates', 'Rates', 'Items'),
      ('2013/03/07', '$114', 'Tissot'),
      ('2013/03/07', '$140', 'Adidas'),
      ('2013/03/07', '$344', 'Nike'),
      ('', '', ''),
      ('', '', ''),
      ('','The rate for EVERY item is FIXED', 'No RETURN or EXCHANGE!')]

d2 = []
for x in di:
    if len(''.join(x)) == 0:
        break
    else:
        d2.append(x)

print (d2)

... which outputs:

[('Dates', 'Rates', 'Items'), ('2013/03/07', '$114', 'Tissot'), ('2013/03/07', '$140', 'Adidas'), ('2013/03/07', '$344', 'Nike')]

A problem with the previously suggested answer is that the test if len(''.join(x)) == 0: does the most work in the most common case, since it joins all the strings of every ordinary row. It does only a small amount of work in the terminating case, where all the strings are empty.

It is better to arrange things so that the most common case, where the first string of a tuple isn't empty, or the second isn't, or the third isn't, does the least work. This can be tested with the builtin function any(), which short-circuits (stops testing) as soon as it finds a string that isn't empty, so it does a lot less work and is cleaner code to boot.

di = [('Dates', 'Rates', 'Items'),
      ('2013/03/07', '$114', 'Tissot'),
      ('2013/03/07', '$140', 'Adidas'),
      ('2013/03/07', '$344', 'Nike'),
      ('', '', ''),
      ('', '', ''),
      ('','The rate for EVERY item is FIXED', 'No RETURN or EXCHANGE!')]

d2 = []
for x in di:
    if any(x):
        d2.append(x)
    else:
        break

print (d2)

Output:

[('Dates', 'Rates', 'Items'),
 ('2013/03/07', '$114', 'Tissot'),
 ('2013/03/07', '$140', 'Adidas'),
 ('2013/03/07', '$344', 'Nike')]
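
The same any() test can also be applied while the file is being read, so that nothing after the first blank row is processed at all. What follows is only a rough sketch of that idea, not the asker's function: the name csvread_until_blank is made up for illustration, it is written for Python 3, it takes any iterable of text lines (such as an open file) instead of a path, and it assumes every entry in filtered_dict shares the same start row. The sample data and the 0-based positions in spec are invented to keep the demo short.

```python
import csv
import io
from itertools import islice, takewhile

def csvread_until_blank(lines, filtered_dict):
    """Return tuples of the selected columns, stopping at the first all-empty row.

    Hypothetical helper: `lines` is any iterable of CSV text lines.
    Assumes all fields in filtered_dict start on the same row.
    """
    names = list(filtered_dict)
    start_row = filtered_dict[names[0]][0][0]
    col_idx = [filtered_dict[name][0][1] for name in names]

    # Skip the rows before the table starts without storing them.
    rows = islice(csv.reader(lines), start_row, None)

    # Pick out the wanted columns; rows that are too short count as empty fields.
    selected = (tuple(row[i] if i < len(row) else '' for i in col_idx)
                for row in rows)

    # takewhile(any, ...) stops at the first tuple whose strings are all empty.
    return list(takewhile(any, selected))

# Made-up sample data standing in for the real file.
sample = io.StringIO(
    "Items,Rates,Dates\n"
    "Tissot,$114,2013/03/07\n"
    "Adidas,$140,2013/03/07\n"
    ",,\n"
    ",The rate for EVERY item is FIXED,No RETURN or EXCHANGE!\n"
)
spec = {'Date': [(0, 2)], 'Rate': [(0, 1)], 'Item': [(0, 0)]}
result = csvread_until_blank(sample, spec)
print(result)
```

With a real file, something like open(csvpath, newline='') could be passed in place of the StringIO object. Because the generator and takewhile are lazy, the footer lines after the blank row are never split into fields.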
