
Applying multiple if and elif statements to substrings in a list of strings in a for loop

I have a spreadsheet with disorganized open-text fields in one column (C1:C3159) that I want to sort by various keywords within the text. I am trying to write a bit of Python code that loops through the column, looks for keywords, and appends a category for each cell to an initially empty list, depending on which words are found in the text. So far my code looks like this.

## make an object attr for the column    
attr = ['C1:C3159']
## make all lower case
[x.lower() for x in attr]
## initialize an empty list
categories = []
## loop through attr object and append categories to the "categories" list
for i in attr:
    if 'pest' or 'weed' or 'disease' or 'cide' or 'incid' or 'trap'\
    or 'virus' or 'IPM' or 'blight' or 'incid' or 'rot' or 'suck' in i:
        categories.append("pest management")

    elif 'fert' or 'dap' or 'urea' or 'manga' or 'npk' or 'inm' in i:
        categories.append("fertilizer")

    elif 'wind' or 'rain' or 'irr' or 'alt' or 'moist' or 'soil' or 'ph'\
    or 'drip' or 'environ' or 'ec' in i:
        categories.append("environment")

    elif 'spac' or 'name' or 'stor' or 'yield' or 'rogu' or 'maint'\
    or 'cond' or 'prod' or 'fenc' or 'child' or 'row' or 'prun' or 'hoe'\
    or 'weight' or 'prep' or 'plot' or 'pull' or 'topp' in i:
        categories.append("operations")

    elif 'plant' or 'germin' or 'age' or 'bulk' or 'buds'  or 'matur'\
    or 'harvest' or 'surviv' or 'health' or 'height' or 'grow' in i:
        categories.append("life cycle")

    elif 'price' or 'sold' or 'inr' or 'cost' in i:
        categories.append("market")

    elif 'shed' or 'post' or 'fenc' or 'pond' or 'stor' in i:
        categories.append("PPE")

    else:
        categories.append("uncategorized")

The problem I am having is that after the first if statement the elif statements are not being evaluated in the loop and the list I get returned only contains the few things categorized as "pest management." Does anyone have any idea how to do what I am attempting to do here so that the full loop gets evaluated? A small sample of the strings in the list is posted below.

attr = ['Age of plantation',
'Altitude of Plantation',
'Annual production Last year (In Kg)',
'Average Price paid per kg in NPR (Last Year)',
'Majority Bush type',
'Pruning Cycle',
'Tea sold to ( Last Year)',
'Boll weight in grams',
'CLCuV incidence %',
'Dibbles per row',
'Gap Filling',
'Germination %',
'Hoeing',
'Land preparation',
'Land preparation date',
'Pest & disease incidence',
'Plot size in metre Square',
'Rows per entry',
'Spacing between plants in cms']

Modification

You have to apply the in test to every string in the if condition, not only to the last one:

if 'pest' in i or 'weed' in i or 'disease' in i or 'cide' in i or 'incid' in i or 'trap' in i  or 'virus' in i or 'IPM' in i or 'blight' in i or 'incid' in i or 'rot' in i or 'suck' in i:
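The same check can be written more compactly with any() over a keyword list. The following is only a sketch: the CATEGORY_KEYWORDS mapping and the categorize helper are hypothetical names, only two of the question's categories are shown, the "Urea applied" string is a made-up sample, and the text is lower-cased first so that a keyword such as ipm matches regardless of case. It relies on dictionaries keeping insertion order (Python 3.7+), so earlier categories win, just as in an if/elif chain.

```python
# Hypothetical mapping from category to keyword fragments (abridged from the question).
CATEGORY_KEYWORDS = {
    "pest management": ["pest", "weed", "disease", "cide", "incid", "trap",
                        "virus", "ipm", "blight", "rot", "suck"],
    "fertilizer": ["fert", "dap", "urea", "manga", "npk", "inm"],
}

def categorize(text):
    """Return the first category whose keywords occur in text, else 'uncategorized'."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        # any() stops at the first keyword found in the lower-cased text
        if any(keyword in lowered for keyword in keywords):
            return category
    return "uncategorized"

# "Urea applied" is an invented sample; the other two come from the question.
attr = ["Pest & disease incidence", "Urea applied", "Gap Filling"]
categories = [categorize(a) for a in attr]
print(categories)  # ['pest management', 'fertilizer', 'uncategorized']
```

Adding a category then means adding one dictionary entry rather than another elif branch.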

In your program the first if statement is always true, because its condition begins with 'pest' or ..., and the non-empty string 'pest' on its own counts as true.

In Python, a bare string used as an if condition is tested for emptiness: an empty string evaluates to False, and any non-empty string evaluates to True. Because of this, your first if branch always matches.

if "sad":
    print("Why!")
output: Why!

if "":
    print("Why!")
output: (nothing is printed)
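To see how this applies to the condition in the question, it may help to look at how Python groups the expression. In this small sketch the variable names broken and fixed are just for illustration:

```python
i = "land preparation"

# `in` binds tighter than `or`, so this parses as: 'pest' or ('weed' in i)
broken = 'pest' or 'weed' in i
print(repr(broken))  # 'pest' (a non-empty string, hence truthy)

# Testing each keyword explicitly gives the intended result
fixed = 'pest' in i or 'weed' in i
print(fixed)  # False
```

Since or returns its first truthy operand, the original condition evaluates to the string 'pest' for every input, whatever i contains.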

Regarding "after the first if statement the elif statements are not being evaluated":

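The branches of an if/elif chain are mutually exclusive: as soon as one condition is true, the remaining branches are skipped. If you want the other conditions to be evaluated even after an earlier one matches, write each of them as a separate if instead of elif.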
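If a field should be allowed to fall into several categories at once, the independent-if approach can be sketched by collecting every match per string. The names KEYWORDS and all_categories are hypothetical, and only two of the question's categories are shown, in abridged form:

```python
# Hypothetical, abridged keyword lists taken from the question.
KEYWORDS = {
    "operations": ["spac", "prep", "plot", "row", "prun", "hoe"],
    "life cycle": ["plant", "germin", "grow"],
}

def all_categories(text):
    """Return every category whose keywords occur in text."""
    lowered = text.lower()
    matches = []
    for category, keywords in KEYWORDS.items():
        # Independent check per category: a match here does not stop later checks
        if any(k in lowered for k in keywords):
            matches.append(category)
    return matches or ["uncategorized"]

print(all_categories("Spacing between plants in cms"))  # ['operations', 'life cycle']
print(all_categories("Dibbles per row"))                # ['operations']
print(all_categories("Gap Filling"))                    # ['uncategorized']
```

Note that this returns a list per string, so the result is no longer one category per cell as in the original loop.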

I would use regex for this.

Lots of people argue that if you solve a problem with regex, you end up with two problems, but I believe that if you do it cleanly, you can avoid this dilemma.

import re

pestmanagementattributes = [
    'pest', 'weed', 'disease', 'cide', 'incid', 'trap',
    'virus', 'IPM', 'blight', 'incid', 'rot', 'suck'
]
r_pestmanagement = re.compile(".*" + (".*|.*".join(pestmanagementattributes)) + ".*")

fertilizerattributes = ['fert', 'dap', 'urea', 'manga', 'npk', 'inm']
r_fertilizer = re.compile(".*" + (".*|.*".join(fertilizerattributes)) + ".*")

for i in attr:
    if r_pestmanagement.match(i):
        categories.append("pest management")
    elif r_fertilizer.match(i):
        categories.append("fertilizer")
...
    else:
        categories.append("uncategorized")

This may also run faster, since your string i is matched against one compiled pattern per category instead of being checked once per word.
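A variation on the same idea, offered as a sketch rather than as the answer's own method: joining the keywords with | and calling search() avoids the leading and trailing .* pieces, re.escape guards against keywords that contain regex metacharacters, and re.IGNORECASE lets IPM match in any case. The names PATTERNS, COMPILED and categorize_regex are hypothetical, and only two categories are shown.

```python
import re

# Abridged keyword lists from the question.
PATTERNS = {
    "pest management": ["pest", "weed", "disease", "cide", "incid", "trap",
                        "virus", "IPM", "blight", "rot", "suck"],
    "fertilizer": ["fert", "dap", "urea", "manga", "npk", "inm"],
}

# One compiled alternation per category, e.g. "pest|weed|disease|..."
COMPILED = {
    category: re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    for category, words in PATTERNS.items()
}

def categorize_regex(text):
    """Return the first category whose pattern is found anywhere in text."""
    for category, pattern in COMPILED.items():
        if pattern.search(text):
            return category
    return "uncategorized"

attr = ["CLCuV incidence %", "Pest & disease incidence", "Hoeing"]
categories = [categorize_regex(a) for a in attr]
print(categories)  # ['pest management', 'pest management', 'uncategorized']
```

Whether this is actually faster than plain substring checks depends on the data, so it is worth timing both on the real column.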


 