
Nested loop through a dataframe, compare its string column values with a list of string tuples, and create a new column conditionally?

I'm using Python 3.9.12 to do some data manipulation. I have a dataframe, 'df', that looks like this:

      name1      name2           region_name
0     Salo       saloo           UPPER SINDAN
1     rnelih     NaN             RIMAN TRIST
2     benini     NaN             Lower Tangi
...  ...         ...             ...
999   kremith    kremithin       MIKLAR
1000  Riro       rirron          LOWER BASTI

And a list of tuples, 'ls', which is the result returned by cursor.fetchall() from the database. It looks like this:

[(47, 'Upper Sindan', ''),
 (48, 'Riman Tah', 'Riman Trist'),
 (27, 'Timbari', 'Timbarlin'),
 (768, 'MIKLAR', ''),
 (769, 'Dindan', ''),
 (770, 'Shina Hardi',''),
 (...,  '.........','...'),
 (921, 'KIMAN DARIB', 'lower basti')]

My goal is to loop through 'df' and compare every value of the 'region_name' column with the second and third values of every tuple in the list 'ls'. If they are equal (case-insensitively), I want to take the number from the first value of that tuple and store it in a new column, 'region_code', on the row whose region name matched. Rows without a match should get an empty value, and if nothing in the entire list matches, the 'region_code' column should not be created at all.

I have tried the blocks of code below, but all of them failed:

    for item in ls:
        if (df['region_name'].str.contains(item[1]).any()
                or df['region_name'].str.contains(item[2]).any()):
            df['region_code']=item[0]
        else:
            df['region_code']=""

#Here I get empty region_code in the df

for index, row1 in df.iterrows():
    for item in ls:
        if (row1[2] == item[1]) or (row1[2] == item[2]):
            df["region_code"] = item[0]
        else:
            df["region_code"] = ""

#Here I get empty region_code in the df

And for this one:

for index, row1 in df.iterrows():
    for item in ls:
        if (row1[2].str.lower() == item[1].str.lower()) or (row1[2].str.lower() == item[1].str.lower()):
            df["region_code"] = item [0]
        else:
            df["region_code"] = ""

#Here I get: AttributeError: 'str' object has no attribute 'str'

I would really appreciate it if someone could point out what I'm doing wrong or suggest a better way to do this. Thank you for your help!

Build a lowercase lookup dictionary from the list of tuples:

import pandas as pd

L = [(47, 'Upper Sindan', ''),
 (48, 'Riman Tah', 'Riman Trist'),
 (27, 'Timbari', 'Timbarlin'),
 (768, 'MIKLAR', ''),
 (769, 'Dindan', ''),
 (770, 'Shina Hardi',''),
 (921, 'KIMAN DARIB', 'lower basti')]

d = pd.DataFrame(L).melt(0).set_index('value')[0].rename(str.lower).to_dict()
print (d)
{'upper sindan': 47, 'riman tah': 48, 'timbari': 27, 'miklar': 768, 'dindan': 769, 
 'shina hardi': 770, 'kiman darib': 921, '': 770, 'riman trist': 48,
 'timbarlin': 27, 'lower basti': 921}
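Note that the empty strings in the tuples also end up as a key ('' maps to 770 in the output above), so a blank region_name would pick up a code by accident. A minimal sketch of building a similar lookup with a plain dict comprehension that skips empty names, assuming the list has the (code, name, alt_name) shape shown above. build_lookup is a hypothetical helper name used only for illustration, and the list is shortened:

```python
L = [(47, 'Upper Sindan', ''),
     (48, 'Riman Tah', 'Riman Trist'),
     (921, 'KIMAN DARIB', 'lower basti')]


def build_lookup(rows):
    """Map each non-empty region name (lowercased) to its code.

    Hypothetical helper: assumes every row is (code, name, alt_name).
    """
    return {
        name.lower(): code
        for code, *names in rows
        for name in names
        if name  # skip '' (and None) so blank names never match
    }


d = build_lookup(L)
print(d)
```

If the same name appears in more than one tuple, the later tuple wins, which is the same behaviour as to_dict() on a Series with duplicate index labels.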

Then map the lowercased region_name column through the dictionary:

df['region_code'] = df['region_name'].str.lower().map(d).fillna('')
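The question also asks that region_code not be created when nothing matches at all. One way to sketch that, assuming a lookup dictionary like d above: the sample frame below is made up for illustration, and add_region_code is a hypothetical helper name. It looks each key up with dict.get rather than .map(d).fillna(''), because mapping through a dict leaves NaN for unmatched rows, which turns the integer codes into floats.

```python
import pandas as pd


def add_region_code(df, lookup):
    """Return df with 'region_code' added only if at least one row matches.

    Hypothetical helper: `lookup` maps lowercased region names to codes.
    """
    keys = df['region_name'].str.strip().str.lower()
    # Unmatched (or missing) names get '' instead of NaN
    codes = keys.map(lambda k: lookup.get(k, ''))
    if (codes != '').any():
        df = df.assign(region_code=codes)
    return df


# Made-up sample data in the shape of the question's dataframe
df = pd.DataFrame({
    'name1': ['Salo', 'rnelih', 'benini'],
    'region_name': ['UPPER SINDAN', 'RIMAN TRIST', 'Lower Tangi'],
})
d = {'upper sindan': 47, 'riman trist': 48}

out = add_region_code(df, d)
print(out)
```

Because assign returns a new frame, the original df is left untouched; reassign the result (df = add_region_code(df, d)) to keep the new column.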
