I'm using Python 3.9.12 to do some data manipulation. I have a dataframe, 'df' that looks like this:
name1 name2 region_name
0 Salo saloo UPPER SINDAN
1 rnelih NaN RIMAN TRIST
2 benini NaN Lower Tangi
... ... ... ...
999 kremith kremithin MIKLAR
1000 Riro rirron LOWER BASTI
And a list of tuples, 'ls' that is a retunred result by cursor.fetchall() from db and it looks like this:
[(47, 'Upper Sindan', ''),
(48, 'Riman Tah', 'Riman Trist'),
(27, 'Timbari', 'Timbarlin),
(768, 'MIKLAR', ''),
(769, 'Dindan', ''),
(770, 'Shina Hardi',''),
(..., '.........','...'),
(921, 'KIMAN DARIB', 'lower basti')]
My goal is to loop through the 'df' and compare every single value of column 'region_name' with the second and third value of the tuple in the entire list, 'ls'. If they are equal (case insensitive) then fetch the corresponding number from the first value of the tuple in the list and create a new colmun'region_code' in df with fetch value that corrosponds to the matched region name. For non matched put empy value and if it could not find any match in the entire list then don't create the column 'region_code'.
I have tried with below blocks of code but all of them failed:
for item in ls:
if df['region_name'].str.contains(item[1]).any() or
df['region_name'].str.contains(item[2]).any():
df['region_code']=item[0]
else:
df['region_code']=""
#Here I get empty region_code in the df
for index, row1 in df.iterrows():
for item in ls:
if (row1[2] == item[1]) or (row1[2] == item[2]):
df["region_code"] = item[0]
else:
df["region_code"] = ""
#Here I get empty region_code in the df
And for this one:
for index, row1 in df.iterrows():
for item in ls:
if (row1[2].str.lower() == item[1].str.lower()) or (row1[2].str.lower() == item[1].str.lower()):
df["region_code"] = item [0]
else:
df["region_code"] = ""
#Here I get: AttributeError: 'str' object has no attribute 'str'
I really appreciate if someone can point out what I'm doing worng or a better way to do this? Thank you for your help!
Use:
L = [(47, 'Upper Sindan', ''),
(48, 'Riman Tah', 'Riman Trist'),
(27, 'Timbari', 'Timbarlin'),
(768, 'MIKLAR', ''),
(769, 'Dindan', ''),
(770, 'Shina Hardi',''),
(921, 'KIMAN DARIB', 'lower basti')]
d = pd.DataFrame(L).melt(0).set_index('value')[0].rename(str.lower).to_dict()
print (d)
{'upper sindan': 47, 'riman tah': 48, 'timbari': 27, 'miklar': 768, 'dindan': 769,
'shina hardi': 770, 'kiman darib': 921, '': 770, 'riman trist': 48,
'timbarlin': 27, 'lower basti': 921}
And then:
df['region_code'] = df['region_name'].str.lower().map(d).fillna('')
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.