
Making a column in a dataframe based on values in other lists

I have two data frames. Each value in the 'Zip Code' column of the codes dataframe is a zip code that is in District 2, 5, or 7, and the d_codes dataframe lists the zip codes of each district in its own column. I want to make a brand new column called 'District' in the codes dataframe that shows which district each zip code belongs to. I tried turning each of these columns into a list and looping over them, but the for loop doesn't seem to work, apparently because there are more district codes than actual zip codes. It ends up raising ValueError: Length of values does not match length of index

Here is the code.

d2 = d_codes['District 2'].tolist()
d5 = d_codes['District 5'].tolist()
d7 = d_codes['District 7'].tolist()
main_zips = codes['Zip Code'].tolist()

result = []
for value in main_zips:
    if value in d2:
        result.append("District 2")
    elif value in d5:
        result.append("District 5")
    elif value in d7:
        result.append("District 7")

codes["Result"] = result

Is there a better way to perform this task?
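As far as the error goes, the ValueError comes from the loop itself: it appends at most one label per zip code, and nothing at all when a zip code is in none of the three lists (for example a zip from another district, or a type mismatch such as the string "23081" versus the integer 23081). So result ends up shorter than codes, and the column assignment fails. A minimal sketch with hypothetical stand-in frames (the zip codes below are made up), showing that an else branch keeps the lengths aligned:

```python
import pandas as pd

# Hypothetical stand-ins for the asker's frames: d_codes has one column per
# district; codes has one zip code per row, including 99999, which is in no district.
d_codes = pd.DataFrame({
    "District 2": [23081, 23090, 23185],
    "District 5": [20106, 20107, 20108],
    "District 7": [20109, 20110, 20111],
})
codes = pd.DataFrame({"Zip Code": [23081, 20107, 99999]})

d2 = d_codes["District 2"].tolist()
d5 = d_codes["District 5"].tolist()
d7 = d_codes["District 7"].tolist()

result = []
for value in codes["Zip Code"].tolist():
    if value in d2:
        result.append("District 2")
    elif value in d5:
        result.append("District 5")
    elif value in d7:
        result.append("District 7")
    else:
        # Without this branch nothing is appended for 99999, so result would
        # have 2 items for 3 rows and the assignment below would raise ValueError.
        result.append(None)

codes["Result"] = result
print(codes)
```

Unmatched rows then show a missing value in Result instead of breaking the assignment, which also makes the problem zip codes easy to find.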

A small note to start: it's best to give people a fully working example of your problem. Providing some fake data makes it a lot easier for people to help you.

I would try to get your districts into a different structure: a single dataframe, districts, with two columns, zipcode and district. Pandas melt is perfect for this:

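import pandas as pd

# fake_data.csv has one column per district, one zip code per row
df = pd.read_csv("fake_data.csv")
print(df.head())
#    District 2  District 5  District 7
# 0       23081       20106       20106
# 1       23090       20106       20106
# 2       23185       20106       20106

# Reshape wide -> long: one row per (district, zipcode) pair.
# var_name/value_name give the new columns the names used in the merge below;
# a plain df.melt() would call them "variable" and "value".
districts = df.melt(var_name="district", value_name="zipcode")
print(districts)
#      district  zipcode
# 0  District 2    23081
# 1  District 2    23090
# 2  District 2    23185
# 3  District 5    20106
# 4  District 5    20106
# 5  District 5    20106
# 6  District 7    20106
# 7  District 7    20106
# 8  District 7    20106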
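If the district columns have different lengths (an assumption here, but likely if some districts list more zip codes than others), read_csv pads the shorter columns with NaN, and melt carries that padding into the zip code column, which then becomes float. A minimal sketch, using a hypothetical inline frame in place of fake_data.csv, that drops the padding and restores integer zip codes before merging:

```python
import numpy as np
import pandas as pd

# Hypothetical wide table: District 5 and District 7 are shorter, so they are
# padded with NaN, just as read_csv would pad a shorter column.
df = pd.DataFrame({
    "District 2": [23081, 23090, 23185],
    "District 5": [20106, 20107, np.nan],
    "District 7": [20108, np.nan, np.nan],
})

districts = (
    df.melt(var_name="district", value_name="zipcode")  # wide -> long
    .dropna(subset=["zipcode"])                           # drop the NaN padding
    .astype({"zipcode": int})                             # float zip codes back to int
)
print(districts)
```

Keeping the zip codes as integers on both sides means the merge keys compare like for like.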

You can then merge your dataframes based on the zipcode column.

codes = codes.merge(districts, how="left", on="zipcode")
print(codes)
#    x  zipcode    district
# 0  1    23081  District 2
# 1  2    23090  District 2
# 2  3    20106  District 5
# 3  3    20106  District 5
# 4  3    20106  District 5
# 5  3    20106  District 7
# 6  3    20106  District 7
# 7  3    20106  District 7
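In the question the zip code column in codes is named 'Zip Code' rather than zipcode, so the two key names differ and merge needs left_on/right_on. A minimal end-to-end sketch on hypothetical data; with how="left", rows whose zip code matches no district come back with NaN in district, which is a quick way to list the zip codes that made the original loop fall short:

```python
import pandas as pd

# Hypothetical long-format district table (as produced by melt), zip codes unique
districts = pd.DataFrame({
    "district": ["District 2", "District 2", "District 5", "District 7"],
    "zipcode": [23081, 23090, 20106, 20108],
})
# Hypothetical version of the asker's codes frame; 99999 is in no district
codes = pd.DataFrame({"Zip Code": [23081, 20106, 99999]})

# Left merge keeps every row of codes, in order; the key columns have different names
codes = codes.merge(districts, how="left", left_on="Zip Code", right_on="zipcode")
codes = codes.drop(columns="zipcode")  # redundant copy of the right-hand key

# Zip codes that matched no district
print(codes[codes["district"].isna()])
```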

There are a couple of problems, though: your screenshot shows the same zip codes appearing in multiple districts, and you also have duplicate zip codes. Merge returns every match, so you'll end up with extra rows after the merge (as with zip code 20106 above, which turns one row of codes into six). You should fix whatever puts the same zip code in multiple districts, and then deduplicate the zipcode column so there's only one matching district per zip code. Once that's done, do the merge.
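A minimal sketch of those checks on hypothetical data: list the zip codes assigned to more than one district, drop exact duplicate rows, and pass validate="many_to_one" to merge so it raises pandas.errors.MergeError if any zip code still maps to more than one district. Which district a conflicting zip code really belongs to is a data question the code can't answer, so here the conflicts are only reported:

```python
import pandas as pd

# Hypothetical district table with both problems: 23081 is duplicated,
# and 20106 is listed under two different districts.
districts = pd.DataFrame({
    "district": ["District 2", "District 2", "District 5", "District 5", "District 7"],
    "zipcode": [23081, 23081, 20106, 20106, 20106],
})
codes = pd.DataFrame({"x": [1, 2], "zipcode": [23081, 20106]})

# 1. Zip codes listed under more than one district (fix these in the source data)
per_zip = districts.groupby("zipcode")["district"].nunique()
conflicts = per_zip[per_zip > 1].index.tolist()
print("zip codes in several districts:", conflicts)

# 2. Drop exact duplicate (district, zipcode) rows
districts = districts.drop_duplicates()

# 3. validate="many_to_one" checks that each zipcode appears at most once on the right
try:
    merged = codes.merge(districts, how="left", on="zipcode", validate="many_to_one")
except pd.errors.MergeError as err:
    merged = None
    print("merge refused:", err)
```

Once the conflicting zip codes are resolved, the same merge call succeeds and returns exactly one row per row of codes.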

Feel free to hit me up if you have any issues!

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM