
Python - Pandas replace a value in a column if another column has a value that is in a list

I have hundreds of thousands of rows that look something like this (there's actually more data than just this, but I'm trying to simplify the idea I've been attempting)...

index  status       location
0      infected     area5
1      healthy      area6
2      healthy      area3
3      infected     area8
4      healthy      area1
5      healthy      area8
6      healthy      area5
7      healthy      area2
8      healthy      area4
9      healthy      area10
10     ....          ....

I'm trying to update the status column based on whether a row's area is infected. So I made a list of the infected areas:

infected_areas = ['area5', 'area8']

Then I want to look at all the rows (or really just the 'healthy' rows) and, if a row's location is in my infected_areas list, change its status to infected.

So with my example above, the output should look like:

index  status       location
0      infected     area5
1      healthy      area6
2      healthy      area3
3      infected     area8
4      healthy      area1
5      infected     area8
6      infected     area5
7      healthy      area2
8      healthy      area4
9      healthy      area10
10     ....          ....

Here's what I've been working with, but I'm not getting anywhere:

df[df['location'].isin(infected_areas)]['status'] = 'infected'

Just use .loc with a boolean mask:

df.loc[df.location.isin(infected_areas), 'status'] = 'infected'
df
Out[49]: 
   index    status location
0      0  infected    area5
1      1   healthy    area6
2      2   healthy    area3
3      3  infected    area8
4      4   healthy    area1
5      5  infected    area8
6      6  infected    area5
7      7   healthy    area2
8      8   healthy    area4
9      9   healthy   area10
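
For anyone who wants to try the .loc approach end to end, here is a minimal sketch that rebuilds the ten sample rows from the question. The DataFrame construction is an assumption for illustration only, since the real data has more rows and columns. The chained form df[mask]['status'] = ... from the question assigns into a temporary copy returned by df[mask], which is why df itself never changes; a single .loc call avoids that.

```python
import pandas as pd

# Sample rows copied from the question (assumed layout; real data is larger).
df = pd.DataFrame({
    "status": ["infected", "healthy", "healthy", "infected", "healthy",
               "healthy", "healthy", "healthy", "healthy", "healthy"],
    "location": ["area5", "area6", "area3", "area8", "area1",
                 "area8", "area5", "area2", "area4", "area10"],
})
infected_areas = ["area5", "area8"]

# Boolean mask: True where the row's location appears in the list.
mask = df["location"].isin(infected_areas)

# One .loc call selects rows and column together, so the assignment
# writes to df itself rather than to a temporary copy.
df.loc[mask, "status"] = "infected"

print(df)
```

Rows 0, 3, 5 and 6 end up as infected, matching the expected output in the question.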

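You can use pd.Series.isin in conjunction with pd.Series.where:

infected_areas = ['area5', 'area8']

# Keep the status where the location is NOT in the list; otherwise use 'infected'.
# str.strip() guards against stray whitespace in the location values.
# Assigning the result back avoids relying on inplace=True through df.status,
# which may not update df under copy-on-write in newer pandas versions.
df['status'] = df['status'].where(
    ~df['location'].str.strip().isin(infected_areas),
    other='infected')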
>>> df
   index    status location
0      0  infected    area5
1      1   healthy    area6
2      2   healthy    area3
3      3  infected    area8
4      4   healthy    area1
5      5  infected    area8
6      6  infected    area5
7      7   healthy    area2
8      8   healthy    area4
9      9   healthy   area10
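
The str.strip() step only matters if some location values carry stray whitespace. Below is a small sketch of that case. The frame named padded and its ' area8 ' value are hypothetical, made up to show the effect, and the result is written back by plain assignment.

```python
import pandas as pd

# Hypothetical data: one location has surrounding spaces.
padded = pd.DataFrame({
    "status": ["infected", "healthy", "healthy", "healthy"],
    "location": ["area5", " area8 ", "area3", "area5"],
})
infected_areas = ["area5", "area8"]

# Strip whitespace before the membership test so ' area8 ' still matches.
in_infected = padded["location"].str.strip().isin(infected_areas)

# where() keeps values where the condition is True and substitutes
# `other` elsewhere, hence the negated mask.
padded["status"] = padded["status"].where(~in_infected, other="infected")

print(padded)
```

Without the strip, the ' area8 ' row would not match the list and would stay healthy.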

