
How can I subset a data frame based on a list of unique values in a column of that same data frame?

I have a simple dataframe that looks like this. I want to select all of the rows where LOC is New York, store that subset in a variable, and append those rows to an email (created with win32) addressed to the Contact person. Then I want to move on to Boston and do the same thing, and so on. I cannot figure out how to extract the rows for each LOC without explicitly naming the values. This needs to be dynamic, because the LOC values change.

    Contact          LOC     ...     Add_Move  First Name
0   mike@osjloc1.com     New York     ...          Add         Joe
1   mike@osjloc1.com     New York     ...         Move        Stan
2   mike@osjloc1.com     New York     ...          Add        Rick
3   mike@osjloc1.com     New York     ...          Add        Mike
4   jeff@osjloc2.com       Boston     ...          Add       Sonya
5   jeff@osjloc2.com       Boston     ...         Move        Matt
6   jeff@osjloc2.com       Boston     ...         Move       Randy
7   jeff@osjloc2.com       Boston     ...          Add         Sue
8    dave@osjloc.com  Los Angeles     ...          Add        Jill
9    dave@osjloc.com  Los Angeles     ...         Move       Steve
10   dave@osjloc.com  Los Angeles     ...          Add        Bill

Use Boolean indexing: you can build a mask from a column's values and use it to filter the dataframe. See https://www.geeksforgeeks.org/boolean-indexing-in-pandas/

Getting all the unique locations in the DataFrame.

locations = set(df.loc[:,"LOC"])

locations is now a set such as {"New York", "Boston", ...}

for location in locations:
    # rows for the current location; use them here, before the next iteration overwrites variable
    variable = df[df["LOC"] == location]

The for loop iterates over the set of values created above. To filter the data on a column value, create a Boolean mask with a comparison operator such as == or != and index the dataframe with it.
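Because the loop reassigns the same variable on every pass, only the last location's rows remain once it finishes. One way around that, sketched here under assumptions, is to store each subset in a dict keyed by location. The sample rows below are a hypothetical cut-down copy of the question's data, and the name subsets is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data mirroring a few columns from the question.
df = pd.DataFrame({
    "Contact": ["mike@osjloc1.com", "mike@osjloc1.com",
                "jeff@osjloc2.com", "dave@osjloc.com"],
    "LOC": ["New York", "New York", "Boston", "Los Angeles"],
    "Add_Move": ["Add", "Move", "Add", "Add"],
    "First Name": ["Joe", "Stan", "Sonya", "Jill"],
})

# Keep every subset instead of overwriting one variable on each pass.
# unique() returns the distinct values in order of first appearance.
subsets = {}
for location in df["LOC"].unique():
    mask = df["LOC"] == location   # Boolean Series, True where the row matches
    subsets[location] = df[mask]   # rows for this location only

print(subsets["New York"])
```

Each value in subsets is still a dataframe, so it can be looked up by location name later without hard-coding any of the LOC values.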

You can use pandas groupby.

groups = yourdataframe.groupby('LOC')

groups contains the dataframe subsets, split according to the 'LOC' column. If you iterate over it, each iteration yields a tuple of length 2: at index 0, the value of 'LOC' (a string here); at index 1, the subset for that value, which is still a dataframe.

for locname, subset in groups:
    # do whatever you want with the subset
    pass

Not sure what you need to do, but for example, to print the list of the emails, you could do:

for locname, subset in groups:
    print(subset['Contact'])
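To tie this back to the question, here is a minimal sketch of how the grouped subsets might be turned into one message per location. It assumes each location has a single Contact address, uses a hypothetical cut-down copy of the question's data, and leaves out the win32 sending step entirely; messages, recipient and body are illustrative names only.

```python
import pandas as pd

# Hypothetical sample data mirroring a few columns from the question.
df = pd.DataFrame({
    "Contact": ["mike@osjloc1.com", "mike@osjloc1.com",
                "jeff@osjloc2.com", "dave@osjloc.com"],
    "LOC": ["New York", "New York", "Boston", "Los Angeles"],
    "Add_Move": ["Add", "Move", "Add", "Add"],
    "First Name": ["Joe", "Stan", "Sonya", "Jill"],
})

messages = {}
# sort=False keeps the groups in order of first appearance.
for locname, subset in df.groupby("LOC", sort=False):
    # Assumption: every row in a location shares the same contact.
    recipient = subset["Contact"].iloc[0]
    # Render the remaining columns as plain text for the message body.
    body = subset.drop(columns=["Contact", "LOC"]).to_string(index=False)
    messages[locname] = (recipient, body)

for locname, (recipient, body) in messages.items():
    print(f"To: {recipient} ({locname})")
    print(body)
```

The body string for each location is what you would then hand to whatever builds the email, and no LOC value is named explicitly anywhere in the loop.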
