
Comparing Master List to Individual lists in a CSV row

I am trying to automate my classroom, and I'm hitting a wall with comparing my total list of students to a dataframe that has classes and students. Ultimately, the code would return a list of full classes.

First, my list of all students is called all_kids.

all_kids=['Kevin', 'Jack', 'Caroline', 'Grace', 'Harry', 'Sam']
df_kids=pd.DataFrame(all_kids)

Then, my class info is in a CSV file, where one column is the class period and the other lists the students in that class:

Name     Kids
English  Kevin, Jack, Sam, Richard
Math     Caroline, Kevin, Harry, Grace
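The raw file isn't shown, so here is a minimal sketch of loading it, assuming a comma-separated file with a `Name,Kids` header and the kid list quoted (it contains commas). `io.StringIO` stands in for the real file, and the file name `classes.csv` in the comment is hypothetical.

```python
import io

import pandas as pd

# Stand-in for the real file; assumes the Kids field is quoted because it contains commas.
csv_text = '''Name,Kids
English,"Kevin, Jack, Sam, Richard"
Math,"Caroline, Kevin, Harry, Grace"
'''

# With a real file this would be pd.read_csv('classes.csv')  (hypothetical file name).
df_class = pd.read_csv(io.StringIO(csv_text))
print(df_class)
```

The default quote character (`"`) keeps each quoted kid list in a single `Kids` cell.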

Is there a way to compare my total list of kids to the kids in each class and return something like this:

Name     Kids                           Status
English  Kevin, Jack, Sam, Richard      Not Full
Math     Caroline, Kevin, Harry, Grace  Full

Here, Math is full because all four of its kids appear in all_kids, but English is not full because Richard is not in all_kids.

Thanks!

You can use str.split with expand=True to split the strings in the Kids column (df_class is the DataFrame read from the CSV), then use .isin + .all to create a boolean mask, which is then passed to np.where to select the corresponding Status:

parts = df_class['Kids'].str.split(', ', expand=True)  # one column per kid
m = (parts.isin(all_kids) | parts.isna()).all(axis=1)  # ignore padding in shorter rows
df_class['Status'] = np.where(m, 'Full', 'Not Full')
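One caveat with expand=True: when classes have different sizes, shorter rows are padded with missing values, and .isin reports those as False, so a small class could be marked "Not Full" by mistake. A self-contained sketch of the split/isin approach that treats padding as harmless, assuming a hypothetical third class `Art` with only two kids:

```python
import numpy as np
import pandas as pd

all_kids = ['Kevin', 'Jack', 'Caroline', 'Grace', 'Harry', 'Sam']

# Example data; 'Art' is hypothetical and has fewer kids than the other classes.
df_class = pd.DataFrame({
    'Name': ['English', 'Math', 'Art'],
    'Kids': ['Kevin, Jack, Sam, Richard',
             'Caroline, Kevin, Harry, Grace',
             'Jack, Sam'],
})

# One column per kid; shorter rows are padded with missing values.
parts = df_class['Kids'].str.split(', ', expand=True)

# A padding cell is not a kid, so count it as OK rather than "not in all_kids".
m = (parts.isin(all_kids) | parts.isna()).all(axis=1)
df_class['Status'] = np.where(m, 'Full', 'Not Full')
print(df_class)
```

Without the `| parts.isna()` term, `Art` would come out as "Not Full" even though both of its kids are in all_kids.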

Alternatively, you can split the strings in the Kids column, then, inside a list comprehension, check whether each class's set of kids is a subset of all_kids using set.issubset:

m = [set(k.split(', ')).issubset(all_kids) for k in df_class['Kids']]
df_class['Status'] = np.where(m, 'Full', 'Not Full')

      Name                           Kids    Status
0  English      Kevin, Jack, Sam, Richard  Not Full
1     Math  Caroline, Kevin, Harry, Grace      Full
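A third option, swapped in here rather than taken from either answer, is Series.explode: give each kid its own row, test membership once, then regroup by the original row index. It handles classes of different sizes without any padding. This sketch assumes df_class has a unique index:

```python
import pandas as pd

all_kids = ['Kevin', 'Jack', 'Caroline', 'Grace', 'Harry', 'Sam']

df_class = pd.DataFrame({
    'Name': ['English', 'Math'],
    'Kids': ['Kevin, Jack, Sam, Richard', 'Caroline, Kevin, Harry, Grace'],
})

# Split each cell into a list, then give every kid its own row.
# explode() repeats the original index label, so rows can be regrouped per class.
per_kid = df_class['Kids'].str.split(', ').explode()

# True for a class only if every one of its kids is in all_kids.
m = per_kid.isin(all_kids).groupby(level=0).all()

# Assignment aligns on the index, so row order is preserved.
df_class['Status'] = m.map({True: 'Full', False: 'Not Full'})
print(df_class)
```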

What you need to do is check whether every name in the 'Kids' column is present in all_kids. Logically, that means removing all_kids from each row's list of kids and checking whether anything is left. First, though, you need to split the Kids string column into a column of lists:

This code worked for me:


import numpy as np
import pandas as pd 

all_kids=['Kevin', 'Jack', 'Caroline', 'Grace', 'Harry', 'Sam']
df_kids=pd.DataFrame(all_kids)


df = pd.DataFrame({'Name': ['English', 'Math'],
                   'Kids': ['Kevin, Jack, Sam, Richard', 'Caroline, Kevin, Harry, Grace']})

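# Split each 'Kids' string into a list of names
df['list'] = df['Kids'].apply(lambda s: s.split(', '))
# Count the names that are not in all_kids
df['diff'] = df['list'].apply(lambda s: [elt for elt in s if elt not in all_kids]).apply(len)
# A class is full only when no name is missing from all_kids
df['Status'] = np.where(df['diff'] == 0, 'Full', 'Not Full')
print(df)
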
      Name                           Kids  ... diff    Status
0  English      Kevin, Jack, Sam, Richard  ...    1  Not Full
1     Math  Caroline, Kevin, Harry, Grace  ...    0      Full
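The "remove all_kids and see what is left" idea can also be written literally as a set difference, which has the side benefit of showing which kids are not in all_kids. A minimal sketch; the `Missing` column name is hypothetical:

```python
import pandas as pd

all_kids = ['Kevin', 'Jack', 'Caroline', 'Grace', 'Harry', 'Sam']
known = set(all_kids)  # build the set once, not once per row

df = pd.DataFrame({
    'Name': ['English', 'Math'],
    'Kids': ['Kevin, Jack, Sam, Richard', 'Caroline, Kevin, Harry, Grace'],
})

# Set difference: the kids in this class that are NOT in all_kids.
df['Missing'] = df['Kids'].apply(lambda s: sorted(set(s.split(', ')) - known))

# An empty difference means every kid in the class is in all_kids.
df['Status'] = df['Missing'].apply(lambda missing: 'Full' if not missing else 'Not Full')
print(df)
```

Converting all_kids to a set makes each membership test constant-time on average, which matters only if the master list is large.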
