Comparing Master List to Individual lists in a CSV row

Question

I am trying to automate my classroom, and I'm hitting a wall with comparing my total list of students to a dataframe that has classes and students. Ultimately, the code would return a list of full classes.

First, my full list of students is called all_kids.

import pandas as pd

all_kids = ['Kevin', 'Jack', 'Caroline', 'Grace', 'Harry', 'Sam']
df_kids = pd.DataFrame(all_kids)

Then, my class info is in a CSV file, where one column is the class period and one column lists the students in that class:

Name	Kids
English	Kevin, Jack, Sam, Richard
Math	Caroline, Kevin, Harry, Grace

Is there a way to compare my total list of kids to the kids in each class and return something like this:

Name	Kids	Status
English	Kevin, Jack, Sam, Richard	Not Full
Math	Caroline, Kevin, Harry, Grace	Full

Here, Math is full because all four of its kids appear in all_kids, but English is not full because Richard is not in all_kids.

Thanks!

Answer 1

You can use str.split with expand=True to split the strings in the Kids column into one column per name, then use .isin and .all to build a boolean mask, which np.where turns into the corresponding Status:

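# Split into one column per name, then require every name in the row to be in all_kids.
# Caveat: rows with fewer names are padded with missing values, which isin reports as False.
m = df_class['Kids'].str.split(r', ', expand=True).isin(all_kids).all(axis=1)
df_class['Status'] = np.where(m, 'Full', 'Not Full')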

Alternatively, you can split each string in the Kids column inside a list comprehension and test whether the resulting set of names is a subset of all_kids using set.issubset:

m = [set(k.split(', ')).issubset(all_kids) for k in df_class['Kids']]
df_class['Status'] = np.where(m, 'Full', 'Not Full')

      Name                           Kids    Status
0  English      Kevin, Jack, Sam, Richard  Not Full
1     Math  Caroline, Kevin, Harry, Grace      Full
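If your classes do not all have the same number of kids, the expand=True version needs one extra step, because shorter rows get padded with missing values. Here is a minimal sketch of that, assuming the file is tab-separated as shown in the question. The csv_text string stands in for the real file, and the 'Art' row is an invented example of a smaller class, not part of the original data:

```python
import io

import numpy as np
import pandas as pd

all_kids = ['Kevin', 'Jack', 'Caroline', 'Grace', 'Harry', 'Sam']

# Hypothetical stand-in for the CSV file; 'Art' is an invented row with fewer kids.
csv_text = (
    "Name\tKids\n"
    "English\tKevin, Jack, Sam, Richard\n"
    "Math\tCaroline, Kevin, Harry, Grace\n"
    "Art\tSam, Grace\n"
)
df_class = pd.read_csv(io.StringIO(csv_text), sep='\t')

# One column per kid; shorter rows are padded with missing values.
kids = df_class['Kids'].str.split(', ', expand=True)

# Count the padding cells as acceptable so short classes are not marked 'Not Full'.
m = (kids.isin(all_kids) | kids.isna()).all(axis=1)
df_class['Status'] = np.where(m, 'Full', 'Not Full')

print(df_class)
```

With the real file you would pass its path to pd.read_csv instead of the StringIO object. The set.issubset version does not need this adjustment, since it never pads the rows.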

Answer 2

You need to check whether every name in the 'Kids' column is present in all_kids. Logically, that means removing the names in all_kids from each row of Kids and checking whether anything is left over. First, though, you need to split the Kids string column into a column of lists.

This code worked for me:


import numpy as np
import pandas as pd 

all_kids=['Kevin', 'Jack', 'Caroline', 'Grace', 'Harry', 'Sam']
df_kids=pd.DataFrame(all_kids)


df = pd.DataFrame(None, columns =  ['Name', 'Kids'])
df.loc[0] = ['English', 'Kevin, Jack, Sam, Richard']
df.loc[1] = ['Math', 'Caroline, Kevin, Harry, Grace']

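# Split each comma-separated string into a list of names.
df['list'] = df['Kids'].apply(lambda s: s.split(', '))
# Count the names in each class that are not in all_kids.
df['diff'] = df['list'].apply(lambda s: [elt for elt in s if elt not in all_kids]).apply(len)
# A class is full when no name is missing.
df['Status'] = np.where(df['diff'] == 0, 'Full', 'Not Full')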

      Name                           Kids  ... diff    Status
0  English      Kevin, Jack, Sam, Richard  ...    1  Not Full
1     Math  Caroline, Kevin, Harry, Grace  ...    0      Full
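The same "remove all_kids and see what is left" idea can also be written with a set difference, which has the side benefit of showing which names are missing. This is only a sketch of that variant, not the code above; the 'Missing' column name and the known variable are my own choices for illustration:

```python
import pandas as pd

all_kids = ['Kevin', 'Jack', 'Caroline', 'Grace', 'Harry', 'Sam']
known = set(all_kids)  # set lookup avoids scanning the list for every name

df = pd.DataFrame({
    'Name': ['English', 'Math'],
    'Kids': ['Kevin, Jack, Sam, Richard', 'Caroline, Kevin, Harry, Grace'],
})

# Names in each class that are absent from all_kids ('Missing' is a hypothetical column name).
df['Missing'] = [sorted(set(k.split(', ')) - known) for k in df['Kids']]

# An empty 'Missing' list means every kid in the class is in all_kids.
df['Status'] = ['Full' if not missing else 'Not Full' for missing in df['Missing']]

print(df[['Name', 'Missing', 'Status']])
```

For English, the Missing column holds ['Richard'], which is why that class comes out as Not Full.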

solution1
1 ACCEPTED 2020-12-31 15:12:00

solution2
0 2020-12-31 14:50:52