I am trying to automate my classroom, and I'm hitting a wall with comparing my total list of students to a dataframe that has classes and students. Ultimately, the code would return a list of full classes.
First, my list of all students is called `all_kids`:

```python
import pandas as pd

all_kids = ['Kevin', 'Jack', 'Caroline', 'Grace', 'Harry', 'Sam']
df_kids = pd.DataFrame(all_kids)
```
Then, my class info is in a CSV file, where one column is the class period and the other is the students in that class:
Name | Kids |
---|---|
English | Kevin, Jack, Sam, Richard |
Math | Caroline, Kevin, Harry, Grace |
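The raw CSV isn't shown, so the sketch below assumes a file with a `Name,Kids` header where each `Kids` field is wrapped in double quotes. The quotes are needed because the field itself contains commas. It loads the data into a DataFrame named `df_class`, the name the first answer uses. `csv_text` is a hypothetical stand-in for the real file.

```python
import io

import pandas as pd

# Hypothetical CSV contents mirroring the table above. The Kids field contains
# commas, so it must be quoted for read_csv to keep it as a single field.
csv_text = '''Name,Kids
English,"Kevin, Jack, Sam, Richard"
Math,"Caroline, Kevin, Harry, Grace"
'''

# With a real file, pass its path instead of the StringIO buffer.
df_class = pd.read_csv(io.StringIO(csv_text))
print(df_class)
```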
Is there a way to compare my total list of kids to the kids in each class and return something like this:
Name | Kids | Status |
---|---|---|
English | Kevin, Jack, Sam, Richard | Not Full |
Math | Caroline, Kevin, Harry, Grace | Full |
Here, Math is full because all four of its students appear in `all_kids`, but English is not full because Richard is not in `all_kids`.
Thanks!
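You can use `str.split` with `expand=True` to split the strings in the `Kids` column into one column per student, then use `.isin` + `.all` to build a boolean mask, which is then passed to `np.where` to select the corresponding `Status`. This assumes your CSV has already been loaded into `df_class` (e.g. with `pd.read_csv`). Note that `expand=True` pads classes with fewer students with missing values, so those cells have to count as matches. Otherwise a smaller class would always come out as Not Full:

```python
import numpy as np
parts = df_class['Kids'].str.split(', ', expand=True)  # one column per kid; shorter classes padded with missing values
m = (parts.isin(all_kids) | parts.isna()).all(axis=1)  # every real name must be in all_kids; padding is ignored
df_class['Status'] = np.where(m, 'Full', 'Not Full')
```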
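To see why the padded cells matter, here is a self-contained sketch with a hypothetical third class, `Art`, that has only two students. It compares the plain `.isin(...).all(axis=1)` mask with one that also accepts missing cells (`| parts.isna()`).

```python
import numpy as np
import pandas as pd

all_kids = ['Kevin', 'Jack', 'Caroline', 'Grace', 'Harry', 'Sam']

# Hypothetical data: Art has only two students, so expand=True pads its row
df_class = pd.DataFrame({
    'Name': ['English', 'Math', 'Art'],
    'Kids': ['Kevin, Jack, Sam, Richard', 'Caroline, Kevin, Harry, Grace', 'Jack, Sam'],
})

parts = df_class['Kids'].str.split(', ', expand=True)

# Naive mask: the padding cells are not in all_kids, so Art is wrongly flagged
naive = parts.isin(all_kids).all(axis=1)

# Padding-aware mask: missing cells are treated as matches
m = (parts.isin(all_kids) | parts.isna()).all(axis=1)
df_class['Status'] = np.where(m, 'Full', 'Not Full')

print(naive.tolist())               # [False, True, False]
print(df_class['Status'].tolist())  # ['Not Full', 'Full', 'Full']
```

With the naive mask, Art would be reported as Not Full even though both of its students are in `all_kids`.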
Alternatively, you can split the strings in the `Kids` column and then, inside a list comprehension, check set membership with `set.issubset`:

```python
import numpy as np
m = [set(k.split(', ')).issubset(all_kids) for k in df_class['Kids']]
df_class['Status'] = np.where(m, 'Full', 'Not Full')
```
```
      Name                           Kids    Status
0  English      Kevin, Jack, Sam, Richard  Not Full
1     Math  Caroline, Kevin, Harry, Grace      Full
```
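The list-comprehension approach can also be wrapped in a small reusable function. In the sketch below, `class_status` is a hypothetical name, and stripping whitespace around each name is an added assumption so that `Kevin,Jack` and `Kevin, Jack` are treated the same. The roster is converted to a set once rather than once per row. Since nothing is padded here, classes of different sizes need no special handling.

```python
import numpy as np
import pandas as pd


def class_status(kids_col: pd.Series, roster) -> np.ndarray:
    """Return 'Full' where every name in a comma-separated cell is on the roster."""
    roster_set = set(roster)  # build the lookup set once, not once per row
    mask = [
        {name.strip() for name in kids.split(',')} <= roster_set  # subset test
        for kids in kids_col
    ]
    return np.where(mask, 'Full', 'Not Full')


all_kids = ['Kevin', 'Jack', 'Caroline', 'Grace', 'Harry', 'Sam']
df_class = pd.DataFrame({
    'Name': ['English', 'Math'],
    'Kids': ['Kevin, Jack, Sam, Richard', 'Caroline, Kevin, Harry, Grace'],
})
df_class['Status'] = class_status(df_class['Kids'], all_kids)
print(df_class['Status'].tolist())  # ['Not Full', 'Full']
```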
What you need to do is check whether every name in the `Kids` column is present in `all_kids`. Logically, that means removing the names in `all_kids` from each row of `Kids` and checking whether anything is left over. Of course, you first need to split the `Kids` string column into a column of lists.

This code worked for me:
```python
import numpy as np
import pandas as pd

all_kids = ['Kevin', 'Jack', 'Caroline', 'Grace', 'Harry', 'Sam']
df_kids = pd.DataFrame(all_kids)

# Rebuild the class table from the question
df = pd.DataFrame(None, columns=['Name', 'Kids'])
df.loc[0] = ['English', 'Kevin, Jack, Sam, Richard']
df.loc[1] = ['Math', 'Caroline, Kevin, Harry, Grace']

# Split each comma-separated string into a list of names
df['list'] = df['Kids'].apply(lambda s: s.split(', '))
# Count the names in each class that are not in all_kids
df['diff'] = df['list'].apply(lambda s: [elt for elt in s if elt not in all_kids]).apply(len)
# A class is full when no name is left over
df['Status'] = np.where(df['diff'] == 0, 'Full', 'Not Full')
```
```
      Name                           Kids  ...  diff    Status
0  English      Kevin, Jack, Sam, Richard  ...     1  Not Full
1     Math  Caroline, Kevin, Harry, Grace  ...     0      Full
```
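A variant of the same "remove the roster names and see what's left" idea uses a set difference and keeps the leftover names instead of only their count, so you can also see *who* is missing from `all_kids`. The `missing` column name is hypothetical, and the sketch assumes names are separated by `', '` exactly as in the question.

```python
import pandas as pd

all_kids = ['Kevin', 'Jack', 'Caroline', 'Grace', 'Harry', 'Sam']
df = pd.DataFrame({
    'Name': ['English', 'Math'],
    'Kids': ['Kevin, Jack, Sam, Richard', 'Caroline, Kevin, Harry, Grace'],
})

roster = set(all_kids)
# Names in each class that are NOT on the roster, sorted for stable output
df['missing'] = df['Kids'].apply(lambda s: sorted(set(s.split(', ')) - roster))
# A class is full when nothing is left after removing the roster names
df['Status'] = df['missing'].apply(lambda names: 'Not Full' if names else 'Full')

print(df[['Name', 'missing', 'Status']])
```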