I have a DataFrame, data:
   groupId service local
0        1      s1    l1
1        1      s1    l1
2        1      s2    l2
3        1      s3    l3
4        2      s2    l2
5        2      s3    l3
6        3      s1    l1
7        3      s2    l2
and I have a DataFrame, question:
   q1  q2  howManyGroups
0  s1  l1              0
1  s1  s2              0
2  s2  l2              0
3  s3  l3              0
4  s3  l1              0
I want to count, for each row of question, how many groups in data contain all of that row's values:
   q1  q2  howManyGroups
0  s1  l1              2
1  s1  s2              2
2  s2  l2              3
3  s3  l3              2
4  s3  l1              1
I am using this code, but it is really slow:
for i, g in data.groupby('groupId'):
    for j, r in question.iterrows():
        if set(r[['q1', 'q2']].values).issubset(set(g.drop('groupId', axis=1).values.ravel())):
            question.loc[j, 'howManyGroups'] += 1
Edit: My question DataFrame can sometimes have more or fewer columns than q1 and q2. Sometimes it has only q1, sometimes it has q1, q2, q3, ...
You can first reshape data to get one row per groupId and unique value found in either the service or the local column:
data_ = (data.set_index('groupId').stack()
             .reset_index(name='h')
             [['groupId', 'h']].drop_duplicates()
        )
print(data_.head())
   groupId   h
0        1  s1
1        1  l1
4        1  s2
5        1  l2
6        1  s3
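For anyone who wants to reproduce this step end to end, here is a minimal sketch that builds data from the question and produces the same long table. Note that it uses melt rather than set_index/stack, which is my substitution; it should give the same set of (groupId, h) pairs, only in a different row order.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    "groupId": [1, 1, 1, 1, 2, 2, 3, 3],
    "service": ["s1", "s1", "s2", "s3", "s2", "s3", "s1", "s2"],
    "local":   ["l1", "l1", "l2", "l3", "l2", "l3", "l1", "l2"],
})

# melt stacks the 'service' and 'local' columns into a single value column 'h';
# drop_duplicates then leaves one row per (groupId, distinct value).
data_ = (
    data.melt(id_vars="groupId", value_name="h")[["groupId", "h"]]
        .drop_duplicates()
        .reset_index(drop=True)
)
print(data_)
```

Rows come out grouped by source column (all service values first, then all local values), which does not matter for the merges that follow.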
Then merge question with data_ twice: the first time only on q1 (against h in data_) to find which groupId values are associated with q1, and the second time on q2 and groupId to ensure that both q1 and q2 are in the same group. Finally, group by the original index (kept with reset_index before the merges) and use nunique on groupId:
question['howManyGroups'] = (question[['q1', 'q2']].reset_index()
                                 .merge(data_, left_on=['q1'], right_on=['h'])
                                 .merge(data_, left_on=['q2', 'groupId'],
                                        right_on=['h', 'groupId'])
                                 .groupby('index')['groupId'].nunique()
                            )
print(question)
   q1  q2  howManyGroups
0  s1  l1              2
1  s1  s2              2
2  s2  l2              3
3  s3  l3              2
4  s3  l1              1
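One thing to watch for: both merges are inner joins, so a question row whose values never share a group drops out of the result and ends up as NaN after the assignment. Below is a self-contained sketch of the two-merge approach with a reindex guard that fills those rows with 0. The last question row ("s3", "zz") is a hypothetical row I added purely to show the guard; it is not in the original data. The melt-based reshape is also my substitution for set_index/stack.

```python
import pandas as pd

data = pd.DataFrame({
    "groupId": [1, 1, 1, 1, 2, 2, 3, 3],
    "service": ["s1", "s1", "s2", "s3", "s2", "s3", "s1", "s2"],
    "local":   ["l1", "l1", "l2", "l3", "l2", "l3", "l1", "l2"],
})
question = pd.DataFrame({
    "q1": ["s1", "s1", "s2", "s3", "s3", "s3"],
    "q2": ["l1", "s2", "l2", "l3", "l1", "zz"],  # last row is hypothetical
})

# One row per (groupId, distinct value in service/local).
data_ = (
    data.melt(id_vars="groupId", value_name="h")[["groupId", "h"]]
        .drop_duplicates()
)

counts = (
    question[["q1", "q2"]].reset_index()
    # groups that contain q1
    .merge(data_, left_on="q1", right_on="h")
    .drop(columns="h")
    # keep only those groups that also contain q2
    .merge(data_, left_on=["q2", "groupId"], right_on=["h", "groupId"])
    .groupby("index")["groupId"].nunique()
)

# Rows with no matching group are absent from counts; give them 0, not NaN.
question["howManyGroups"] = counts.reindex(question.index, fill_value=0)
print(question)
```

This assumes question has the default unnamed index, so reset_index creates a column called index.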
If you have an unknown number of qi columns, you could try something like:
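df_tmp = (question.reset_index()
                  .merge(data_, left_on=['q1'], right_on=['h'])
                  .drop(columns='h')  # drop the helper column so later merges do not collide on 'h'
         )
# raw string, anchored so that only q1, q2, q3, ... match
l_q = question.filter(regex=r'^q\d+$').columns.tolist()
l_q.remove('q1')
for q in l_q:
    df_tmp = (df_tmp.merge(data_, left_on=[q, 'groupId'], right_on=['h', 'groupId'])
                    .drop(columns='h'))
question['howManyGroups'] = df_tmp.groupby('index')['groupId'].nunique()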
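If it helps, the same idea can be wrapped in a function that works for any number of q columns. This is only a sketch under a few assumptions of mine: the function name count_groups and its group_col parameter are made up for illustration, the question columns are named q1, q2, ... and question has the default unnamed index. It drops the helper column h after every merge so repeated merges do not run into suffixed duplicate column names, and it fills rows with no matching group with 0.

```python
import pandas as pd


def count_groups(data, question, group_col="groupId"):
    """For each question row, count the groups containing all of its q* values."""
    # One row per (group, distinct value) across all non-group columns.
    long = (
        data.melt(id_vars=group_col, value_name="h")[[group_col, "h"]]
            .drop_duplicates()
    )
    q_cols = question.filter(regex=r"^q\d+$").columns.tolist()

    # Start with every group that contains the first q value ...
    tmp = (
        question[q_cols].reset_index()
        .merge(long, left_on=q_cols[0], right_on="h")
        .drop(columns="h")
    )
    # ... then keep only groups that also contain each remaining q value.
    for q in q_cols[1:]:
        tmp = (
            tmp.merge(long, left_on=[q, group_col], right_on=["h", group_col])
               .drop(columns="h")
        )

    counts = tmp.groupby("index")[group_col].nunique()
    # Question rows with no matching group are missing from counts: report 0.
    return counts.reindex(question.index, fill_value=0)


data = pd.DataFrame({
    "groupId": [1, 1, 1, 1, 2, 2, 3, 3],
    "service": ["s1", "s1", "s2", "s3", "s2", "s3", "s1", "s2"],
    "local":   ["l1", "l1", "l2", "l3", "l2", "l3", "l1", "l2"],
})
# Hypothetical three-column question, in the spirit of the edit to the question.
question3 = pd.DataFrame({
    "q1": ["s1", "s3", "s2"],
    "q2": ["l1", "l3", "l2"],
    "q3": ["s2", "l1", "s3"],
})
question3["howManyGroups"] = count_groups(data, question3)
print(question3)
```

The same call should work unchanged when question has only q1, since the loop over the remaining columns is simply skipped.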