import pandas as pd

df = {'msg': ['i am so happy thank you',
              'sticker omitted',
              'sticker omitted',
              'thank you for your time!',
              'sticker omitted',
              'hello there'],
      # This column 'number_of_stickers' is what I am aiming to achieve.
      # Currently, I don't have this column.
      'number_of_stickers': ['2', '0', '0', '1', '0', '0']}
df = pd.DataFrame(data=df)
Above is what I am trying to achieve. I do not currently have the column 'number_of_stickers'; producing it is my end goal.
I am trying to count each run of consecutive "sticker omitted" rows and write that count on the row directly above the run, in a new column 'number_of_stickers'. All other rows get 0.
To give you some context, I am analysing WhatsApp text data, and I thought it would be useful to see how many stickers were sent right after a message was sent. This also hints at the tone and sentiment of the conversation.
I have posted a solution (credits to @JacoSolari) that works for the problem I'm solving. I added 1-2 lines (an if statement) to his code so that it does not run past the end of the dataframe (range issues).
A common technique is to flag the rows that are not stickers and take the cumsum of that flag to identify the blocks:
import numpy as np

omitted = df.msg.ne('sticker omitted').cumsum()
df['number_of_stickers'] = np.where(omitted.duplicated(), 0,
                                    omitted.groupby(omitted).transform('size') - 1)
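To make the cumsum trick easier to follow, here is a small sketch that spells out the intermediate values on the question's sample data. The names `is_message`, `block_id` and `block_size` are mine, chosen only for illustration; the logic is meant to mirror the two lines above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'msg': ['i am so happy thank you',
                           'sticker omitted',
                           'sticker omitted',
                           'thank you for your time!',
                           'sticker omitted',
                           'hello there']})

# True for real messages, False for sticker rows
is_message = df['msg'].ne('sticker omitted')

# The running total increases only on real messages, so a message and the
# stickers that follow it share one block id: 1, 1, 1, 2, 2, 3
block_id = is_message.cumsum()

# Number of rows in each block, repeated on every row: 3, 3, 3, 2, 2, 1
block_size = block_id.groupby(block_id).transform('size')

# Keep "block size - 1" (the sticker count) on the first row of each block
# and put 0 on the remaining rows of the block
df['number_of_stickers'] = np.where(block_id.duplicated(), 0, block_size - 1)
print(df)
```

One thing to watch, as far as I can tell: if the chat starts with stickers, those rows form block 0 and the first sticker row gets a non-zero count. Replacing the last step with `np.where(is_message, block_size - 1, 0)` should avoid that, since it only writes counts on real message rows.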
You've actually got it all right so far, and your data is enough for an easy yet functional algorithm!
Here is a little piece of code I wrote for this problem:
#ss
import pandas as pd

df = {'msg': ['i am so happy thank you',
              'sticker omitted',
              'sticker omitted',
              'thank you for your time!',
              'sticker omitted'],
      'number_of_stickers': ['2', '0', '0', '1', '0']}
j = 0
newarr = []  # new array for use
for i in df["number_of_stickers"]:
    if int(i) != 0:
        # store each [msg, count] pair in the array;
        # access the msg with newarr[n][0] and the count with newarr[n][1]
        newarr.append([df["msg"][j], int(i)])
    j += 1
#se
#feel free to do whatever you want after ss to se
pd.DataFrame(data=df)
Here ss marks the snippet start and se the snippet end.
Hope this helps! Just comment below if it doesn't!
Also, you have to feed the new array back into the dict.
This code should do the job. I could not find a solution that only uses pandas functions (though it might be possible). Anyway, I left some comments in the code to describe my approach.
import pandas as pd

# create data
df_dict = {'msg': ['i am so happy thank you',
                   'sticker omitted',
                   'sticker omitted',
                   'thank you for your time!',
                   'sticker omitted']}
df = pd.DataFrame(data=df_dict)
# build column for sticker counts after message
# note: df.loc[index + k] assumes the default 0..n-1 index, and it raises a
# KeyError when the last row of the dataframe is not a sticker
sticker_counts = []
for index, row in df.iterrows():  # iterating over df rows
    flag = True
    count = 0
    # when a sticker row is encountered, just put 0 in the count column
    # when a non-sticker row is encountered do the following
    if row['msg'] != 'sticker omitted':
        k = 1  # to check rows after the non-sticker row
        while flag:
            # if the index + k row is a sticker increase the count for index and k
            if df.loc[index + k].msg == 'sticker omitted':
                count += 1
                k += 1
                # when reached the end of the database, break the loop
                if index + k + 1 > len(df):
                    flag = False
            else:
                flag = False
                k = 1
    sticker_counts.append(count)
df['sticker_counts'] = sticker_counts
print(df)
I have edited @JacoSolari's code (with the help of a kind soul) to match the needs of the problem I'm trying to solve. I hope you find the code below useful.
sticker_counts = []
msg_index = 0
for index, row in df.iterrows():  # iterating over df rows
    flag = True
    count = 0
    # when a sticker row is encountered, just put 0 in the count column
    # when a non-sticker row is encountered do the following
    if row['msg'] != 'sticker omitted':
        k = 1  # to check rows after the non-sticker row
        while flag:
            print(f'i{msg_index} flag{flag} len{len(df)}')
            # if the index + k row is a sticker increase the count for index and k
            msg_index = index + k
            if msg_index >= len(df):
                break
            if df.loc[msg_index].msg == 'sticker omitted':
                count += 1
                k += 1
                # when reached the end of the database, break the loop
                if msg_index + 1 > len(df):
                    flag = False
                    print(f'i{msg_index} flag{flag}')
            else:
                flag = False
                k = 1
    sticker_counts.append(count)
df['sticker_counts'] = sticker_counts
print(df)
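For comparison, here is a minimal sketch of the same count done in a single backwards pass, which avoids looking ahead and so never steps past the end of the dataframe. This is a different technique from the loop above, not a rewrite of it, and `count_following_stickers` is a helper name I made up for illustration.

```python
import pandas as pd

def count_following_stickers(messages, marker='sticker omitted'):
    """For each message, count the stickers sent directly after it."""
    counts = []
    run = 0  # stickers seen since the last real message, scanning backwards
    for msg in reversed(list(messages)):
        if msg == marker:
            run += 1
            counts.append(0)    # sticker rows always get 0
        else:
            counts.append(run)  # a real message takes the run that follows it
            run = 0
    counts.reverse()            # restore the original row order
    return counts

df = pd.DataFrame({'msg': ['i am so happy thank you',
                           'sticker omitted',
                           'sticker omitted',
                           'thank you for your time!',
                           'sticker omitted',
                           'hello there']})
df['sticker_counts'] = count_following_stickers(df['msg'])
print(df)
```

Because the helper only reads the values in order, it should not depend on the dataframe having the default 0..n-1 index, and stickers at the very start of the chat simply get 0.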