This is how I scraped the data using BeautifulSoup.
import re

# `driver` is assumed to be an already-initialized Selenium WebDriver
comments = []
users_list = []
users = driver.find_elements_by_class_name('_6lAjh')
for user in users:
    users_list.append(user.text)
i = 0
texts_list = []
texts = driver.find_elements_by_class_name('C4VMK')
for txt in texts:
    # drop the leading username, then flatten line breaks into spaces
    texts_list.append(txt.text.split(users_list[i])[1].replace("\r", " ").replace("\n", " "))
    i += 1
comments_count = len(users_list)
for i in range(1, comments_count):
    user = users_list[i]
    text = texts_list[i]
    print("User ", user)
    print("Text ", text)
    print()
    comments.append(users_list[i])
    comments.append(texts_list[i])
    # print every @handle mentioned in the comment
    idxs = [m.start() for m in re.finditer('@', text)]
    for idx in idxs:
        handle = text[idx:].split(" ")[0]
        print(handle)
This is the text data I have: usernames, comments, and like counts from Instagram. In ' heyyy 3w1 likeReply', 'heyyy' is the comment, '3w' means the comment was written 3 weeks ago, and '1 like' is the number of likes.
print(comments)
['User1',
' 😱 3w1 likeReply',
'User2',
' 💖 3w1 likeReply',
'User3',
' Looking good! Collab, DM "bruteimpact.fashion 3wReply',
'User4',
' heyyy 3w5 likeReply']
I want to save this to a CSV file that looks like this (three columns: ID, Comments, likes_count):
ID Comments likes_count
User1 😱 1
User2 💖 1
User3 Looking good! Collab, DM "bruteimpact.fashion 0
User4 heyyy 5
So far this is the code I wrote, but it is far from the result I want and I do not know how to get there. Also, I have no idea how to build a separate 'likes_count' column by detaching the number of likes from the comment text. However, I would be satisfied with a CSV file with just the "ID" and "Text" columns, without "likes_count". Please help me!
import csv

fields = ["User", "Text"]
rows = [comments]
filename = "insta_records.csv"
with open(filename, 'w', encoding='utf-8') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(fields)
    csvwriter.writerows(rows)
You have a flat list, so you could use zip() to pair each user with its comment:
import csv

comments = ['User1',
 ' 😱 3w1 likeReply',
 'User2',
 ' 💖 3w1 likeReply',
 'User3',
 ' Looking good! Collab, DM "bruteimpact.fashion 3wReply',
 'User4',
 ' heyyy 3w5 likeReply']
rows = []
# even indexes hold users, odd indexes hold their comments
for user, text in zip(comments[::2], comments[1::2]):
    print(user, text)
    rows.append([user, text])
fields = ["User", "Text"]
filename = "insta_records.csv"
# newline='' keeps the csv module from writing blank lines on Windows
with open(filename, 'w', encoding='utf-8', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(fields)
    csvwriter.writerows(rows)
Result on screen:
User1 😱 3w1 likeReply
User2 💖 3w1 likeReply
User3 Looking good! Collab, DM "bruteimpact.fashion 3wReply
User4 heyyy 3w5 likeReply
And in the file:
User,Text
User1, 😱 3w1 likeReply
User2, 💖 3w1 likeReply
User3," Looking good! Collab, DM ""bruteimpact.fashion 3wReply"
User4, heyyy 3w5 likeReply
To create other columns you would first have to edit the comments using split(), replace(), a slice [start:end], etc.
rows = []
for user, text in zip(comments[::2], comments[1::2]):
    # split off the last two space-separated pieces from the right
    parts = text.rsplit(' ', 2)#[:-1]
    parts.insert(0, user)
    print(parts)
    rows.append(parts)
Result on screen:
['User1', ' 😱', '3w1', 'likeReply']
['User2', ' 💖', '3w1', 'likeReply']
['User3', ' Looking good! Collab, DM', '"bruteimpact.fashion', '3wReply']
['User4', ' heyyy', '3w5', 'likeReply']
But there is no space in '3wReply', so that row is not split correctly, and handling it properly would need more work.
BTW: when you have 3w5 you can use split('w') to get ['3', '5'], but the HTML may contain other text instead of w, so it would need more work. Maybe with more complex rules in BeautifulSoup you could split it better.
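One possible way to handle both problems (the missing space in '3wReply' and a unit other than w) is a regular expression anchored at the end of the text. This is only a sketch: the tail format "number, one-letter unit, optional like count, Reply" is an assumption inferred from the four sample comments, and the names TAIL and parse_comment are made up for this example. Real Instagram text may need a different pattern.

```python
import csv
import re

# Assumed tail format, inferred from the sample data only:
#   "<number><unit>[<likes> like(s)]Reply", e.g. "3w1 likeReply" or "3wReply"
# The one-letter unit (w, d, h, ...) is an assumption.
TAIL = re.compile(r"\s*(\d+)([a-z])(?:(\d+) likes?)?Reply$")

def parse_comment(text):
    """Split raw comment text into (comment, likes_count)."""
    match = TAIL.search(text)
    if match is None:
        # tail not recognized: keep the whole text and report 0 likes
        return text.strip(), 0
    likes = int(match.group(3)) if match.group(3) else 0
    return text[:match.start()].strip(), likes

comments = ['User1',
 ' 😱 3w1 likeReply',
 'User2',
 ' 💖 3w1 likeReply',
 'User3',
 ' Looking good! Collab, DM "bruteimpact.fashion 3wReply',
 'User4',
 ' heyyy 3w5 likeReply']

rows = []
for user, text in zip(comments[::2], comments[1::2]):
    comment, likes = parse_comment(text)
    rows.append([user, comment, likes])

with open("insta_records.csv", "w", encoding="utf-8", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["ID", "Comments", "likes_count"])
    writer.writerows(rows)
```

A comment whose tail does not match the pattern is kept whole with a like count of 0, so nothing is silently dropped and you can spot the rows that need a better rule.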