
Extracting hashtags from a .txt file in Python

I started building a TikTok tool for data analysis, but I can't extract hashtags from a saved .txt file. Here is what I did:

from tiktok_bot import TikTokBot  # TikTok API
import csv
import datetime  # needed by getData()
import os
import sys
import re # attempt to use findall, but it didn't work

try:
    os.mkdir("./data")  # creating data folder
except FileExistsError:
    print("Directory exists")


def getData():  # today's date, used in the file names
    return datetime.datetime.now().strftime("%Y-%m-%d")

def buildFileName(suffix):  # building .csv name
    return "./data/" + getData() + suffix + ".csv"

def buildText(suffix):  # building .txt name
    return "./data/" + getData() + suffix + ".txt"

shares_csv = buildFileName("_shares")
write_header = not os.path.exists(shares_csv)  # only a new file needs the header row

with open(shares_csv, mode='a', newline='') as csv_file:   # writing .csv file
    fieldnames = ['User ID', 'URL', 'Description', 'Comments', 'Likes']
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    if write_header:
        writer.writeheader()

    # most_shared_posts: posts fetched earlier with TikTokBot (code not shown)
    for post in most_shared_posts:
        print(str(post.author_user_id), str(post.share_url), str(post.desc), post.statistics.comment_count, post.statistics.digg_count)
        writer.writerow({'User ID': str(post.author_user_id), 'URL': str(post.share_url), 'Description': str(post.desc), 'Comments': post.statistics.comment_count, 'Likes': post.statistics.digg_count})

with open(buildFileName("_shares"), mode='r', newline='') as csv_file, \
        open(buildText("_shares"), mode='w') as txt_file:   # descriptions from the .csv saved into the .txt
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    for lines in csv_reader:
        print(lines['Description'])                 # show each description
        print(lines['Description'], file=txt_file)  # and write it to the .txt file

How can I now extract the hashtags from the descriptions written to the .txt file? Note: each description is made up of text and hashtags, so I think it is basically a string.

You can use re.findall with a capturing group:


import re
m = re.findall(r'#(\w+)', lines['Description'])
print(m)
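After the reading loop finishes, lines only holds the last row, so the pattern above finds the hashtags of a single description. To cover everything saved to the .txt file, the same pattern can be run over each line of the file. A minimal sketch, assuming the .txt file holds one description per line; hashtags_from_txt is a hypothetical helper name:

```python
import re

# Same pattern as above: the capturing group drops the leading '#'
HASHTAG = re.compile(r'#(\w+)')


def hashtags_from_txt(path):
    """Return one list of hashtags per non-empty line of the text file at path."""
    result = []
    with open(path) as txt_file:
        for line in txt_file:
            line = line.strip()
            if line:  # skip blank lines
                result.append(HASHTAG.findall(line))
    return result
```

For example, hashtags_from_txt(buildText("_shares")) would read the file written by the question's code. In Python 3, \w also matches non-ASCII letters, so a tag like #café is picked up; emoji are not.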

I'm not sure I understand your question, but am I correct in assuming you want to get the hashtags from the description string? If so, you can use re to find all hashtag words in the string.

import re
hashtags = re.findall(r"#\w+", description)

This returns a list of the hashtags, each including the leading #.
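The patterns in the two answers differ slightly. With the capturing group in #(\w+), re.findall returns only the group, so the tags come back without the #; the pattern #\w+ returns the whole match, # included; and the looser #\w* also matches a lone #. A small comparison on a made-up description:

```python
import re

description = "Dance time #fyp #dance # done"  # made-up sample description

with_group = re.findall(r'#(\w+)', description)  # only the captured group is returned
whole_match = re.findall(r'#\w+', description)   # the full match, '#' included
loose = re.findall(r'#\w*', description)         # \w* also matches a bare '#'

print(with_group)   # ['fyp', 'dance']
print(whole_match)  # ['#fyp', '#dance']
print(loose)        # ['#fyp', '#dance', '#']
```

Which form fits depends on whether the # should be kept in the results; \w+ is the safer quantifier either way.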
