
How to use TPU to accelerate sentiment analysis using FinBert in Kaggle

I am trying to analyze the sentiment of earnings calls using FinBert. Since I need to analyze more than 40,000 earnings calls, computing the sentiment scores would take over a week. I would therefore like to use the TPU offered by Kaggle to speed up this process.

However, all the tutorials/guides I can find only cover training a model, whereas I just want to use one of the pretrained versions and use the TPU to speed up the sentiment analysis of the earnings calls.

import tensorflow as tf
from transformers import BertTokenizer, BertForSequenceClassification
from transformers import pipeline

try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
except ValueError:
    tpu = None
    gpus = tf.config.experimental.list_logical_devices("GPU")

if tpu:
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
    print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
elif len(gpus) > 1:
    strategy = tf.distribute.MirroredStrategy([gpu.name for gpu in gpus])
    print('Running on multiple GPUs ', [gpu.name for gpu in gpus])
elif len(gpus) == 1:
    strategy = tf.distribute.get_strategy()
    print('Running on single GPU ', gpus[0].name)
else:
    strategy = tf.distribute.get_strategy()
    print('Running on CPU')
print("Number of accelerators: ", strategy.num_replicas_in_sync)

finbert = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-tone', num_labels=3)
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-tone')

nlp = pipeline("sentiment-analysis", model=finbert, tokenizer=tokenizer)

# This is the loop I use to calculate the sentiment scores

import nltk

for i in range(40001, len(clean_data) - 1):
    print(i)
    # Get the Q&A text and split it into sentences
    temp = test_data.iloc[i, 3]
    sentences = nltk.sent_tokenize(temp)
    results = nlp(sentences)
    filename = clean_data.iloc[i, 0]

    # Count the sentence-level sentiment labels
    positive = 0
    neutral = 0
    negative = 0
    for result in results:
        label = result["label"]
        if label == "Positive":
            positive += 1
        elif label == "Neutral":
            neutral += 1
        else:
            negative += 1

    per_pos_qanda = positive / len(results)
    per_neg_qanda = negative / len(results)
    net_score_qanda = per_pos_qanda - per_neg_qanda

    finbert_results.iloc[i, 0] = filename
    finbert_results.iloc[i, 7] = per_pos_qanda
    finbert_results.iloc[i, 8] = per_neg_qanda
    finbert_results.iloc[i, 9] = net_score_qanda
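As a side note, the label-counting part of the loop can be written more compactly with `collections.Counter`. A minimal sketch, using made-up `results` in the same `{"label": ..., "score": ...}` format the pipeline returns:

```python
from collections import Counter

# Hypothetical pipeline output in the same format as nlp(sentences)
results = [
    {"label": "Positive", "score": 0.95},
    {"label": "Neutral",  "score": 0.80},
    {"label": "Positive", "score": 0.70},
    {"label": "Negative", "score": 0.60},
]

# Counter tallies each label in one pass over the results
counts = Counter(r["label"] for r in results)
per_pos_qanda = counts["Positive"] / len(results)  # 0.5
per_neg_qanda = counts["Negative"] / len(results)  # 0.25
net_score_qanda = per_pos_qanda - per_neg_qanda    # 0.25
```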

Do I now need to incorporate the TPU into the for-loop code where the pipeline is called? That is, in this line?

results = nlp(sentences)

If I understand correctly, you are referring to inference rather than training. In that case you can also benefit from a TPU, for example by using a distribution strategy. Please refer to this guide: https://www.tensorflow.org/api_docs/python/tf/distribute/Strategy
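Independent of which accelerator is used, a large part of the inference speed-up usually comes from batching: the Transformers pipeline accepts a `batch_size` argument at call time (e.g. `nlp(sentences, batch_size=64)`), and you can also chunk the input yourself. A minimal batching helper as a sketch (the `batched` name and the batch size of 64 are assumptions, not part of the original code):

```python
def batched(items, batch_size):
    """Yield successive fixed-size batches from a list."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# With the question's pipeline, the call site would look like:
#     results = []
#     for batch in batched(sentences, 64):
#         results.extend(nlp(batch))

# Demonstration on a small list:
batches = list(batched(list(range(5)), 2))
print(batches)  # [[0, 1], [2, 3], [4]]
```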
