
How to use TPU to accelerate sentiment analysis using FinBert in Kaggle

I am trying to analyze the sentiment of earnings calls using FinBert. Since I need to analyze more than 40,000 earnings calls, computing the sentiment scores would take over a week. I would therefore like to use the TPU offered by Kaggle to speed up this process.

However, all the tutorials/guides I can find only cover training a model, whereas I just want to use one of the pretrained versions and use the TPU to speed up the sentiment analysis of the earnings calls.

import tensorflow as tf
from transformers import BertTokenizer, BertForSequenceClassification
from transformers import pipeline

try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
except ValueError:
    tpu = None
    gpus = tf.config.experimental.list_logical_devices("GPU")

if tpu:
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
    print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
elif len(gpus) > 1:
    strategy = tf.distribute.MirroredStrategy([gpu.name for gpu in gpus])
    print('Running on multiple GPUs ', [gpu.name for gpu in gpus])
elif len(gpus) == 1:
    strategy = tf.distribute.get_strategy()
    print('Running on single GPU ', gpus[0].name)
else:
    strategy = tf.distribute.get_strategy()
    print('Running on CPU')
print("Number of accelerators: ", strategy.num_replicas_in_sync)

finbert = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-tone', num_labels=3)
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-tone')

nlp = pipeline("sentiment-analysis", model=finbert, tokenizer=tokenizer)

# This is the loop I use to calculate the sentiment scores

import nltk

for i in range(40001, len(clean_data) - 1):
    print(i)
    # Get the Q&A text and split it into sentences
    temp = test_data.iloc[i, 3]
    sentences = nltk.sent_tokenize(temp)
    results = nlp(sentences)
    filename = clean_data.iloc[i, 0]

    # Count the sentence-level sentiment labels
    positive = 0
    neutral = 0
    negative = 0
    for result in results:
        label = result["label"]
        if label == "Positive":
            positive += 1
        elif label == "Neutral":
            neutral += 1
        else:
            negative += 1

    per_pos_qanda = positive / len(results)
    per_neg_qanda = negative / len(results)
    net_score_qanda = per_pos_qanda - per_neg_qanda

    finbert_results.iloc[i, 0] = filename
    finbert_results.iloc[i, 7] = per_pos_qanda
    finbert_results.iloc[i, 8] = per_neg_qanda
    finbert_results.iloc[i, 9] = net_score_qanda
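As a side note, the label-counting part of the loop can be written more compactly with `collections.Counter`. A minimal sketch, using made-up `results` in the same `{"label": ..., "score": ...}` format the pipeline returns:

```python
from collections import Counter

# Hypothetical pipeline output in the same format as nlp(sentences)
results = [
    {"label": "Positive", "score": 0.95},
    {"label": "Neutral",  "score": 0.80},
    {"label": "Positive", "score": 0.70},
    {"label": "Negative", "score": 0.60},
]

# Counter tallies each label in one pass over the results
counts = Counter(r["label"] for r in results)
per_pos_qanda = counts["Positive"] / len(results)  # 0.5
per_neg_qanda = counts["Negative"] / len(results)  # 0.25
net_score_qanda = per_pos_qanda - per_neg_qanda    # 0.25
```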

Do I now need to incorporate the TPU into the for-loop code where the pipeline is called? That is, in this line?

results = nlp(sentences)

If I understand correctly, you are referring to inference rather than training. In that case you can also benefit from a TPU, for example by using a distribution strategy. Please refer to this guide: https://www.tensorflow.org/api_docs/python/tf/distribute/Strategy
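Independent of which accelerator is used, a large part of the inference speed-up usually comes from batching: the Transformers pipeline accepts a `batch_size` argument at call time (e.g. `nlp(sentences, batch_size=64)`), and you can also chunk the input yourself. A minimal batching helper as a sketch (the `batched` name and the batch size of 64 are assumptions, not part of the original code):

```python
def batched(items, batch_size):
    """Yield successive fixed-size batches from a list."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# With the question's pipeline, the call site would look like:
#     results = []
#     for batch in batched(sentences, 64):
#         results.extend(nlp(batch))

# Demonstration on a small list:
batches = list(batched(list(range(5)), 2))
print(batches)  # [[0, 1], [2, 3], [4]]
```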
