
Create for-loop to loop over list - Python

This is probably an easy one for most of you pros. I have miraculously managed to fine-tune an ELECTRA model on some data and got some decent F-scores, and now I want to apply the model to some other text. It's a multi-label classification model.

This is how I do it with just one sentence:

test_comment = ["We’re exceptionally proud of the 62,000 employees who work in our restaurants, along with the hundreds of Russian suppliers who support our business, and our local franchisees. "]


# tokenize the comment defined above
encoding = tokenizer.encode_plus(
  test_comment,
  add_special_tokens=True,
  max_length=512,
  return_token_type_ids=False,
  padding="max_length",
  return_attention_mask=True,
  return_tensors='pt',
)

# returning probability values for each label
_, test_prediction = trained_model(encoding["input_ids"], encoding["attention_mask"])
test_prediction = test_prediction.flatten().numpy()

# print one probability per label
for label, prediction in zip(LABEL_COLUMNS, test_prediction):
  print(f"{label}: {prediction}")

# to get binary labels instead of probabilities:
#[0 if x <= 0.5 else 1 for x in test_prediction]

Which returns

morality_binary: 0.12542158365249634
emotion_binary: 0.16170987486839294
positive_binary: 0.13724404573440552
negative_binary: 0.06993409991264343
care_binary: 0.06901352107524872
fairness_binary: 0.0649697408080101
authority_binary: 0.05470539629459381
sanctity_binary: 0.03908411040902138
harm_binary: 0.05327978357672691
injustice_binary: 0.057351987808942795
betrayal_binary: 0.03698693960905075
subversion_binary: 0.05460885167121887
degradation_binary: 0.04987286403775215

Now, say I have a dataset with a structure such as

ID      sample_text
1       lorem ipsum dala dulu
2       lorem ipsum dala dulu etc
3       lorem ipsum dala dulu etc
4       lorem ipsum dala dulu etc
5       lorem ipsum dala dulu etc

And I want the model to make a prediction for each row and add each prediction as a new column, something like

ID      sample_text                 morality_binary    positive_binary   negative_binary 
1       lorem ipsum dala dulu       0.13455            0.43455           0.26455
2       lorem ipsum dala dulu etc   0.12145            0.43455           0.87455
3       lorem ipsum dala dulu etc   0.03455            0.63455           0.37455
4       lorem ipsum dala dulu etc   0.41455            0.83455           0.81455
5       lorem ipsum dala dulu etc   0.73455            0.93455           0.5455

I have a feeling that it is not too difficult; I just can't wrap my head around it.

Thanks a million for any help you might provide!

Since you didn't provide a minimal reproducible example, I cannot confirm this will work, but in theory it should, assuming your output is list-like. I also assume your model is a black box:

First, wrap your model in a function:

# outputs some list-like result
def run_model(input_data):
    ...

Then, apply the function to each row:

df[LABEL_COLUMNS] = df[['sample_text']].apply(run_model, axis=1, result_type='expand')
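
To make the pattern concrete, here is a minimal sketch you can run without the model. The body of `run_model` is a stand-in I made up (it derives fake "probabilities" from the text length), and `LABEL_COLUMNS` is shortened to three names; in your case the function would tokenize `row["sample_text"]`, call `trained_model`, and return the flattened prediction.

```python
import pandas as pd

# Shortened for illustration; the question's LABEL_COLUMNS has 13 entries.
LABEL_COLUMNS = ["morality_binary", "positive_binary", "negative_binary"]

def run_model(row):
    """Stand-in for the real model call (an assumption, not the asker's model).

    With axis=1, `row` is a Series holding one row of the selected columns.
    The return value must have one entry per name in LABEL_COLUMNS.
    """
    n = len(row["sample_text"])
    return [n / 100, n / 200, n / 400]  # fake scores based on text length

df = pd.DataFrame({
    "ID": [1, 2],
    "sample_text": ["lorem ipsum dala dulu", "lorem ipsum dala dulu etc"],
})

# result_type="expand" turns each returned list into separate columns,
# which are then assigned to the label columns by position.
df[LABEL_COLUMNS] = df[["sample_text"]].apply(run_model, axis=1, result_type="expand")
print(df)
```

The order of the values returned by `run_model` has to match the order of `LABEL_COLUMNS`, since the assignment is positional.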

However, it's not clear how your model works, what input it expects, or whether it can operate on multiple inputs at once.
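
If the model does turn out to accept several texts at once, a row-by-row `apply` is probably slower than it needs to be. Below is a rough sketch of a chunked version. `predict_batch` is hypothetical: I am assuming it takes a list of strings and returns an array of shape `(len(texts), len(LABEL_COLUMNS))`; here it is faked with text lengths so the example runs on its own.

```python
import numpy as np
import pandas as pd

# Shortened for illustration; the question's LABEL_COLUMNS has 13 entries.
LABEL_COLUMNS = ["morality_binary", "positive_binary", "negative_binary"]

def predict_batch(texts):
    """Hypothetical batched model call (an assumption for illustration).

    Returns an array with one row per text and one column per label.
    """
    lengths = np.array([len(t) for t in texts], dtype=float)
    return np.column_stack([lengths / 100, lengths / 200, lengths / 400])

def add_predictions(df, batch_size=2):
    """Return a copy of df with one prediction column per label.

    Assumes df has at least one row and a 'sample_text' column.
    """
    texts = df["sample_text"].tolist()
    # Run the model on consecutive slices of at most batch_size texts.
    chunks = [
        predict_batch(texts[start:start + batch_size])
        for start in range(0, len(texts), batch_size)
    ]
    preds = pd.DataFrame(np.vstack(chunks), columns=LABEL_COLUMNS, index=df.index)
    return pd.concat([df, preds], axis=1)

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "sample_text": [
        "lorem ipsum dala dulu",
        "lorem ipsum dala dulu etc",
        "lorem ipsum dala dulu etc",
    ],
})
result = add_predictions(df)
print(result)
```

Building the predictions as a separate DataFrame with the same index keeps the original columns untouched and makes the alignment explicit.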
