
Generating FastText multi-label format

I want to use FastText for my Stack Overflow tag predictor.

My tags are stored in a DataFrame column, one list of tags per row:

    df['Tags']

0                 [php]
1             [firefox]
2                   [r]
3                  [c#]
4            [php, api]
              ...      
179994     [php, flash]
179995         [delphi]
179996              [c]
179997        [android]
179998    [java, email]
Name: Tags, Length: 134222, dtype: object

I want to transform each element into a single string of the form __label__XX__label__YY, so I tried:

tags=['__label__'.join(s) for s in df['Tags']]

This results in:

['php', 'firefox', 'r', 'c#', 'php__label__api', 'c#__label__asp.net', '.net__label__javascript', 'sql', '.net', 'algorithm', 'windows-7']

But I want my result as

['__label__php', '__label__firefox', '__label__r', '__label__c#', '__label__php__label__api', '__label__c#__label__asp.net', '__label__.net__label__javascript', '__label__sql', '__label__.net', '__label__algorithm', '__label__windows-7']

Try:

tags = ['__label__' + '__label__'.join(s) for s in df['Tags']]

Test:

labels = [['foo'], ['bar', 'baz']]
j = '__label__'
[j + j.join(l) for l in labels]

# out: ['__label__foo', '__label__bar__label__baz']
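If the strings are meant to go into a fastText supervised training file, note that the fastText tutorial writes multiple labels separated by spaces (for example __label__sauce __label__cheese followed by the text) rather than concatenated. A minimal sketch of that variant, assuming space-separated labels are what you need; the names prefix and lines are only illustrative:

```python
labels = [['foo'], ['bar', 'baz']]
prefix = '__label__'

# Prefix each tag on its own, then join the prefixed tags with a single space.
lines = [' '.join(prefix + tag for tag in tags) for tags in labels]

print(lines)
# out: ['__label__foo', '__label__bar __label__baz']
```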

It may also be worth trying result = df['Tags'].apply(lambda s: j + j.join(s)), but I did not test that. (applymap is a DataFrame method; a single column is a Series, which has apply and map instead.)
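For the pandas route, here is a minimal sketch using Series.apply, which calls the function once per row of the column. The small DataFrame is made-up sample data standing in for the real df['Tags']:

```python
import pandas as pd

# Hypothetical sample data shaped like the question's df['Tags'] column.
df = pd.DataFrame({'Tags': [['php'], ['firefox'], ['php', 'api']]})

j = '__label__'

# Each row holds a list of tags; prepend the prefix and join the rest with it.
result = df['Tags'].apply(lambda s: j + j.join(s))

print(result.tolist())
# out: ['__label__php', '__label__firefox', '__label__php__label__api']
```

The result is a Series with the same index as df, so it can be assigned straight back as a new column if that is more convenient than a plain list.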
