
How to classify the text in each row of a dataframe?

I would like to classify the text in a dataframe. Using a dictionary, I check whether each of its values appears in a column of stemmed text, and then apply a filter to assign the category in a new column.
The filter is: if at least 33% of the values are True, print 1; otherwise print 0.

Note: the keys of the dictionary represent categories.
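
The rule above can be sketched as a small helper. The name `meets_threshold` and its `threshold` parameter are hypothetical and only for illustration. Note that "at least 33%" implies `>=`, whereas the loop further down uses a strict `>`; the two differ only when the share of matches lands exactly on the threshold.

```python
def meets_threshold(category_words, text_words, threshold=0.33):
    """Return 1 if at least `threshold` of category_words occur in text_words, else 0."""
    if not category_words:
        # An empty category can never match (assumption, not stated in the question).
        return 0
    text_set = set(text_words)  # set lookup is faster than scanning the list each time
    hits = sum(word in text_set for word in category_words)
    return int(hits >= threshold * len(category_words))


first_row = ['some', 'stemming', 'words']
print(meets_threshold(['some', 'stemming', 'bunch'], first_row))  # 2 of 3 words match
print(meets_threshold(['to', 'so'], first_row))                   # 0 of 2 words match
```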

I checked the type of the first row: it is a list, but the other methods I tried did not work on it. So far I have applied the logic only to the first row, and I don't know how to extend it to all the other rows.

import pandas as pd

# keys are the categories, values are the stemmed words that signal each category
dictionary = {'cat_1' : ['some', 'stemming', 'bunch'], 'cat_2' : ['to', 'so'], 'cat_3': ['stemming', 'words', 'many', 'bunch']}
dataframe = pd.DataFrame({'Articles' : ['article1', 'article2', 'article3', 'article4'], 'Text' : [['some', 'stemming', 'words'], ['to' , 'much', 'stemming', 'words'], ['another', 'bunch', 'of', 'stemming', 'words'], ['so', 'many', 'stemming', 'words']]})
test = dataframe['Text'][0]  # word list of the first row only
for item in dictionary.values():
    # one True/False per category word, depending on whether it occurs in the row
    filt = []
    for i in item:
        if i in test:
            filt.append(True)
        else:
            filt.append(False)
    print(filt)
    umbral = len(filt) * 0.33  # threshold: 33% of the category's words
    Trues = filt.count(True)
    if Trues > umbral:
        print('1')
    else:
        print('0')

The output is:

[True, True, False]
1 
[True, False] 
1 
[True, True, False, True] 
1 

I would like to apply this to every row of the 'Text' column and get one column per category, each holding 1 or 0. For example, the result would look like:

|----------|-------|-------|-------|
| Articles | cat_1 | cat_2 | cat_3 |
|----------|-------|-------|-------|
| article1 |   1   |   1   |   0   |
|----------|-------|-------|-------|
| article2 |   0   |   1   |   1   |
|----------|-------|-------|-------|
| article3 |   1   |   0   |   0   |
|----------|-------|-------|-------|

Can you not use:

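def cat(z):
    # d.values() yields lists, so flatten them into one set of words first
    words = {w for values in d.values() for w in values}
    return [w in words for w in z]

dataframe['Text'].map(cat)

where d is your dictionary and dataframe['Text'] is the column of word lists.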
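
Building on the idea of mapping a function over the text column, a fuller sketch could compute one 0/1 column per category. The names `df`, `category_flag` and `THRESHOLD` are assumptions made for illustration, and the dict from the question is assumed to be wrapped in `pd.DataFrame`. The sketch keeps the strict `>` comparison from the question's loop; switch to `>=` if "at least 33%" is meant literally.

```python
import pandas as pd

dictionary = {
    'cat_1': ['some', 'stemming', 'bunch'],
    'cat_2': ['to', 'so'],
    'cat_3': ['stemming', 'words', 'many', 'bunch'],
}
df = pd.DataFrame({
    'Articles': ['article1', 'article2', 'article3', 'article4'],
    'Text': [
        ['some', 'stemming', 'words'],
        ['to', 'much', 'stemming', 'words'],
        ['another', 'bunch', 'of', 'stemming', 'words'],
        ['so', 'many', 'stemming', 'words'],
    ],
})

THRESHOLD = 0.33  # share of a category's words that must appear in the text


def category_flag(text_words, category_words, threshold=THRESHOLD):
    """Return 1 if more than `threshold` of category_words occur in text_words, else 0."""
    text_set = set(text_words)
    hits = sum(word in text_set for word in category_words)
    return int(hits > threshold * len(category_words))


# Add one 0/1 column per dictionary key, evaluated row by row.
for category, words in dictionary.items():
    df[category] = [category_flag(text, words) for text in df['Text']]

result = df.drop(columns='Text')
print(result)
```

The flags depend entirely on the data, so on this sample they will not necessarily match the illustrative table in the question.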
