Pandas - Create multiple new columns if str.contains returns multiple values

I have some data like this:

0       Very user friendly interface and has 2FA support
1       The trading page is great though with allot o...
2                                         Widget support
3       But it’s really only for serious traders with...
4       The KYC and AML process is painful - it took ...
                             ...                        
937                                      Legit platform!
938     Horrible customer service won’t get back to m...
939                             App is fast and reliable
940               I wish it had a portfolio chart though
941    The app isn’t as user friendly as it need to b...
Name: reviews, Length: 942, dtype: object

and a list of features:

 ['support',
 'time',
 'follow',
 'submit',
 'ticket',
 'team',
 'swap',
 'account',
 'experi',
 'contact',
 'user',
 'platform',
 'screen',
 'servic',
 'custom',
 'restrict',
 'fast',
 'portfolio',
 'specialist']

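I want to check whether each review contains one or more of these features and, if it does, add the matching word(s) to a new column.

My code looks like this: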

data["words"] = data[data["reviews"].str.contains('|'.join(features))]

This code is meant to create a new column named "words", but because the expression sometimes returns multiple values, I get this error:

ValueError: Columns must be same length as key

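How can I fix this?

The problem is that you are not actually extracting any words. str.contains only returns a True/False mask, so data[mask] gives you the matching rows of the whole DataFrame rather than the matched words, and that cannot be assigned to a single column. You need to extract the words you want from the text and then collect them in a new column.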
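To see why the original assignment fails, here is a minimal sketch on a small made-up frame. The `id` column and the three sample reviews are assumptions for illustration, not the asker's data. The exact error text can differ between pandas versions and depends on whether the `words` column already exists, so the sketch only checks that a `ValueError` is raised.

```python
import pandas as pd

# Hypothetical two-column frame standing in for the asker's data
data = pd.DataFrame({
    "id": [0, 1, 2],
    "reviews": ["Widget support", "Legit platform!", "Great app"],
})
features = ["support", "platform"]

# str.contains returns one boolean per row, not the matched text
mask = data["reviews"].str.contains("|".join(features))

# Boolean indexing keeps the matching rows with ALL their columns
filtered = data[mask]
print(mask.tolist())
print(filtered.shape)

# Assigning a multi-column DataFrame to a single column name fails
raised = False
try:
    data["words"] = filtered
except ValueError as err:
    raised = True
    print(type(err).__name__, err)
```

The mask says which rows match, but nothing in this expression ever captures which feature matched, so there are no words to store.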

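import re
from io import StringIO

import pandas as pd

# Feature stems copied from the question
features = ['support', 'time', 'follow', 'submit', 'ticket', 'team', 'swap',
            'account', 'experi', 'contact', 'user', 'platform', 'screen',
            'servic', 'custom', 'restrict', 'fast', 'portfolio', 'specialist']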

TESTDATA = StringIO("""Index,reviews,
0,       Very user friendly interface and has 2FA support,
1,       The trading page is great though with allot o...,
2,                                         Widget support,
3,       But it’s really only for serious traders with...,
4,       The KYC and AML process is painful - it took ...,
937,                                      Legit platform!,
938,     Horrible customer service won’t get back to m...,
939,                             App is fast and reliable,
940,               I wish it had a portfolio chart though,
941,    The app isn’t as user friendly as it need to b...
    """)

# The trailing commas create an empty 'Unnamed: 2' column, so drop it
data = pd.read_csv(TESTDATA, sep=",").drop('Unnamed: 2', axis=1)
data
#>    Index                                            reviews
#> 0      0         Very user friendly interface and has 2F...
#> 1      1         The trading page is great though with a...
#> 2      2                                           Widge...
#> 3      3         But it’s really only for serious trader...
#> 4      4         The KYC and AML process is painful - it...
#> 5    937                                        Legit pl...
#> 6    938       Horrible customer service won’t get back ...
#> 7    939                               App is fast and r...
#> 8    940                 I wish it had a portfolio chart...
#> 9    941      The app isn’t as user friendly as it need ...

# For each review, find every feature stem it contains and join the matches with ", "
data['words'] = list(map(lambda x: ", ".join(x), [re.findall('|'.join(features), x) for x in data.reviews]))
data
#>    Index                                            reviews           words
#> 0      0         Very user friendly interface and has 2F...   user, support
#> 1      1         The trading page is great though with a...                
#> 2      2                                           Widge...         support
#> 3      3         But it’s really only for serious trader...                
#> 4      4         The KYC and AML process is painful - it...                
#> 5    937                                        Legit pl...        platform
#> 6    938       Horrible customer service won’t get back ...  custom, servic
#> 7    939                               App is fast and r...            fast
#> 8    940                 I wish it had a portfolio chart...       portfolio
#> 9    941      The app isn’t as user friendly as it need ...            user
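The same extraction can probably be written with pandas string methods alone, and the title's "multiple new columns" can be approximated with one boolean column per feature. The sketch below is an alternative to the approach above, not a replacement for it. It assumes a shortened sample and feature list, and the `has_` column prefix is a made-up naming choice. Stems are passed through `re.escape` in case a feature ever contains a regex metacharacter.

```python
import re

import pandas as pd

# Small hypothetical sample; the feature stems are a subset of the question's list
data = pd.DataFrame({"reviews": [
    "Very user friendly interface and has 2FA support",
    "Legit platform!",
    "Horrible customer service",
    "Nothing relevant here",
]})
features = ["support", "user", "platform", "servic", "custom"]

# Escape each stem so it is matched literally, then build one alternation
pattern = "|".join(map(re.escape, features))

# One comma-separated column: list of matches per row, joined into a string
data["words"] = data["reviews"].str.findall(pattern).str.join(", ")

# One boolean column per feature ("has_" prefix is an illustrative choice)
for feat in features:
    data[f"has_{feat}"] = data["reviews"].str.contains(feat, regex=False)

print(data[["words", "has_user", "has_custom"]])
```

Matching here is case-sensitive, so "Support" would not match the stem "support" unless the reviews are lowercased first or `flags=re.IGNORECASE` is passed to `str.findall`.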

