Return all words in a dataframe column in lower case

I want to convert all the words in the 'Split Tweets' column to lower case.

This is my code:

def word_splitter(df):
    df['Split Tweets'] = df['Tweets'].str.split()
    df['Split Tweets'] = df['Split Tweets'].str.lower()

    df = df[['Tweets', 'Date', 'Split Tweets']]
    return df

word_splitter(twitter_df.copy())

This is the output I get:

    Tweets                                              Date                Split Tweets
0   @BongaDlulane Please send an email to mediades...   2019-11-29 12:50:54 NaN
1   @saucy_mamiie Pls log a call on 0860037566          2019-11-29 12:46:53 NaN
2   @BongaDlulane Query escalated to media desk.        2019-11-29 12:46:10 NaN
3   Before leaving the office this afternoon, head...   2019-11-29 12:33:36 NaN
4   #ESKOMFREESTATE #MEDIASTATEMENT : ESKOM SUSPEN...   2019-11-29 12:17:43 NaN
... ... ... ...
195 Eskom's Visitors Centres’ facilities include i...   2019-11-20 10:29:07 NaN
196 #Eskom connected 400 houses and in the process...   2019-11-20 10:25:20 NaN
197 @ArthurGodbeer Is the power restored as yet?        2019-11-20 10:07:59 NaN
198 @MuthambiPaulina @SABCNewsOnline @IOL @eNCA @e...   2019-11-20 10:07:41 NaN
199 RT @GP_DHS: The @GautengProvince made a commit...   2019-11-20 10:00:09 NaN
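A minimal sketch that reproduces the all-NaN column without the original data. It assumes a hypothetical two-row `twitter_df` built from tweets in the output above, with `Date` kept as plain strings for simplicity; the function itself is unchanged from the question:

```python
import pandas as pd

# Hypothetical sample data: two tweets copied from the question's output.
twitter_df = pd.DataFrame({
    "Tweets": [
        "@BongaDlulane Query escalated to media desk.",
        "@ArthurGodbeer Is the power restored as yet?",
    ],
    "Date": ["2019-11-29 12:46:10", "2019-11-20 10:07:59"],
})

def word_splitter(df):
    df['Split Tweets'] = df['Tweets'].str.split()
    # The cells are now lists, so .str.lower() yields NaN for every row.
    df['Split Tweets'] = df['Split Tweets'].str.lower()
    df = df[['Tweets', 'Date', 'Split Tweets']]
    return df

result = word_splitter(twitter_df.copy())
print(result['Split Tweets'].isna().all())  # True
```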

This is the expected output:

word_splitter(twitter_df.copy()) 
    Tweets                                              Date                Split Tweets
0   @BongaDlulane Please send an email to mediades...   2019-11-29 12:50:54 [@bongadlulane, please, send, an, email, to, m...
1   @saucy_mamiie Pls log a call on 0860037566          2019-11-29 12:46:53 [@saucy_mamiie, pls, log, a, call, on, 0860037...
2   @BongaDlulane Query escalated to media desk.        2019-11-29 12:46:10 [@bongadlulane, query, escalated, to, media, d...
3   Before leaving the office this afternoon, head...   2019-11-29 12:33:36 [before, leaving, the, office, this, afternoon...
4   #ESKOMFREESTATE #MEDIASTATEMENT : ESKOM SUSPEN...   2019-11-29 12:17:43 [#eskomfreestate, #mediastatement, :, eskom, s...
... ... ... ...
195 Eskom's Visitors Centres’ facilities include i...   2019-11-20 10:29:07 [eskom's, visitors, centres’, facilities, incl...
196 #Eskom connected 400 houses and in the process...   2019-11-20 10:25:20 [#eskom, connected, 400, houses, and, in, the,...
197 @ArthurGodbeer Is the power restored as yet?        2019-11-20 10:07:59 [@arthurgodbeer, is, the, power, restored, as,...
198 @MuthambiPaulina @SABCNewsOnline @IOL @eNCA @e...   2019-11-20 10:07:41 [@muthambipaulina, @sabcnewsonline, @iol, @enc...
199 RT @GP_DHS: The @GautengProvince made a commit...   2019-11-20 10:00:09 [rt, @gp_dhs:, the, @gautengprovince, made, a,...

How do I do this?

You need to convert the Tweets strings to lowercase before you split them. Use this instead:

df['Split Tweets'] = df['Tweets'].str.lower().str.split()
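Dropped into the question's function, the fix might look like the sketch below. The one-row `twitter_df` is a hypothetical sample taken from the question's output, not the real data:

```python
import pandas as pd

def word_splitter(df):
    # Lowercase each tweet while it is still a string, then split on whitespace.
    df['Split Tweets'] = df['Tweets'].str.lower().str.split()
    return df[['Tweets', 'Date', 'Split Tweets']]

# Hypothetical sample row taken from the question's output.
twitter_df = pd.DataFrame({
    "Tweets": ["@BongaDlulane Query escalated to media desk."],
    "Date": ["2019-11-29 12:46:10"],
})

result = word_splitter(twitter_df.copy())
print(result.loc[0, 'Split Tweets'])
# ['@bongadlulane', 'query', 'escalated', 'to', 'media', 'desk.']
```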

After str.split(), each cell of df['Split Tweets'] holds a list rather than a string, so str.lower() cannot be applied to it and pandas returns NaN for every row instead.
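The per-element behaviour is easy to see on a small hypothetical Series that holds one string cell and one list cell, mirroring the column before and after str.split():

```python
import pandas as pd

# One string cell and one list cell (hypothetical values).
s = pd.Series(["Query ESCALATED", ["Query", "ESCALATED"]])

lowered = s.str.lower()
print(lowered[0])           # query escalated
print(pd.isna(lowered[1]))  # True: str.lower cannot be applied to a list
```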

Either change the order, as other answers and comments here suggest, or lowercase every word inside each list with a lambda function and the map method:

df['Split Tweets'] = df['Split Tweets'].map(lambda x: list(map(str.lower, x)))
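A short sketch of the map approach in context, using a hypothetical one-row frame taken from the question's output. It also shows an equivalent list-comprehension form of the lambda:

```python
import pandas as pd

# Hypothetical sample data taken from the question's output.
df = pd.DataFrame({"Tweets": ["@ArthurGodbeer Is the power restored as yet?"]})

# Split first: each cell becomes a list of words with the original casing.
df['Split Tweets'] = df['Tweets'].str.split()

# Then lowercase every word inside each list.
df['Split Tweets'] = df['Split Tweets'].map(lambda x: list(map(str.lower, x)))

# Equivalent form using a list comprehension instead of the inner map().
alt = df['Tweets'].str.split().map(lambda words: [w.lower() for w in words])

print(df.loc[0, 'Split Tweets'])
# ['@arthurgodbeer', 'is', 'the', 'power', 'restored', 'as', 'yet?']
```

One difference to keep in mind: if a Tweets cell is missing (NaN), str.split() leaves it as NaN and the lambda raises a TypeError when it tries to iterate over it. The lower-then-split one-liner passes NaN through unchanged.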

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM