Error when passing argument through function for converting pandas dataframe of tweets into corpus files

I want to prepare text data held in a pandas dataframe for sentiment analysis with nltk. To do that, I'm using a function that converts each row of the dataframe into a corpus file.

import nltk
# convert each row of the pandas dataframe of tweets into corpus files
def CreateCorpusFromDataFrame(corpusfolder,df):
    for index, r in df.iterrows():
        date=r['Date']
        tweet=r['Text']
        place=r['Place']
        fname=str(date)+'_'+'.txt'
        corpusfile=open(corpusfolder+'/'+fname,'a')
        corpusfile.write(str(tweet) +" " +str(date))
        corpusfile.close()
CreateCorpusFromDataFrame(myfolder,mydf)

The problem is that I keep getting this error:

NameError: name 'myfolder' is not defined

This happens even though there is a folder called 'myfolder' in the same directory as the Jupyter notebook that contains my code.

UPDATE:

I can see now that the issue was simply that I needed to pass the folder name as a string. I've done that and amended my code. The problem I have now is that the contents of the text file created by the function are not being written into a corpus, and the variable that the call assigns is of type 'NoneType'.

import nltk
# convert each row of the pandas dataframe of tweets into corpus files
def CreateCorpusFromDataFrame(corpusfolder,df):
    for index, r in df.iterrows():
        id=r['Date']
        tweet=r['Text']
        #place=r['Place']
        #fname=str(date)+'_'+'.txt'
        fname='tweets'+'.txt'
        corpusfile=open(corpusfolder+'/'+fname,'a')
        corpusfile.write(str(tweet) +" ")
        corpusfile.close()
corpusdf = CreateCorpusFromDataFrame('myfolder',mydf)
type(corpusdf)
NoneType

Problem

You are passing myfolder to your function as a variable, but that variable is never defined in your code, so Python raises a NameError.
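To make the distinction concrete, here is a minimal sketch of how Python treats a bare name versus a string. The helper build_path and the variable folder_name are hypothetical and exist only for illustration; they are not part of the original code.

```python
def build_path(corpusfolder):
    # Hypothetical helper: it only joins a folder name and a file name.
    return corpusfolder + '/tweets.txt'

try:
    # A bare name makes Python look up a variable called myfolder.
    # It never checks whether a folder with that name exists on disk.
    build_path(myfolder)
except NameError as exc:
    error_message = str(exc)

# A name that has been bound to a string can be passed without error.
folder_name = 'myfolder'
path = build_path(folder_name)

print(error_message)
print(path)
```

A folder on disk and a Python variable are unrelated, which is why having 'myfolder' next to the notebook does not prevent the error.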

Solution

Just replace it with 'myfolder', i.e. pass the folder name as a string.

CreateCorpusFromDataFrame('myfolder',mydf)
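As for the 'NoneType' in the update: the function writes to the file but has no return statement, and a Python function without one returns None, so that is what the call assigns. The following is only a sketch of one possible adjustment, not a definitive fix. The name create_corpus_from_dataframe, the fname parameter, the UTF-8 encoding, and the choice to create the folder when it is missing are all assumptions made for illustration.

```python
import os

def create_corpus_from_dataframe(corpusfolder, df, fname='tweets.txt'):
    """Append every tweet in df to one text file and return that file's path."""
    # Assumption: creating the folder when it is missing is acceptable here.
    os.makedirs(corpusfolder, exist_ok=True)
    path = os.path.join(corpusfolder, fname)
    # Open the file once instead of reopening it for every row.
    with open(path, 'a', encoding='utf-8') as corpusfile:
        for _, row in df.iterrows():
            corpusfile.write(str(row['Text']) + ' ')
    # Without this return statement the call would evaluate to None.
    return path
```

Called as corpus_path = create_corpus_from_dataframe('myfolder', mydf), this assigns the path of the written file rather than None. Turning that file into an nltk corpus object is a separate step, for example with a reader such as nltk's PlaintextCorpusReader, which is not shown here.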
