
Using Pandas Dataframe to perform comparison

I have a .csv file that contains a list of words, each with a rating between 0 and 10. I import it using pd.read_csv, which appears to work (see screen capture). I then want to read a .txt file into Python and check whether any of its words also appear in the .csv file. If a word matches, I want its rating to be saved in a np.array; if not, move on to the next word.

(Screen capture: .csv file containing words and their ratings)

Here is my code:

dataset = pd.read_csv(r'Path...\AC_sample.csv', sep = '\s+' )
conc_score = np.array([])
p =  "Path../*.txt"

for t in glob(p):
   with open(t , encoding='utf-8') as f:
      text = f.read()
      for ind_row, content_row in dataset.iterrows():
          for i in text:
              if i == content_row:
                  conc_score = np.append(conc_score, dataset.RATING[ind_row])

The error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-137-0de8bf8708e7> in <module>
      6         for ind_row, content_row in dataset.iterrows():
      7             for i in text:
----> 8                 if i == content_row:
      9                     conc_score = np.append(conc_score, dataset.RATING[ind_row])
     10 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1477     def __nonzero__(self):
   1478         raise ValueError(
-> 1479             f"The truth value of a {type(self).__name__} is ambiguous. "
   1480             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1481         )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
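
The comparison that triggers this message can be reproduced in isolation. The following is a minimal sketch, not the original data: the two-row frame and the WORD column name are made up for illustration, and only RATING is a column name taken from the post.

```python
import pandas as pd

# Hypothetical stand-in for AC_sample.csv; only RATING is a column name from the post.
dataset = pd.DataFrame({"WORD": ["apple", "stone"], "RATING": [7.2, 3.1]})

# iterrows() yields (index, row) pairs, and each row is a pandas Series.
_, content_row = next(dataset.iterrows())

# Comparing a string with a Series is element-wise: the result is a boolean Series.
comparison = "apple" == content_row

try:
    if comparison:  # bool(Series) is ambiguous, so pandas raises ValueError here
        pass
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(type(comparison).__name__)
print(error_message)
```

So the `if` statement never gets a single True or False to work with; it gets one boolean per column of the row.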

You have an error in this part of the code:

for ind_row, content_row in dataset.iterrows():

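ind_row gives you the index and content_row gives you the whole row as a Series. Comparing a string to a Series with == produces a boolean Series rather than a single True or False, which is why the if statement raises the ValueError. If you would like to compare the content of the text file row by row, you can iterate through the text and use the following code for the comparison: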

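# Split the text into words once; iterating over the string itself yields single characters
words = text.split()
for ind_row, content_row in dataset.iterrows():
    for word in words:
        # content_row.values is the row as a NumPy array, so `in` returns a single bool
        if word in content_row.values:
            conc_score = np.append(conc_score, dataset.RATING[ind_row])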
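
If the nested loops turn out to be slow on a larger word list, one possible alternative is to build a word-to-rating lookup once and scan the text a single time. This is only a sketch under assumptions: the WORD column name, the sample frame, and the sample text are hypothetical (the post only shows RATING), and it assumes the words in the .csv are lowercase.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical stand-ins: the post's real CSV and .txt files are not shown.
dataset = pd.DataFrame(
    {"WORD": ["apple", "stone", "river"], "RATING": [7.2, 3.1, 5.5]}
)
text = "The river ran past an apple tree; the apple fell."

# Build the word -> rating lookup once instead of walking the frame per word.
rating_by_word = dict(zip(dataset["WORD"], dataset["RATING"]))

# Split the text into lowercase words (letters and apostrophes only),
# so punctuation such as ';' or '.' does not prevent a match.
words = re.findall(r"[a-z']+", text.lower())

# Keep the rating of every word that appears in the lookup, in text order.
conc_score = np.array([rating_by_word[w] for w in words if w in rating_by_word])
print(conc_score)
```

Note that this collects ratings in the order the words occur in the text, whereas the loop above collects them in the order of the rows in the .csv. A word that occurs twice in the text contributes its rating twice in both versions.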

Cheers
