Using a lambda function for pandas.DataFrame boolean indexing reports TypeError

I have a list of words called sowpods, and I need to check which combinations of letters exist either as a whole word or inside a word.

For example, if my letters are ['r', 't', 'e', 'f'], one of the possible combinations is 're', which appears inside 'red', so the word 'red' should be kept.

I already have code that generates all the possible combinations. Now I want to collect every word that meets the requirement into a list.
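The asker's combination code isn't shown, so here is a minimal sketch of the idea in plain Python, assuming a "combination" means an ordered arrangement of at least two distinct letters (so 're' and 'er' are different) that must appear as a substring. The helper names letter_combinations and words_containing_any are hypothetical, and min_len=2 is an assumption to avoid single letters matching almost every word.

```python
from itertools import permutations

def letter_combinations(letters, min_len=2):
    """Hypothetical helper: all ordered arrangements of min_len..len(letters)
    letters, each letter used at most once, uppercased to match SOWPODS entries."""
    combos = set()
    for n in range(min_len, len(letters) + 1):
        for perm in permutations(letters, n):
            combos.add("".join(perm).upper())
    return combos

def words_containing_any(words, combos):
    """Keep each word in which at least one combination occurs as a substring."""
    return [word for word in words if any(combo in word for combo in combos)]

combos = letter_combinations(['r', 't', 'e', 'f'])
print(len(combos))                                          # 60
print(words_containing_any(['RED', 'AA', 'TREF'], combos))  # ['RED', 'TREF']
```

With four distinct letters there are 12 + 24 + 24 = 60 arrangements of length 2 to 4, so the set stays small enough to test each word against directly.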

I have done the following:

import pandas as pd

sowpods = pd.read_csv('sowpods.csv', names=['Word'])

possible_combination = 'RE'
possible_words = pd.DataFrame([], columns=['Word'])

comb_in_word = lambda _: True if (possible_combination in _) else False # ------ line 8

sowpods_bool = sowpods['Word'].apply(comb_in_word) # --------------------------- line 10
possible_words.append(sowpods.loc[sowpods_bool, 'Word'])

But then I get:

  File "c:\tests.py", line 10, in <module>
    sowpods_bool = sowpods['Word'].apply(comb_in_word)
  File "C:\Python38-32\lib\site-packages\pandas\core\series.py", line 3848, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas\_libs\lib.pyx", line 2329, in pandas._libs.lib.map_infer
  File "c:\Users\lenovo\OneDrive\Prog\Projects\Scrabble\tests.py", line 8, in <lambda>
    comb_in_word = lambda _: True if possible_combination in _ else False
TypeError: argument of type 'float' is not iterable

I tested my lambda function in a more controlled environment and it worked fine, so I'm confident the error isn't coming from there.

I don't understand why I get this error when I'm not iterating over anything myself. I understand that pandas iterates over the DataFrame's column, but that column should only contain words, so I don't see where a float comes from.
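For reference, the error can be reproduced without pandas. A missing value read by pandas is float NaN, and the in operator on a float raises TypeError because a float is neither a container nor an iterable. The function name reproduce_error below is hypothetical; it simply runs the same substring test the lambda performs.

```python
import math

def reproduce_error(combination='RE'):
    """Run the lambda's substring test against a NaN cell and return the error text."""
    missing_cell = float('nan')  # how pandas stores a missing value read from a CSV
    assert math.isnan(missing_cell)
    try:
        combination in missing_cell  # same operation as comb_in_word
    except TypeError as exc:
        return str(exc)
    return None

# Prints the TypeError message (exact wording can differ between Python versions).
print(reproduce_error())
```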

Edit:

[In]
print(sowpods.head())
[Out]
      Word
0      AA
1     AAH
2   AAHED
3  AAHING
4    AAHS
[In]
print(sowpods.dtypes)
[Out]
Word    object
dtype: object

The word list contains the entries 'NA' and 'NULL', which pandas treats as missing values by default and stores as NaN (a float), so the lambda ended up evaluating 'RE' in NaN. I had to specify keep_default_na=False:

sowpods = pd.read_csv('projects/scrabble/sowpods_en.csv', names=['Word'], keep_default_na=False)
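A self-contained sketch of the fix, using a hypothetical five-line CSV in place of the real SOWPODS file. It also replaces the possible_words.append(...) line from the question: that call discarded its return value, and DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the sketch builds a plain list with .tolist() instead. Series.str.contains with regex=False does the substring test without apply; its na=False option is shown as an alternative that skips NaN rows instead of keeping 'NA' and 'NULL' as words.

```python
import io

import pandas as pd

# Hypothetical stand-in for sowpods.csv: one word per line, no header row.
CSV_TEXT = "AA\nNA\nNULL\nRED\nTREF\n"

default = pd.read_csv(io.StringIO(CSV_TEXT), names=['Word'])
print(default['Word'].isna().sum())  # 2: 'NA' and 'NULL' were parsed as NaN

sowpods = pd.read_csv(io.StringIO(CSV_TEXT), names=['Word'], keep_default_na=False)
print(sowpods['Word'].isna().sum())  # 0: every entry stays a string

possible_combination = 'RE'
# Vectorised substring test; regex=False matches the combination literally.
mask = sowpods['Word'].str.contains(possible_combination, regex=False)
possible_words = sowpods.loc[mask, 'Word'].tolist()
print(possible_words)  # ['RED', 'TREF']

# Alternative when NaN rows should simply be treated as "no match".
mask_default = default['Word'].str.contains(possible_combination, regex=False, na=False)
print(default.loc[mask_default, 'Word'].tolist())  # ['RED', 'TREF']
```

keep_default_na=False is the better fit here, since 'NA' and 'NULL' are valid Scrabble words that should stay in the list rather than be skipped.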
