AttributeError running lower_case.translate & string.punctuation on Pandas df

I get an AttributeError when running lower_case.translate with string.punctuation on a Pandas DataFrame containing reviews. The imported data is messy. The error is AttributeError: 'DataFrame' object has no attribute 'translate'; the full traceback is below.

I tried the different versions shown in the comments:

# cleaned_text = lower_case.translate(str.maketrans(string.punctuation, ' '*len(string.punctuation)))
# cleaned_text = lower_case.translator = str.maketrans('', '', string.punctuation)

cleaned_text = lower_case.translate(str.maketrans('', '', string.punctuation))

I also tried this SO post and added a fillna, shown below, hoping to fix it.

#checking for nulls if present any
print("Number of rows with null values:")
print(lower_case.isnull().sum().sum())

lower_case.fillna("")

A small sample Excel file for the data frame: https://github.com/taylorjohn/Simple_RecSys/blob/master/sample-data.xlsx

Code:

import string
from collections import Counter
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from nltk.corpus import stopwords
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# Data is in an Excel file, messy and uncleaned; columns are artist names, rows are reviews for that artist
df = pd.read_excel('sample-data.xlsx',encoding='utf8', errors='ignore')

lower_case = df.apply(lambda x: x.astype(str).str.lower())

#checking for nulls if present any
print("Number of rows with null values:")
print(lower_case.isnull().sum().sum())

lower_case.fillna("")


#cleaned_text = lower_case.translate(str.maketrans(string.punctuation, ' '*len(string.punctuation)))
# cleaned_text = lower_case.translator = str.maketrans('', '', string.punctuation)

cleaned_text = lower_case.translate(str.maketrans('', '', string.punctuation))

The error received is:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-78-9f23b8a5e8e0> in <module>
      2 # cleaned_text = lower_case.translator = str.maketrans('', '', string.punctuation)
      3 
----> 4 cleaned_text = lower_case.translate(str.maketrans('', '', string.punctuation))

~\anaconda3\envs\nlp_course\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5272             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5273                 return self[name]
-> 5274             return object.__getattribute__(self, name)
   5275 
   5276     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'translate'

Pandas DataFrames don't have a .translate() method, but Python strings do. For example:

import string

my_str = "hello world!"
my_str.translate(str.maketrans('', '', string.punctuation))

If you want to apply that translation to every value in a DataFrame column, you can use .map() on the column. The .map() method takes a function that receives each column value as an argument and returns the transformed value:

def remove_punctuation(value):
    return value.translate(str.maketrans('', '', string.punctuation))

df["my_cleaned_column"] = df["my_dirty_column"].map(remove_punctuation)
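
One caveat: the function above assumes every value is a string. If a column contains missing values (NaN), value.translate would fail on them. A minimal sketch of one way around that, assuming hypothetical sample data and the same placeholder column names, is to pass na_action="ignore" to .map() so missing values are skipped:

```python
import string

import numpy as np
import pandas as pd

# Build the translation table once rather than on every call.
PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def remove_punctuation(value):
    return value.translate(PUNCT_TABLE)

# Hypothetical sample data with one missing review.
df = pd.DataFrame(
    {"my_dirty_column": ["Great show!!!", np.nan, "Loud, but fun."]}
)

# na_action="ignore" propagates NaN instead of passing it to the function.
df["my_cleaned_column"] = df["my_dirty_column"].map(
    remove_punctuation, na_action="ignore"
)
```

Note that in the question's code the values were already converted with astype(str), so missing values would arrive as the literal string 'nan' rather than NaN.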

You can also use a lambda function, rather than defining a new function:

df["my_cleaned_column"] = df["my_dirty_column"].map(
    lambda x: x.translate(str.maketrans('', '', string.punctuation))
)

If you have many columns you need to apply this to, you can loop over them:

for column_name in df.columns:
    df[column_name] = df[column_name].map(
        lambda x: x.translate(str.maketrans('', '', string.punctuation))
    )
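
As an alternative to the loop, pandas also offers a .translate() on the string accessor of a Series (Series.str.translate), which accepts the same table produced by str.maketrans. The following is a sketch under the assumption that every column holds text; the sample data and column names are made up for illustration:

```python
import string

import pandas as pd

# Translation table that deletes every punctuation character.
table = str.maketrans('', '', string.punctuation)

# Hypothetical data: columns are artists, rows are reviews.
df = pd.DataFrame({
    "Artist A": ["LOVED it!", "Meh..."],
    "Artist B": ["Too loud?", None],
})

# Lowercase and strip punctuation column by column;
# missing values stay missing instead of raising an error.
cleaned = df.apply(lambda col: col.str.lower().str.translate(table))
```

The .str accessor only works on columns containing strings, so a purely numeric column would need converting (or skipping) first.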
