Function to clean a DataFrame column using variable text

Question

How can I write a function that cleans a given text passed into a DataFrame? The text will be my variable, so I can pass in any sentence and the function will clean it by lower-casing it, removing characters, etc. My attempt looks like this:

def my_function(x):
    # Applies a few cleaning steps to the exceptions df:
    # Sets text to lower case:
    x.iloc[:, 0].str.lower()
    # Removes breaks:
    x.iloc[:, 0].replace(r'\n', ' ', regex=True)
    # Sets text to lower case:
    x.iloc[:, 0].str.lower()
    # Removes a more extensive set of 'special' characters:
    remove_these = ["!",'"',"%","&","'","(",")","#","*","?",
                    "+",",","-",".","/",":",";","<","=",">",
                    "@","[","\\","]","^","_","`","{","|","}",
                    "~","–","’", "*"]
    for char in remove_these:
        x.iloc[:, 0].str.replace(char, ' ')
    # Removes numbers:
    x.iloc[:, 0].replace(r'\d+', ' ', regex=True)
    # Removes single characters:
    x.iloc[:, 0].replace(r'\b[a-zA-Z]\b', ' ', regex=True)
    # Removes extra spaces (trim) from both ends:
    x.iloc[:, 0].str.strip()
    # Removes double spacing:
    x.iloc[:, 0].replace(r' +', ' ', regex=True)
    # Removes spaces --:
    x.iloc[:, 0].replace(r'--', '', regex=True)

Since the variable text will be passed into a DataFrame, I thought I would always use the first column, hence the iloc[:, 0].

Then my variable text would be set like this:

my_variable = "WHAT A WONDERFUL WORLD!"
df_Text = pd.DataFrame({my_variable})

But when I apply this, it doesn't work; the output is 'None':

output = my_function(df_Text)
print(output)

What am I doing wrong? Thanks a lot.

Answer 1

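Your function doesn't actually alter the DataFrame in any way, and it doesn't return anything. Each call such as x.iloc[:, 0].str.lower() builds a new Series, and that result is discarded because it is never assigned back. A function with no return statement returns None, which is what you see printed.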
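As a minimal sketch of those two points (the names df, no_return, unchanged and changed are made up for illustration, and the frame is built from a list rather than a set):

```python
import pandas as pd

df = pd.DataFrame(["WHAT A WONDERFUL WORLD!"])

# .str.lower() builds a new Series; the result is discarded here,
# so the DataFrame is left untouched.
df.iloc[:, 0].str.lower()
unchanged = df.iloc[0, 0]  # still the original upper-case text

# Assigning the result back is what actually updates the column.
df.iloc[:, 0] = df.iloc[:, 0].str.lower()
changed = df.iloc[0, 0]


def no_return(frame):
    # The stripped Series is thrown away and nothing is returned.
    frame.iloc[:, 0].str.strip()


result = no_return(df)  # a function without `return` yields None

print(unchanged)
print(changed)
print(result)
```

The first print shows the untouched text, the second the lower-cased text, and the third prints None, which matches the symptom in the question.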

Try this:

import pandas as pd

def my_function(x):
    # Applies a few cleaning steps to the exceptions df:
    # Sets text to lower case:
    x.iloc[:, 0] = x.iloc[:, 0].str.lower()
    # Removes breaks:
    x.iloc[:, 0] = x.iloc[:, 0].replace(r'\n', ' ', regex=True)
    # Removes a more extensive set of 'special' characters:
    remove_these = ["!",'"',"%","&","'","(",")","#","*","?",
                    "+",",","-",".","/",":",";","<","=",">",
                    "@","[","\\","]","^","_","`","{","|","}",
                    "~","–","’", "*"]
    for char in remove_these:
        # regex=False treats each character literally (e.g. "." or "("):
        x.iloc[:, 0] = x.iloc[:, 0].str.replace(char, ' ', regex=False)
    # Removes numbers:
    x.iloc[:, 0] = x.iloc[:, 0].replace(r'\d+', ' ', regex=True)
    # Removes single characters:
    x.iloc[:, 0] = x.iloc[:, 0].replace(r'\b[a-zA-Z]\b', ' ', regex=True)
    # Removes extra spaces (trim) from both ends:
    x.iloc[:, 0] = x.iloc[:, 0].str.strip()
    # Removes double spacing:
    x.iloc[:, 0] = x.iloc[:, 0].replace(r' +', ' ', regex=True)
    # Removes double hyphens (no effect here, since "-" was already replaced above):
    x.iloc[:, 0] = x.iloc[:, 0].replace(r'--', '', regex=True)
    return x

my_variable = "WHAT A WONDERFUL WORLD!"
df_Text = pd.DataFrame({my_variable})

output = my_function(df_Text)
print(output)

                      0
0  what wonderful world
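
If you would rather not modify the DataFrame you pass in, the same steps can be chained and written into a copy. This is only a sketch, not part of the answer above: clean_text_column and SPECIAL_CHARS are hypothetical names, the character set is copied from the remove_these list, and a single regex character class replaces the loop.

```python
import re

import pandas as pd

# Same characters as the remove_these list, collected into one string.
SPECIAL_CHARS = "!\"%&'()#*?+,-./:;<=>@[\\]^_`{|}~–’"
# re.escape makes every character literal inside the character class.
SPECIAL_RE = "[" + re.escape(SPECIAL_CHARS) + "]"


def clean_text_column(df):
    """Return a copy of df whose first column has been cleaned."""
    cleaned = (
        df.iloc[:, 0]
        .str.lower()                                 # lower case
        .str.replace(r"\n", " ", regex=True)         # line breaks
        .str.replace(SPECIAL_RE, " ", regex=True)    # special characters
        .str.replace(r"\d+", " ", regex=True)        # numbers
        .str.replace(r"\b[a-z]\b", " ", regex=True)  # single letters
        .str.replace(r" +", " ", regex=True)         # repeated spaces
        .str.strip()                                 # leading/trailing spaces
    )
    out = df.copy()  # leave the caller's DataFrame untouched
    out.iloc[:, 0] = cleaned
    return out


my_variable = "WHAT A WONDERFUL WORLD!"
df_text = pd.DataFrame([my_variable])  # a list, so the row order is defined

output = clean_text_column(df_text)
print(output.iloc[0, 0])  # what wonderful world
```

Because every step returns a new Series, nothing changes until the result is written into the copy, so df_text still holds the original sentence afterwards.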

Solution 1
1 Accepted 2021-10-03 17:22:29