Python: create a boolean df from existing df, if column values equal to

I noticed an error in my code and would like your help with my GUI.

I have a function that gets a selected column name (line 3), identifies all the unique values of that column, and then creates one new data frame per unique value.

I noticed two issues with line 8:

  1. First, I am using contains(), which can add a row to two or more dataframes, while the goal is to add each row to exactly one dataframe.
  2. If the column is not a string column, the function fails because of contains(), since I need to use .str before it.

I couldn't find a function like contains() that checks for equality instead, and I am trying to avoid loops in this case. Any help will be appreciated. Thanks!

1) def basic_splitter():
2)     global df
3)     column = combobox_column_list.get() 
4)     unique_values = df[column].unique()
5)     for i in unique_values:
6)        
7)        # first df[] will split the original data frame into smaller data frames based on i value
8)        df_output = df[df[column].str.contains(i)]
9)        
10)       output_path = csv_xlsx_file_path + '/' + i + '.xlsx'
11)       df_output.to_excel(output_path, sheet_name = i, index = False)
12)       label_after_split = Label(my_frame_1, text = "Saved in: " + csv_xlsx_file_path)
13)       label_after_split.grid(row = 4, column = 1)
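To make the first issue with line 8 concrete, here is a minimal sketch on made-up data (the "fruit" and "qty" columns are assumptions, not part of the original GUI). It compares the substring mask from str.contains() with a plain equality mask:

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
df = pd.DataFrame({"fruit": ["apple", "pineapple", "apple", "pear"],
                   "qty": [1, 2, 3, 4]})

# Substring match: "pineapple" also contains "apple", so that row would be
# written to both the "apple" file and the "pineapple" file.
contains_mask = df["fruit"].str.contains("apple")

# Equality match: each row is True for exactly one unique value.
equals_mask = df["fruit"] == "apple"

print(contains_mask.tolist())  # [True, True, True, False]
print(equals_mask.tolist())    # [True, False, True, False]
```

The == comparison already returns a boolean Series, so it can be used directly as df[df[column] == i] without the .str accessor.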

Error message:

Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Users\orkhamir\AppData\Local\Programs\Python\Python310\lib\tkinter\__init__.py", line 1921, in __call__
    return self.func(*args)
  File "C:\Users\orkhamir\AppData\Local\Temp\1/ipykernel_1976/2220190921.py", line 76, in basic_splitter
    df_output = df[df[column].str.contains(i)]

    raise AttributeError("Can only use .str accessor with string values!")
AttributeError: Can only use .str accessor with string values!

Convert the column to str, and then run the function.
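As a rough illustration of that suggestion, the sketch below reproduces the error on a made-up integer column (the "year" and "value" names are assumptions) and shows two ways around it. Note that converting to str only removes the error; the match is still a substring match.

```python
import pandas as pd

# Hypothetical sample data with an integer column.
df = pd.DataFrame({"year": [2020, 2021, 2020], "value": [1.0, 2.0, 3.0]})

# The .str accessor is rejected on a non-string column.
try:
    df["year"].str.contains("2020")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Converting the column first avoids the error (still a substring match).
converted_mask = df["year"].astype(str).str.contains("2020")

# An equality comparison works on the original dtype, no conversion needed.
equals_mask = df["year"] == 2020

print(error_message)
print(converted_mask.tolist())
print(equals_mask.tolist())
```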

UPDATE: I have changed the code to the following, which solves all the issues I had previously.

def basic_splitter():
    global df
    column = combobox_column_list.get()
    unique_values = df[column].unique()

    for value in unique_values:
        # build the path of the new file that will store this subset
        output_path = 'C:/Users/orkhamir/Desktop/New folder/' + str(value) + '.xlsx'
        # keep only the rows whose column value equals the current unique value
        df_output = df[df[column] == value]
        df_output.to_excel(output_path, sheet_name = str(value), index = False)
        label_after_split = Label(my_frame_1, text = "Saved in: " + csv_xlsx_file_path)
        label_after_split.grid(row = 4, column = 1)
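An alternative worth considering, sketched here on made-up data (the "group" and "value" columns are assumptions), is to let groupby do the splitting. It assigns each row to exactly one key and works for string and non-string columns alike, so no mask has to be built by hand:

```python
import pandas as pd

# Hypothetical sample data; in the GUI this would be the loaded df.
df = pd.DataFrame({"group": ["a", "b", "a", "c"], "value": [1, 2, 3, 4]})

# groupby yields (key, sub-frame) pairs, one per unique value of the column.
# Rows whose key is NaN are skipped by default; pass dropna=False to keep them.
frames = {key: sub_df for key, sub_df in df.groupby("group")}

for key, sub_df in frames.items():
    # In the real function, sub_df.to_excel(...) would go here.
    print(key, len(sub_df))
```

There is still one iteration per unique value to write the files, but the row selection itself is done by pandas.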

You need to make sure your column is of type string before trying to call the str accessor on it. Just try:

df_output = df[df[column].astype('string').str.contains(i)]
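Two caveats apply to this one-liner, shown here as a hedged sketch on made-up data (the "lang" column and the value "C++" are assumptions). The pattern passed to contains() must itself be a string, and by default it is interpreted as a regular expression, so unique values containing characters such as + or . may not behave as literal text:

```python
import pandas as pd

# Hypothetical sample data; "C++" is not a valid regular expression.
df = pd.DataFrame({"lang": ["C", "C++", "Python"], "level": [1, 2, 3]})
value = "C++"

as_text = df["lang"].astype("string")

# str(value) guards against non-string unique values (e.g. integers);
# regex=False makes pandas treat the value as literal text.
literal_mask = as_text.str.contains(str(value), regex=False)

# If the goal is one file per unique value, an exact comparison on the
# converted column avoids substring matches altogether.
exact_mask = as_text == str(value)

print(literal_mask.tolist())  # [False, True, False]
print(exact_mask.tolist())    # [False, True, False]
```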
