Automatically creating sub-DataFrames in Pandas (with a class?)

Question

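I have a DataFrame and I want to create several sub-DataFrames from it. Right now I create 3 sub-datasets "manually", but I would like to automate the process, because I need to reuse the code and there may be more than 3 sub-datasets in the future.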

Suppose this is my dataset:

import pandas as pd
 

data = {'line':['a', 'b', 'c', 'a', 'a', 'b', 'b', 'b', 'c', 'r', 'j', 'j', 'r'],
        'time':['10', '3', '5', '50', '10', '20', '7', '33', '42', '15', '25', '9', '81']}
 
# Create DataFrame
df = pd.DataFrame(data)
 
# Print the output.
print(df)

The result is:

   line time
0     a   10
1     b    3
2     c    5
3     a   50
4     a   10
5     b   20
6     b    7
7     b   33
8     c   42
9     r   15
10    j   25
11    j    9
12    r   81

I need to create 3 sub-datasets, always excluding the values "r" and "j" in the "line" column. This is what I am doing now:

a = df[~df['line'].str.startswith('r') & ~df['line'].str.startswith('j') & df['line'].str.startswith('a') ]

print(a)

  line time
0    a   10
3    a   50
4    a   10

b = df[~df['line'].str.startswith('r') & ~df['line'].str.startswith('j') & df['line'].str.startswith('b') ]

print(b)


  line time
1    b    3
5    b   20
6    b    7
7    b   33

c = df[~df['line'].str.startswith('r') & ~df['line'].str.startswith('j') & df['line'].str.startswith('c') ]

print(c)

  line time
2    c    5
8    c   42

As mentioned, I want to automate this process. My idea was to create a class, something like this [edited code]:

class Line:
    line_r = df['line'].str.startswith('r')
    line_j = df['line'].str.startswith('j')
    
    def __init__(self, line): 
        self.line= df['line'].str.startswith('')
        
    def get_line(self):
        if df['line'].str.startswith('a'):
            return df[~line_r & ~line_j & (self.line)]
        elif df['line'].str.startswith('b'):
            return df[~line_r & ~line_j & (self.line)]
        elif df['line'].str.startswith('c'):
            return df[~line_r & ~line_j & (self.line)]
        else:
            pass

But when I try to call it, I get an error:

line_a = Line('a')

line_a.get_line()

The error is:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

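I think the problem lies in using a class to produce the output... Besides, the process is not really automated: if I need 50 sub-DataFrames in the future, I would have to write 49 "elif" branches, which is not great...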

In fact, if I use a "for loop", I get the same kind of error:

for s in df[~df['line'].str.startswith('r') & ~df['line'].str.startswith('j') & df['line'].str.startswith('s')]:
    if s == a:
        print('Hello')

Error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What do you think? Any suggestions?

Answer 1

You can do this with the following code.

Keep in mind that this approach must be used very carefully, because it forcibly overwrites the global variables a, b and c.

for let in ["a","b","c"]:
    globals()["{}".format(let)] = (df[~df['line'].str.startswith('r') & 
                                      ~df['line'].str.startswith('j') & 
                                       df['line'].str.startswith('{}'.format(let))])
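
If overwriting globals feels too risky, one possible alternative is to keep the sub-DataFrames in a dictionary keyed by the value of "line". The sketch below is only an illustration under some assumptions: the names excluded, kept and sub_frames are made up for this example, and it matches "line" values exactly with isin instead of by prefix with startswith, which gives the same result for the sample data in the question but may differ if your real labels are longer strings.

```python
import pandas as pd

data = {'line': ['a', 'b', 'c', 'a', 'a', 'b', 'b', 'b', 'c', 'r', 'j', 'j', 'r'],
        'time': ['10', '3', '5', '50', '10', '20', '7', '33', '42', '15', '25', '9', '81']}
df = pd.DataFrame(data)

# Hypothetical name: 'line' values that should never get a sub-DataFrame.
excluded = ['r', 'j']

# Drop the excluded rows once (exact match, an assumption of this sketch),
# then split what is left into one DataFrame per remaining 'line' value.
kept = df[~df['line'].isin(excluded)]
sub_frames = {name: group for name, group in kept.groupby('line')}

# Each sub-DataFrame is looked up by its label instead of a global variable.
print(sorted(sub_frames))
print(sub_frames['b'])
```

With this layout, adding more lines in the future needs no extra code: every value of "line" that is not in excluded gets its own entry, and sub_frames['a'] plays the role of the variable a from the question.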
