Creating Pandas DataFrames from a DataFrame by using values in a numpy array to look up the DataFrame index

Question

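I have a large dataset of 7,000 rows with 40 features. I want to create two new dataframes containing rows from the original. I want to use the values in a 1D numpy array to choose which rows go into which dataframe: compare the values in the array with the index of the original dataframe and, where they match, take the entire row from the original dataframe and add it to the new dataframe.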

#reading in my cleaned customer data and creating the original dataframe.
customer_data = pd.read_excel('Clean Customer Data.xlsx', index_col = 0)
#this is the 1D array that has a single element that corresponds to the index number of customer_data
group_list = np.array([2045,323,41,...,n])
# creating the arrays with a slice from group_list with the values of the row indexes for the groups
group_1 = np.array(group_list[:1972])
group_2 = np.array(group_list[1972:])
for X in range(len(group_list):
    i = 0
    #this is where I get stuck
    if group_1[i] == **the index of the original dataframe**
        group1_df = pd.append(customer_data)
    else:
        group2_df = pd.append(customer_data)
    i = i+1

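Clearly I have some serious syntax problems, and probably some serious logic problems with what I'm doing, but I've been banging my head against this wall for a week and my brain is mush.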

What I expect to happen is that the row at index 2045 of the original dataframe goes into group1_df.

In the end, I want two dataframes (group1_df and group2_df) with the same features as the original dataset, the first with 1,972 records and the second with 5,028.

The dataset looks like this:

Answer 1

Consider using DataFrame.reindex to align each group's values with the index of customer_data.

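import numpy as np
import pandas as pd

# Load the cleaned customer data, using the first column as the index
customer_data = pd.read_excel('Clean Customer Data.xlsx', index_col=0)

# 1D array of index labels from customer_data (values elided)
group_list = np.array([2045,323,41,...,n])

# Select rows by label: the first 1,972 labels form group 1, the rest group 2
group1_df = customer_data.reindex(group_list[:1972], axis='index')
group2_df = customer_data.reindex(group_list[1972:], axis='index')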
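
To see what reindex does here, the following is a minimal sketch on made-up data. The column names, the six-row frame, and the split point of 2 are assumptions for illustration only; they stand in for the real customer_data and the 1,972 cut-off.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for customer_data: 6 rows with index labels 0..5
customer_data = pd.DataFrame(
    {"age": [25, 32, 47, 51, 38, 29], "spend": [10.0, 20.5, 5.25, 7.0, 12.0, 3.5]},
    index=[0, 1, 2, 3, 4, 5],
)

# Hypothetical group_list: every index label appears once, in arbitrary order
group_list = np.array([4, 1, 5, 0, 3, 2])
split = 2  # plays the role of 1972 in the answer

group1_df = customer_data.reindex(group_list[:split], axis="index")
group2_df = customer_data.reindex(group_list[split:], axis="index")

print(group1_df.index.tolist())  # [4, 1]
print(group2_df.index.tolist())  # [5, 0, 3, 2]

# A label that is absent from the index yields an all-NaN row, not an error
missing = customer_data.reindex(np.array([4, 99]))
print(missing.loc[99].isna().all())  # True
```

Rows come back in the order given by the array, and any label not found in the index becomes a row of NaN, so it may be worth checking for missing labels before relying on the result.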

Answer 2

If your numpy array is a and your dataframe is df,

group1_df = df.loc[df.index.isin(a[:1972]), :]
group2_df = df.loc[df.index.isin(a[1972:]), :]
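
One thing to be aware of with the boolean mask is row order. The sketch below uses made-up data (the `value` column, the five-row frame, and the split point of 2 are assumptions) to compare the isin mask with a direct label lookup through .loc.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for df and a
df = pd.DataFrame({"value": [10, 20, 30, 40, 50]}, index=[0, 1, 2, 3, 4])
a = np.array([3, 0, 4, 1, 2])
split = 2  # plays the role of 1972

# Boolean mask: rows come back in df's own order
group1_mask = df.loc[df.index.isin(a[:split]), :]
print(group1_mask.index.tolist())  # [0, 3]

# Label lookup: rows come back in the order given by the array
group1_loc = df.loc[a[:split], :]
print(group1_loc.index.tolist())  # [3, 0]

# The second group can also be taken as the complement of the first mask
in_group1 = df.index.isin(a[:split])
group2_df = df.loc[~in_group1, :]
print(group2_df.index.tolist())  # [1, 2, 4]
```

The mask version silently ignores labels in the array that do not exist in the index, whereas the .loc label lookup raises a KeyError for them, so the choice depends on which behavior suits the data.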

Answer 1: 1, 2019-07-13 19:56:26
Answer 2: 0, accepted, 2019-07-13 19:58:38
