
Iterating DataFrame Rows to Create New Column Using Numpy

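I'm trying to reshape a DataFrame into a structure that is more useful for graphing. Right now I reshape the df with iterrows or itertuples. Is the approach below the most efficient way to iterate over the data with pandas?

The dataset below is heavily simplified; the real one will have tens of thousands more rows.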

group      subtopic   code
fruit      grapes     110A
fruit      apple      110B
meat       pork       220A
meat       chicken    220B
meat       duck       220C
vegetable  lettuce    300A
vegetable  tomato     310A
vegetable  asparagus  320A

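Basically, I want to create a new column ("code2") that pairs each value in the "code" column with every other code that shares the same value in the "group" column.

I tried running the following code: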

import pandas as pd

df = pd.read_excel(file1, sheet_name='Sheet3')

def reshape_iterrows(df):
    reshape = []

    for _, j in df.iterrows():
        for _, k in df.iterrows():
            if j['code'] == k['code']:
                # Skip pairing a row with itself.
                pass
            elif j['group'] == 'nan':
                # Assumes a missing group is stored as the string 'nan'.
                reshape.append({'code1': j['code'],
                                'code2': j['code'],
                                'group': 'None'})
            elif j['group'] == k['group']:
                reshape.append({'code1': j['code'],
                                'code2': k['code'],
                                'group': j['group']})
    # Return only after every pair of rows has been compared.
    return reshape

reshape_iterrows(df)

Or using itertuples:

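def reshape_itertuples(df):
    reshape = []

    # Positional fields: row[0] is the index, row[1] is group, row[3] is code.
    for row1 in df.itertuples():
        for row2 in df.itertuples():
            if row1[3] == row2[3]:
                # Skip pairing a row with itself.
                pass
            elif row1[1] == 'nan':
                # Assumes a missing group is stored as the string 'nan'.
                reshape.append({'code1': row1[3],
                                'code2': row1[3],
                                'group': 'None'})
            elif row1[1] == row2[1]:
                reshape.append({'code1': row1[3],
                                'code2': row2[3],
                                'group': row1[1]})
    # Return only after every pair of rows has been compared.
    return reshape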

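I pass reshape to pd.DataFrame(), and the expected output is below. I then use the code1 and code2 columns as the source and target arguments of nx.from_pandas_edgelist to generate the graph.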

    code1   code2   group
0   110A    110B    fruit
1   110B    110A    fruit
2   220A    220B    meat
3   220A    220C    meat
4   220B    220A    meat
5   220B    220C    meat
6   220C    220A    meat
7   220C    220B    meat
8   300A    310A    vegetable
9   300A    320A    vegetable
10  310A    300A    vegetable
11  310A    320A    vegetable
12  320A    300A    vegetable
13  320A    310A    vegetable
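For context, here is a minimal sketch of that last step, assuming networkx is installed. The two-row edge list is a made-up sample in the same shape as the expected output, and passing the group through edge_attr is an illustrative choice rather than something the question requires.

```python
import networkx as nx
import pandas as pd

# Hypothetical edge list in the shape shown above (fruit group only).
edges = pd.DataFrame({
    'code1': ['110A', '110B'],
    'code2': ['110B', '110A'],
    'group': ['fruit', 'fruit'],
})

# Each row becomes one edge; 'group' is stored as an edge attribute.
G = nx.from_pandas_edgelist(edges, source='code1', target='code2',
                            edge_attr='group')

print(G.number_of_nodes(), G.number_of_edges())
```

By default this builds an undirected graph, so the two mirrored rows collapse into a single edge. If direction matters, create_using=nx.DiGraph keeps both.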

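Like others, I'm interested in a more efficient way to iterate, perhaps using Numpy's boolean operations? I'm looking for guidance on how to get the same result with vectorized/array operations.

Thanks!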

You can try:

from itertools import permutations
df.groupby('group')['code']\
  .apply(lambda x: pd.DataFrame(list(permutations(x.tolist(),2))))\
  .add_prefix('code').reset_index().drop('level_1',axis=1)

Output:

        group code0 code1
0       fruit  110A  110B
1       fruit  110B  110A
2        meat  220A  220B
3        meat  220A  220C
4        meat  220B  220A
5        meat  220B  220C
6        meat  220C  220A
7        meat  220C  220B
8   vegetable  300A  310A
9   vegetable  300A  320A
10  vegetable  310A  300A
11  vegetable  310A  320A
12  vegetable  320A  300A
13  vegetable  320A  310A
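Another option that avoids apply is a self-merge on the group column. This is only a sketch on the sample data from the question; the suffixes '1' and '2' are chosen here so that the merged columns come out named code1 and code2.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['fruit', 'fruit', 'meat', 'meat', 'meat',
              'vegetable', 'vegetable', 'vegetable'],
    'code': ['110A', '110B', '220A', '220B', '220C',
             '300A', '310A', '320A'],
})

# Pair every row with every row of the same group (including itself).
pairs = df.merge(df, on='group', suffixes=('1', '2'))

# Drop the self-pairs and put the columns in the requested order.
edges = (pairs[pairs['code1'] != pairs['code2']]
         [['code1', 'code2', 'group']]
         .reset_index(drop=True))

print(edges)
```

The merge materializes every within-group pair before filtering, so memory grows with the square of the largest group size.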

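This may not be the most efficient approach, but it is what I tried. I put too much effort into it to let my answer go to waste :)

My answer spells out every step explicitly. That helps if you need to do something in between, or if you realize you only need the names rather than the codes, in which case you can just comment out one line.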

import pandas as pd
from itertools import permutations

def get_data():
    return {
        'group' : [
            'fruit', 'fruit',
            'meat', 'meat', 'meat',
            'vegetable', 'vegetable', 'vegetable'
        ],
        'subtopic' : [
            'grapes', 'apple',
            'pork', 'chicken', 'duck',
            'lettuce', 'tomato', 'asparagus'
        ],
        'code' : [
            '110A', '110B',
            '220A', '220B', '220C',
            '300A', '310A', '320A'
        ]
    }

# Used to retrieve code for specific item
def make_code_map(df):
    return dict(df[['subtopic', 'code']].to_dict('split')['data'])

# Used to retrieve group for specific item.
def make_group_map(df):
    return dict(df[['subtopic', 'group']].to_dict('split')['data'])

if __name__ == '__main__':
    df = pd.DataFrame(get_data())
    mapping = make_code_map(df)
    group_map = make_group_map(df)

    graph_edges = []
    for name, group in df.groupby('group'):
        graph_edges.extend( permutations(group['subtopic'].tolist(), 2) )

    ndf = pd.DataFrame(graph_edges, columns=['code1', 'code2'])

    # Applying the group map to get all the correct groups for each
    # item.
    ndf['group'] = ndf['code1'].apply(lambda x:group_map[x])

    # Replace each item with its corresponding code.
    ndf = ndf.replace(mapping)
    print(ndf)

#      code1 code2      group
# 0   110A  110B      fruit
# 1   110B  110A      fruit
# 2   220A  220B       meat
# 3   220A  220C       meat
# 4   220B  220A       meat
# 5   220B  220C       meat
# 6   220C  220A       meat
# 7   220C  220B       meat
# 8   300A  310A  vegetable
# 9   300A  320A  vegetable
# 10  310A  300A  vegetable
# 11  310A  320A  vegetable
# 12  320A  300A  vegetable
# 13  320A  310A  vegetable
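Since the question asks specifically about Numpy boolean operations, here is a rough sketch of how the same pairs could be found with broadcasting. It uses the sample data from the question and is not taken from either answer above. It builds two n x n boolean masks, so it is probably only practical when the number of rows is small enough for those masks to fit in memory.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': ['fruit', 'fruit', 'meat', 'meat', 'meat',
              'vegetable', 'vegetable', 'vegetable'],
    'code': ['110A', '110B', '220A', '220B', '220C',
             '300A', '310A', '320A'],
})

groups = df['group'].to_numpy()
codes = df['code'].to_numpy()

# n x n boolean masks, built by broadcasting a column against a row.
same_group = groups[:, None] == groups[None, :]
different_code = codes[:, None] != codes[None, :]

# Row and column positions of every valid (code1, code2) pair.
i, j = np.nonzero(same_group & different_code)

edges = pd.DataFrame({'code1': codes[i],
                      'code2': codes[j],
                      'group': groups[i]})
print(edges)
```

np.nonzero walks the mask row by row, so the pairs come out in the same order as the nested loops in the question.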

