Iterating over rows in a pandas dataframe with a condition to create a new column
Iterating DataFrame Rows to Create New Column Using Numpy
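I'm trying to reshape a dataframe into a structure that is more useful for graphing. Right now I reshape the df with iterrows or itertuples. Is the following the most efficient way to iterate over a dataframe with pandas?
Below is an oversimplified dataset; the real dataset will have tens of thousands more rows.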
group subtopic code
fruit grapes 110A
fruit apple 110B
meat pork 220A
meat chicken 220B
meat duck 220C
vegetable lettuce 300A
vegetable tomato 310A
vegetable asparagus 320A
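Basically, I want to create a new column ("code2") that pairs each value in the "code" column with every other code that shares the same value in the "group" column.
I tried running the following code: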
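import pandas as pd
# `file1` is the path to the workbook; newer pandas spells the argument `sheet_name`
df = pd.read_excel(file1, sheet_name='Sheet3')
def reshape_iterrows(df):
    reshape = []
    for _, j in df.iterrows():
        # A missing group is read as NaN (not the string 'nan'); map the code to itself once
        if pd.isna(j['group']):
            reshape.append({'code1': j['code'],
                            'code2': j['code'],
                            'group': 'None'})
            continue
        for _, k in df.iterrows():
            # Pair j with every *other* row that shares its group
            if j['code'] != k['code'] and j['group'] == k['group']:
                reshape.append({'code1': j['code'],
                                'code2': k['code'],
                                'group': j['group']})
    return reshape
reshape_iterrows(df)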
Or using itertuples:
def reshape_itertuples(df):
    reshape = []
    for row1 in df.itertuples():
        # Each row is a namedtuple (Index, group, subtopic, code)
        if pd.isna(row1.group):
            reshape.append({'code1': row1.code,
                            'code2': row1.code,
                            'group': 'None'})
            continue
        for row2 in df.itertuples():
            # Pair row1 with every *other* row that shares its group
            if row1.code != row2.code and row1.group == row2.group:
                reshape.append({'code1': row1.code,
                                'code2': row2.code,
                                'group': row1.group})
    return reshape
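The original positional lookups `row1[1]` and `row1[3]` work because `itertuples()` yields namedtuples whose field 0 is the index, followed by the columns in order. A small sketch on a two-row sample (assuming the column order shown in the dataset above) confirms that position 1 is `group` and position 3 is `code`, so attribute access such as `row.code` is an equivalent and less fragile spelling:

```python
import pandas as pd

# Two-row sample with the same column order as the question's data
df = pd.DataFrame({
    'group': ['fruit', 'fruit'],
    'subtopic': ['grapes', 'apple'],
    'code': ['110A', '110B'],
})

row = next(df.itertuples())
# Field 0 is the index; the DataFrame columns follow in their original order
print(row.Index, row.group, row.subtopic, row.code)
print(row[1] == row.group, row[3] == row.code)
```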
I pass `reshape` to pd.DataFrame(); the desired output is below. I then use the code1 and code2 columns as the source and target arguments to nx.from_pandas_edgelist to build the graph.
code1 code2 group
0 110A 110B fruit
1 110B 110A fruit
2 220A 220B meat
3 220A 220C meat
4 220B 220A meat
5 220B 220C meat
6 220C 220A meat
7 220C 220B meat
8 300A 310A vegetable
9 300A 320A vegetable
10 310A 300A vegetable
11 310A 320A vegetable
12 320A 300A vegetable
13 320A 310A vegetable
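The desired output contains every ordered pair of distinct codes within a group, so a group of n codes contributes n(n - 1) rows. A quick sanity check on the sample data (a sketch; only the `group` column is needed to predict the size):

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['fruit', 'fruit', 'meat', 'meat', 'meat',
              'vegetable', 'vegetable', 'vegetable'],
    'code': ['110A', '110B', '220A', '220B', '220C',
             '300A', '310A', '320A'],
})

# Size of each group: fruit -> 2, meat -> 3, vegetable -> 3
sizes = df['group'].value_counts()

# Each group of n codes yields n * (n - 1) ordered (code1, code2) pairs
expected_rows = int((sizes * (sizes - 1)).sum())
print(expected_rows)  # 14
```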
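Like others, I'm interested in finding a more efficient approach than row iteration, perhaps using NumPy boolean operations. I'm looking for guidance on how to get the same result with vectorized/array operations.
Thanks!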
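For reference, the NumPy boolean approach asked about above could be sketched as follows (this is not from either answer). It builds an N x N mask of rows that share a group, clears the diagonal so a code is never paired with itself, and reads the pairs off with `np.nonzero`, which returns them in row-major order. The sketch assumes there are no missing groups. Note that the mask holds N² booleans, so 10,000 rows already need about 100 MB:

```python
import numpy as np
import pandas as pd

# Sample data from the question (the real file has many more rows)
df = pd.DataFrame({
    'group': ['fruit', 'fruit', 'meat', 'meat', 'meat',
              'vegetable', 'vegetable', 'vegetable'],
    'subtopic': ['grapes', 'apple', 'pork', 'chicken', 'duck',
                 'lettuce', 'tomato', 'asparagus'],
    'code': ['110A', '110B', '220A', '220B', '220C',
             '300A', '310A', '320A'],
})

groups = df['group'].to_numpy()
codes = df['code'].to_numpy()

# Broadcast to an N x N boolean matrix: True where rows i and j share a group
same_group = groups[:, None] == groups[None, :]
# Exclude self-pairs on the diagonal
np.fill_diagonal(same_group, False)

# Row/column indices of every True cell
i, j = np.nonzero(same_group)

edges = pd.DataFrame({'code1': codes[i], 'code2': codes[j], 'group': groups[i]})
print(edges)
```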
You can try:
import pandas as pd
from itertools import permutations
# For each group, build a DataFrame of all ordered pairs of its codes
df.groupby('group')['code']\
    .apply(lambda x: pd.DataFrame(list(permutations(x.tolist(), 2))))\
    .add_prefix('code').reset_index().drop('level_1', axis=1)
Output:
group code0 code1
0 fruit 110A 110B
1 fruit 110B 110A
2 meat 220A 220B
3 meat 220A 220C
4 meat 220B 220A
5 meat 220B 220C
6 meat 220C 220A
7 meat 220C 220B
8 vegetable 300A 310A
9 vegetable 300A 320A
10 vegetable 310A 300A
11 vegetable 310A 320A
12 vegetable 320A 300A
13 vegetable 320A 310A
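Another vectorized option, not taken from either answer, is a pandas self-merge on `group`: `merge` pairs each row with every row of the same group, and a boolean filter then drops the self-pairs. Unlike a full N x N mask, the intermediate result only has as many rows as the sum of the squared group sizes. A minimal sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['fruit', 'fruit', 'meat', 'meat', 'meat',
              'vegetable', 'vegetable', 'vegetable'],
    'code': ['110A', '110B', '220A', '220B', '220C',
             '300A', '310A', '320A'],
})

# Self-join: the overlapping 'code' column becomes 'code1' (left) and 'code2' (right)
pairs = df[['group', 'code']].merge(df[['group', 'code']],
                                    on='group', suffixes=('1', '2'))

# Keep only pairs of distinct codes, in the column order the question wants
edges = (pairs.loc[pairs['code1'] != pairs['code2'], ['code1', 'code2', 'group']]
              .reset_index(drop=True))
print(edges)
```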
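It may not be the most efficient, but this is what I tried. I put too much effort into it to let my answer go to waste :)
My answer makes every step explicit. If you need to do something in between (or realize you only need the names rather than the codes), you can simply comment out one line.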
import pandas as pd
from itertools import permutations
def get_data():
    return {
        'group': [
            'fruit', 'fruit',
            'meat', 'meat', 'meat',
            'vegetable', 'vegetable', 'vegetable'
        ],
        'subtopic': [
            'grapes', 'apple',
            'pork', 'chicken', 'duck',
            'lettuce', 'tomato', 'asparagus'
        ],
        'code': [
            '110A', '110B',
            '220A', '220B', '220C',
            '300A', '310A', '320A'
        ]
    }
# Used to retrieve the code for a specific item.
def make_code_map(df):
    return dict(df[['subtopic', 'code']].to_dict('split')['data'])
# Used to retrieve the group for a specific item.
def make_group_map(df):
    return dict(df[['subtopic', 'group']].to_dict('split')['data'])
if __name__ == '__main__':
    df = pd.DataFrame(get_data())
    mapping = make_code_map(df)
    group_map = make_group_map(df)
    graph_edges = []
    for name, group in df.groupby('group'):
        # All ordered pairs of distinct items within this group
        graph_edges.extend(permutations(group['subtopic'].tolist(), 2))
    ndf = pd.DataFrame(graph_edges, columns=['code1', 'code2'])
    # Apply the group map to get the correct group for each item.
    ndf['group'] = ndf['code1'].map(group_map)
    # Replace each item with its corresponding code.
    ndf = ndf.replace(mapping)
    print(ndf)
    #    code1 code2      group
    # 0   110A  110B      fruit
    # 1   110B  110A      fruit
    # 2   220A  220B       meat
    # 3   220A  220C       meat
    # 4   220B  220A       meat
    # 5   220B  220C       meat
    # 6   220C  220A       meat
    # 7   220C  220B       meat
    # 8   300A  310A  vegetable
    # 9   300A  320A  vegetable
    # 10  310A  300A  vegetable
    # 11  310A  320A  vegetable
    # 12  320A  300A  vegetable
    # 13  320A  310A  vegetable
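Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.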