Convert a dataframe to a list of tuples
I have a pandas DataFrame that looks like this:
| | Slave | start_addr0 | end_addr0 | start_addr1 | end_addr1 | start_addr2 | end_addr2 |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 10000000 | 1FFFFFFF | NaN | NaN | NaN | NaN |
| 1 | 1 | 20000000 | 2007FFFF | 40000000 | 40005FFF | NaN | NaN |
| 2 | 1 | 20000000 | 2007FFFF | 20100000 | 201FFFFF | NaN | NaN |
| 3 | 2 | 20200000 | 202FFFFF | 20080000 | 20085FFF | 40006000 | 400FFFFF |
| 4 | 3 | 0 | 0FFFFFFF | NaN | NaN | NaN | NaN |
| 5 | 4 | 20300000 | 203FFFFF | NaN | NaN | NaN | NaN |
| 6 | 5 | 20400000 | 204FFFFF | NaN | NaN | NaN | NaN |
For each slave number, I need to convert this into a list of ranges (tuples). For example:
Slave1_list = ( (20000000, 2007FFFF), (40000000, 40005FFF), (20100000, 201FFFFF))
The number of slaves (rows) and address pairs (columns) can vary.
Thanks.
Edit:
Run the following code to load the sample data into a dataframe:
import pandas as pd
import io
f = io.StringIO('''Slave|start_addr0|end_addr0|start_addr1|end_addr1|start_addr2|end_addr2
0|10000000|1FFFFFFF|NaN|NaN|NaN|NaN
1|20000000|2007FFFF|40000000|40005FFF|NaN|NaN
1|20000000|2007FFFF|20100000|201FFFFF|NaN|NaN
2|20200000|202FFFFF|20080000|20085FFF|40006000|400FFFFF
3|0|0FFFFFFF|NaN|NaN|NaN|NaN
4|20300000|203FFFFF|NaN|NaN|NaN|NaN
5|20400000|204FFFFF|NaN|NaN|NaN|NaN
''')
df = pd.read_csv(f, sep='|', engine='python', index_col=None)
Something like this:
import pandas as pd
from collections import defaultdict
data = [{'Slave': 1, 'start_addr0': 12, 'end_addr0': 189, 'start_addr1': 9, 'end_addr1': 17},
{'Slave': 1, 'start_addr0': 3, 'end_addr0': 6, 'start_addr1': 1, 'end_addr1': 4},
{'Slave': 3, 'start_addr0': 1, 'end_addr0': 7, 'start_addr1': 2, 'end_addr1': 14}]
df = pd.DataFrame(data)
print(df)
result = defaultdict(list)
rows = df.to_dict(orient='records')
for row in rows:
    slave = row.get('Slave')
    for key, start_value in row.items():
        if key.startswith('start_addr'):
            # Use the whole numeric suffix so that e.g. 'start_addr10' also works.
            idx = key[len('start_addr'):]
            end_value = row.get('end_addr' + idx)
            result[slave].append((start_value, end_value))
print('result:')
print(result)
output
Slave start_addr0 end_addr0 start_addr1 end_addr1
0 1 12 189 9 17
1 1 3 6 1 4
2 3 1 7 2 14
result:
defaultdict(<class 'list'>, {1: [(12, 189), (9, 17), (3, 6), (1, 4)], 3: [(1, 7), (2, 14)]})
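The snippet above uses toy integers. As a rough sketch of how the same loop might be applied to the sample data from the question, the version below reads every column as a string (so the hex values stay exactly as written), skips pairs that are NaN, and drops repeated pairs. The names `ranges` and `slave1_list` are made up for illustration, and the skip/dedupe behaviour is my assumption about what is wanted.

```python
import io
from collections import defaultdict

import pandas as pd

# Sample data from the question; dtype=str keeps '2007FFFF' etc. as text.
raw = io.StringIO('''Slave|start_addr0|end_addr0|start_addr1|end_addr1|start_addr2|end_addr2
0|10000000|1FFFFFFF|NaN|NaN|NaN|NaN
1|20000000|2007FFFF|40000000|40005FFF|NaN|NaN
1|20000000|2007FFFF|20100000|201FFFFF|NaN|NaN
2|20200000|202FFFFF|20080000|20085FFF|40006000|400FFFFF
3|0|0FFFFFFF|NaN|NaN|NaN|NaN
4|20300000|203FFFFF|NaN|NaN|NaN|NaN
5|20400000|204FFFFF|NaN|NaN|NaN|NaN
''')
df = pd.read_csv(raw, sep='|', dtype=str)

ranges = defaultdict(list)  # hypothetical name: slave number -> list of (start, end)
for row in df.to_dict(orient='records'):
    slave = int(row['Slave'])
    for key, start in row.items():
        if not key.startswith('start_addr'):
            continue
        end = row['end_addr' + key[len('start_addr'):]]
        # Assumption: a pair with a missing start or end is simply ignored.
        if pd.isna(start) or pd.isna(end):
            continue
        pair = (start, end)
        # Assumption: repeated ranges for the same slave are kept only once.
        if pair not in ranges[slave]:
            ranges[slave].append(pair)

slave1_list = tuple(ranges[1])
print(slave1_list)
```

The values stay as strings here; if numeric ranges are needed, `int(value, 16)` would convert them, assuming the addresses are hexadecimal.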
You can try:
Via wide_to_long
One option:
df = df.reset_index()
result = pd.wide_to_long(df, stubnames=['start_addr', 'end_addr'], i=['index', 'Slave'], j='add_num', sep='').dropna(
).reset_index([0, -1], drop=True).apply(tuple, 1).groupby(level=0).agg(list)
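Since the wide_to_long chain is fairly dense, here is a sketch that unpacks it into separate steps on a cut-down copy of the sample data. It is not the exact chain above: I build the tuples with `zip` instead of `.apply(tuple, 1)` and read everything as strings, and the intermediate names (`long_df`, `pairs`, `tuples`) are my own.

```python
import io

import pandas as pd

# Cut-down sample data (two address pairs), read as strings.
raw = io.StringIO('''Slave|start_addr0|end_addr0|start_addr1|end_addr1
0|10000000|1FFFFFFF|NaN|NaN
1|20000000|2007FFFF|40000000|40005FFF
1|20000000|2007FFFF|20100000|201FFFFF
''')
# reset_index() adds an 'index' column so that each row has a unique id,
# which wide_to_long requires (Slave alone is repeated).
df = pd.read_csv(raw, sep='|', dtype=str).reset_index()

# Step 1: one row per (index, Slave, add_num) with start_addr / end_addr columns.
long_df = pd.wide_to_long(df, stubnames=['start_addr', 'end_addr'],
                          i=['index', 'Slave'], j='add_num', sep='')

# Step 2: drop the address pairs that were NaN in the wide table.
pairs = long_df[['start_addr', 'end_addr']].dropna()

# Step 3: turn each remaining row into a tuple, labelled by its slave.
tuples = pd.Series(list(zip(pairs['start_addr'], pairs['end_addr'])),
                   index=pairs.index.get_level_values('Slave'), dtype=object)

# Step 4: collect the tuples of each slave into one list.
result = tuples.groupby(level=0).agg(list)
print(result)
```

Note that duplicates are kept in this form (slave 1 gets the range 20000000..2007FFFF twice), and because of `dtype=str` the slave labels are strings such as '1'.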
Via groupby
One option:
k = df.set_index('Slave').stack().reset_index()
result = k.groupby(k.index//2).agg({'Slave': 'first', 0 : tuple}).groupby('Slave').agg({0 : set})
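Explanation:
df.set_index('Slave').stack().reset_index() drops the NaN values and stacks the dataframe.
k.groupby(k.index//2) groups each pair of consecutive rows and performs the required aggregation (the tuples are formed in this step).
.groupby('Slave').agg({0: set}) -> the final groupby collects the unique tuples for each slave.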
OUTPUT:
0
Slave
0 {(10000000, 1FFFFFFF)}
1 {(40000000.0, 40005FFF), (20100000.0, 201FFFFF), (20000000, 2007FFFF)}
2 {(20080000.0, 20085FFF), (40006000.0, 400FFFFF), (20200000, 202FFFFF)}
3 {(0, 0FFFFFFF)}
4 {(20300000, 203FFFFF)}
5 {(20400000, 204FFFFF)}
Note: I am assuming that every start_addr has a corresponding end_addr.
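To make the pairing step more concrete, here is a sketch of the same stack-and-pair idea written with plain slicing instead of `k.groupby(k.index//2)`. It relies on the same assumption as the note above (every start_addr has an end_addr, so the stacked values alternate start, end, start, end). Reading the columns as strings is my own addition; it avoids the float values such as 40000000.0 seen in the output above. The explicit `dropna()` is there because I believe newer pandas versions may keep NaN in `stack()`.

```python
import io

import pandas as pd

raw = io.StringIO('''Slave|start_addr0|end_addr0|start_addr1|end_addr1|start_addr2|end_addr2
0|10000000|1FFFFFFF|NaN|NaN|NaN|NaN
1|20000000|2007FFFF|40000000|40005FFF|NaN|NaN
1|20000000|2007FFFF|20100000|201FFFFF|NaN|NaN
2|20200000|202FFFFF|20080000|20085FFF|40006000|400FFFFF
3|0|0FFFFFFF|NaN|NaN|NaN|NaN
4|20300000|203FFFFF|NaN|NaN|NaN|NaN
5|20400000|204FFFFF|NaN|NaN|NaN|NaN
''')
df = pd.read_csv(raw, sep='|', dtype=str)
df['Slave'] = df['Slave'].astype(int)

# One value per non-missing cell, row by row: start0, end0, start1, end1, ...
stacked = df.set_index('Slave').stack().dropna()

values = stacked.to_numpy()
# Even positions hold start addresses, odd positions the matching end addresses.
slaves = stacked.index.get_level_values('Slave')[::2]
pairs = zip(values[0::2], values[1::2])

# Collect the unique tuples per slave (a set, as in the answer above).
result = {}
for slave, pair in zip(slaves, pairs):
    result.setdefault(int(slave), set()).add(pair)

print(result[1])
```

Because sets are used, the order of the tuples within a slave is not preserved.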
I think this is what you are looking for:
def make_tuples(x):
    return tuple([x['start_addr0'], x['end_addr0']])
# simple tuples
result = tuple(df[['start_addr0', 'end_addr0']].apply(make_tuples, axis=1).tolist())
print(result)
# unique tuples
unique_result = tuple(df[['start_addr0', 'end_addr0']].apply(make_tuples, axis=1).unique().tolist())
print(unique_result)
Output
((10000000, '1FFFFFFF'), (20000000, '2007FFFF'), (20000000, '2007FFFF'), (20200000, '202FFFFF'), (0, '0FFFFFFF'), (20300000, '203FFFFF'), (20400000, '204FFFFF'))
((10000000, '1FFFFFFF'), (20000000, '2007FFFF'), (20200000, '202FFFFF'), (0, '0FFFFFFF'), (20300000, '203FFFFF'), (20400000, '204FFFFF'))
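The code above only looks at start_addr0/end_addr0. As a possible extension (a sketch, not part of the answer itself), the same tuple-building idea could be repeated for every address pair found in the columns and the results grouped per slave. I am assuming here that the addresses are hexadecimal and convert them with `int(value, 16)`; the names `suffixes`, `pieces` and `all_pairs` are made up for illustration.

```python
import io

import pandas as pd

raw = io.StringIO('''Slave|start_addr0|end_addr0|start_addr1|end_addr1|start_addr2|end_addr2
0|10000000|1FFFFFFF|NaN|NaN|NaN|NaN
1|20000000|2007FFFF|40000000|40005FFF|NaN|NaN
1|20000000|2007FFFF|20100000|201FFFFF|NaN|NaN
2|20200000|202FFFFF|20080000|20085FFF|40006000|400FFFFF
3|0|0FFFFFFF|NaN|NaN|NaN|NaN
4|20300000|203FFFFF|NaN|NaN|NaN|NaN
5|20400000|204FFFFF|NaN|NaN|NaN|NaN
''')
df = pd.read_csv(raw, sep='|', dtype=str)

# Suffixes of the address pairs present in the columns: '0', '1', '2', ...
suffixes = [c[len('start_addr'):] for c in df.columns if c.startswith('start_addr')]

pieces = []
for s in suffixes:
    # Keep only the rows where this particular pair is filled in.
    sub = df[['Slave', 'start_addr' + s, 'end_addr' + s]].dropna()
    # Assumption: addresses are hex strings, so parse them with base 16.
    tuples = [(int(a, 16), int(b, 16))
              for a, b in zip(sub['start_addr' + s], sub['end_addr' + s])]
    pieces.append(pd.Series(tuples, index=sub['Slave'].astype(int).to_numpy(), dtype=object))

all_pairs = pd.concat(pieces)

# dict.fromkeys drops repeated tuples while keeping first-seen order.
result = {int(slave): tuple(dict.fromkeys(group))
          for slave, group in all_pairs.groupby(level=0)}

print(result[1])
```

If the original strings are preferred over integers, the `int(..., 16)` calls can simply be dropped.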