
Convert a dataframe to a list of tuples

I have a pandas DataFrame that looks like this:

   Slave  start_addr0  end_addr0  start_addr1  end_addr1  start_addr2  end_addr2
0      0     10000000   1FFFFFFF          NaN        NaN          NaN        NaN
1      1     20000000   2007FFFF     40000000   40005FFF          NaN        NaN
2      1     20000000   2007FFFF     20100000   201FFFFF          NaN        NaN
3      2     20200000   202FFFFF     20080000   20085FFF     40006000   400FFFFF
4      3            0   0FFFFFFF          NaN        NaN          NaN        NaN
5      4     20300000   203FFFFF          NaN        NaN          NaN        NaN
6      5     20400000   204FFFFF          NaN        NaN          NaN        NaN

For each slave number, I need to convert its rows into a list of address ranges (tuples). For example:

Slave1_list = ( (20000000, 2007FFFF), (40000000, 40005FFF), (20100000, 201FFFFF))

The number of slaves (rows) and address-pairs (columns) can vary.

Thanks

EDIT :

Run the following code to load sample data into dataframe:

import pandas as pd
import io

f = io.StringIO('''Slave|start_addr0|end_addr0|start_addr1|end_addr1|start_addr2|end_addr2
0|10000000|1FFFFFFF|NaN|NaN|NaN|NaN
1|20000000|2007FFFF|40000000|40005FFF|NaN|NaN
1|20000000|2007FFFF|20100000|201FFFFF|NaN|NaN
2|20200000|202FFFFF|20080000|20085FFF|40006000|400FFFFF
3|0|0FFFFFFF|NaN|NaN|NaN|NaN
4|20300000|203FFFFF|NaN|NaN|NaN|NaN
5|20400000|204FFFFF|NaN|NaN|NaN|NaN
''')
df = pd.read_csv(f, sep='|', engine='python', index_col=None)

Something like the following should work:

import pandas as pd
from collections import defaultdict

data = [{'Slave': 1, 'start_addr0': 12, 'end_addr0': 189, 'start_addr1': 9, 'end_addr1': 17},
        {'Slave': 1, 'start_addr0': 3, 'end_addr0': 6, 'start_addr1': 1, 'end_addr1': 4},
        {'Slave': 3, 'start_addr0': 1, 'end_addr0': 7, 'start_addr1': 2, 'end_addr1': 14}]

df = pd.DataFrame(data)

print(df)
result = defaultdict(list)
rows = df.to_dict(orient='records')
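for row in rows:
    slave = row['Slave']
    for key, start_value in row.items():
        if key.startswith('start_addr'):
            # Everything after the prefix is the pair number (also works for 10 or more pairs)
            idx = key[len('start_addr'):]
            end_value = row.get('end_addr' + idx)
            # Skip address pairs that are missing (NaN) for this row
            if pd.notna(start_value) and pd.notna(end_value):
                result[slave].append((start_value, end_value))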

print('result:')
print(result)

output

   Slave  start_addr0  end_addr0  start_addr1  end_addr1
0      1           12        189            9         17
1      1            3          6            1          4
2      3            1          7            2         14
result:
defaultdict(<class 'list'>, {1: [(12, 189), (9, 17), (3, 6), (1, 4)], 3: [(1, 7), (2, 14)]})

You can try one of the following.

One option, via wide_to_long:


df = df.reset_index()
result = pd.wide_to_long(df, stubnames=['start_addr', 'end_addr'], i=['index', 'Slave'], j='add_num', sep='').dropna(
).reset_index([0, -1], drop=True).apply(tuple, 1).groupby(level=0).agg(list)
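The chain above is dense, so here is a step-by-step sketch of the same idea. The three-row sample frame and the intermediate names (wide, long_df, pairs) are made up for illustration, and the index levels are dropped by name rather than by position; treat it as one possible reading of the one-liner, not a definitive version.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample; addresses are kept as strings.
df = pd.DataFrame({
    'Slave': [0, 1, 1],
    'start_addr0': ['10000000', '20000000', '20000000'],
    'end_addr0': ['1FFFFFFF', '2007FFFF', '2007FFFF'],
    'start_addr1': [np.nan, '40000000', '20100000'],
    'end_addr1': [np.nan, '40005FFF', '201FFFFF'],
})

# 1. Expose the row number as an 'index' column so that rows sharing
#    a Slave value can still be told apart.
wide = df.reset_index()

# 2. Reshape: one row per (index, Slave, add_num), with the columns
#    'start_addr' and 'end_addr'.
long_df = pd.wide_to_long(wide, stubnames=['start_addr', 'end_addr'],
                          i=['index', 'Slave'], j='add_num', sep='')

# 3. Drop the pairs that were NaN and keep only 'Slave' in the index.
pairs = long_df.dropna().reset_index(['index', 'add_num'], drop=True)

# 4. Turn each row into a (start, end) tuple and collect them per slave.
result = pairs.apply(tuple, axis=1).groupby(level=0).agg(list)
print(result.to_dict())
```

Because the tuples are collected into a list, a pair that appears in two rows of the same slave appears twice in the result; wrapping each list in set(...) or dict.fromkeys(...) would remove the repeats.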

Another option, via groupby:

k = df.set_index('Slave').stack().reset_index()
result = k.groupby(k.index//2).agg({'Slave': 'first', 0 : tuple}).groupby('Slave').agg({0 : set})

Explanation:

df.set_index('Slave').stack().reset_index() stacks the DataFrame into long form (one value per row) and removes the NaN values.

k.groupby(k.index//2) groups every two consecutive rows, that is, a start address and its end address, and the aggregation turns each such pair into a tuple.

.groupby('Slave').agg({0: set}) -> the last groupby collects the unique tuples for each slave into a set.

OUTPUT:

                                                                            0
Slave                                                                        
0                                                      {(10000000, 1FFFFFFF)}
1      {(40000000.0, 40005FFF), (20100000.0, 201FFFFF), (20000000, 2007FFFF)}
2      {(20080000.0, 20085FFF), (40006000.0, 400FFFFF), (20200000, 202FFFFF)}
3                                                             {(0, 0FFFFFFF)}
4                                                      {(20300000, 203FFFFF)}
5                                                      {(20400000, 204FFFFF)}

NOTE: I'm assuming that every start_addr has a matching end_addr.
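The floats such as 40000000.0 in the output above appear because a column that mixes integers with NaN is parsed as float. Below is a minimal sketch of the same "pair up consecutive stacked values" idea that avoids this by reading everything as text. It swaps the groupby/set step for a plain Python loop so that the original order is kept, and it assumes the addresses are hexadecimal; the three-row sample and the names flat, slaves, values and ranges are illustrative only.

```python
import io
import pandas as pd

raw = io.StringIO('''Slave|start_addr0|end_addr0|start_addr1|end_addr1
0|10000000|1FFFFFFF|NaN|NaN
1|20000000|2007FFFF|40000000|40005FFF
1|20000000|2007FFFF|20100000|201FFFFF
''')
# dtype=str keeps every address as text, so a NaN in a column
# no longer turns the other values in that column into floats.
df = pd.read_csv(raw, sep='|', dtype=str)

# One value per row; the explicit dropna() makes the NaN removal visible.
flat = df.set_index('Slave').stack().dropna()
slaves = flat.index.get_level_values(0).tolist()
values = flat.tolist()

ranges = {}
# Values alternate start, end, start, end, ... so walk them two at a time.
for slave, start, end in zip(slaves[0::2], values[0::2], values[1::2]):
    pair = (int(start, 16), int(end, 16))  # assumption: addresses are hex
    bucket = ranges.setdefault(int(slave), [])
    if pair not in bucket:  # keep only the first occurrence, in order
        bucket.append(pair)

# Print the ranges back in hexadecimal for readability.
print({s: [(hex(a), hex(b)) for a, b in r] for s, r in ranges.items()})
```

Like the answer above, this relies on every start_addr being immediately followed by its end_addr in the column order.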

I think this is what you are looking for:

def make_tuples(x):
    # Build one (start, end) tuple from the first address pair of a row
    return (x['start_addr0'], x['end_addr0'])

# simple tuples
result = tuple(df[['start_addr0', 'end_addr0']].apply(make_tuples, axis=1).tolist())
print(result)

# unique tuples
unique_result = tuple(df[['start_addr0', 'end_addr0']].apply(make_tuples, axis=1).unique().tolist())
print(unique_result)

Output

((10000000, '1FFFFFFF'), (20000000, '2007FFFF'), (20000000, '2007FFFF'), (20200000, '202FFFFF'), (0, '0FFFFFFF'), (20300000, '203FFFFF'), (20400000, '204FFFFF'))
((10000000, '1FFFFFFF'), (20000000, '2007FFFF'), (20200000, '202FFFFF'), (0, '0FFFFFFF'), (20300000, '203FFFFF'), (20400000, '204FFFFF'))
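The result above is a single flat tuple over the whole frame and covers only the first address pair. If the tuples are wanted per slave, as the question asks, one possible extension of the same make_tuples idea is sketched below. The sample frame and the names pairs and per_slave are hypothetical, and only start_addr0/end_addr0 are handled, as in the code above.

```python
import pandas as pd

# Hypothetical sample restricted to the first address pair.
df = pd.DataFrame({
    'Slave': [0, 1, 1],
    'start_addr0': ['10000000', '20000000', '20000000'],
    'end_addr0': ['1FFFFFFF', '2007FFFF', '2007FFFF'],
})

def make_tuples(x):
    # One (start, end) tuple per row
    return (x['start_addr0'], x['end_addr0'])

pairs = df[['start_addr0', 'end_addr0']].apply(make_tuples, axis=1)

# Group the tuples by slave; dict.fromkeys removes duplicates
# while keeping the order of first appearance.
per_slave = {int(slave): tuple(dict.fromkeys(group))
             for slave, group in pairs.groupby(df['Slave'])}
print(per_slave)
```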
