Python: preprocess data in a format to mine for Association rules and frequent itemsets (apriori/SPADE)

Question

I have a dataframe with 245 rows and 2 columns, in which the column Unique contains lists:

df = (pd.DataFrame({'TC': ['101', '102', '103'], 
                    'Unique': [[189,113,213,201,125,211],   
                               [206,268,446,149,104,166],
                               [163,103,113,166,800,101]]}))

I want to iterate through the dataframe and explode the lists in Unique into separate columns so that I can run a frequent itemset mining algorithm on my data. Expected output:

TC     0    1    2    3    4    5
101  189  113  213  201  125  211
102  206  268  446  149  104  166
103  163  103  113  166  800  101

Also, if possible, I want to create a nested list of all the Unique fields in sequential order, i.e.:

unique = [[189,113,213,201,125,211],[206,268,446,149,104,166],[163,103,113,166,800,101]]

Answer 1

To create a nested list:

nested_list = list(df['Unique'])

print(nested_list)
# Output:
[[189, 113, 213, 201, 125, 211],
 [206, 268, 446, 149, 104, 166],
 [163, 103, 113, 166, 800, 101]]

To create your desired table, simply build a new DataFrame from this nested list and set the column TC as its index:

x = pd.DataFrame(nested_list)  # each nested list becomes one row of the new DataFrame
x['TC'] = df['TC']             # add TC column
x = x.set_index('TC')          # set TC column as index to make it show as first column

print(x)

# Output:
       0    1    2    3    4    5
TC                               
101  189  113  213  201  125  211
102  206  268  446  149  104  166
103  163  103  113  166  800  101
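
The same table can probably be built in one step by passing TC as the index when constructing the DataFrame. This is only a sketch on the sample data from the question; the name wide is made up for illustration. If the lists had different lengths, pandas would pad the shorter rows with NaN.

```python
import pandas as pd

df = pd.DataFrame({'TC': ['101', '102', '103'],
                   'Unique': [[189, 113, 213, 201, 125, 211],
                              [206, 268, 446, 149, 104, 166],
                              [163, 103, 113, 166, 800, 101]]})

# Each inner list becomes one row; TC is used directly as the row index.
# "wide" is a hypothetical name chosen for this example.
wide = pd.DataFrame(df['Unique'].tolist(), index=df['TC'])

print(wide)
```

Rows can then be looked up by their TC value, for example wide.loc['102'].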

Answer 2

import pandas as pd

df = (pd.DataFrame({'TC': ['101', '102', '103'],
                    'Unique': [[189,113,213,201,125,211],
                               [206,268,446,149,104,166],
                               [163,103,113,166,800,101]]}))


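# Expand each list into its own column (named 0..n-1), aligned on the original index.
# This assumes every list has the same length as the first one.
df[list(range(len(df.Unique[0])))] = pd.DataFrame(df.Unique.values.tolist(), index=df.index)
# Drop the original list column
df = df.drop('Unique', axis=1)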

Output:

    TC    0    1    2    3    4    5
0  101  189  113  213  201  125  211
1  102  206  268  446  149  104  166
2  103  163  103  113  166  800  101
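
Since the goal is frequent itemset mining, a boolean "one column per item" table may be more useful than positional columns. As far as I know, apriori implementations such as the one in mlxtend expect this kind of layout, but check the documentation of the library you use. A minimal sketch on the sample data, using only pandas; the names long and onehot are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'TC': ['101', '102', '103'],
                   'Unique': [[189, 113, 213, 201, 125, 211],
                              [206, 268, 446, 149, 104, 166],
                              [163, 103, 113, 166, 800, 101]]})

# One row per (TC, item) pair; "long" is a hypothetical name for this example.
long = df.explode('Unique')
long['Unique'] = long['Unique'].astype(int)

# Count how often each item occurs per TC, then reduce the counts to True/False.
onehot = pd.crosstab(long['TC'], long['Unique']).astype(bool)

print(onehot)
```

Each row is one transaction and each column one item, so onehot.loc['101', 113] tells you whether item 113 occurs in transaction 101.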

2 answers

solution1 0 ACCEPTED 2020-01-31 10:28:29

solution2 0 2020-01-31 10:31:44
