
Python: preprocess data into a format for mining association rules and frequent itemsets (Apriori/SPADE)

I have a DataFrame with 245 rows and 2 columns, in which the column Unique holds lists:

import pandas as pd

df = (pd.DataFrame({'TC': ['101', '102', '103'],
                    'Unique': [[189,113,213,201,125,211],   
                               [206,268,446,149,104,166],
                               [163,103,113,166,800,101]]}))

I want to explode the lists in Unique into separate columns so that I can run a frequent itemset mining algorithm on the data. Expected output:

TC     0    1    2    3    4    5
101  189  113  213  201  125  211
102  206  268  446  149  104  166
103  163  103  113  166  800  101

Also, if possible, I want to create a nested list of all the Unique fields in sequential order, i.e.:

unique = [[189,113,213,201,125,211],[206,268,446,149,104,166],[163,103,113,166,800,101]]

To create the nested list:

nested_list = list(df['Unique'])

print(nested_list)
# Output:
[[189, 113, 213, 201, 125, 211],
 [206, 268, 446, 149, 104, 166],
 [163, 103, 113, 166, 800, 101]]
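Since the stated goal is frequent itemset mining, here is a rough sketch of what a miner does first with such a nested list: count in how many transactions each item occurs and keep the items that reach a threshold. It uses only the standard library; the names item_counts, min_count and frequent_items are made up for illustration, and the threshold of 2 is an arbitrary assumption.

```python
from collections import Counter

nested_list = [[189, 113, 213, 201, 125, 211],
               [206, 268, 446, 149, 104, 166],
               [163, 103, 113, 166, 800, 101]]

# Count each item at most once per transaction (hence the set()).
item_counts = Counter(item
                      for transaction in nested_list
                      for item in set(transaction))

min_count = 2  # hypothetical threshold, not taken from the question
frequent_items = sorted(item for item, n in item_counts.items()
                        if n >= min_count)
print(frequent_items)  # [113, 166]
```

A real Apriori or SPADE implementation goes on to larger itemsets or sequences; this only covers the single-item pass.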

To create your desired table, build a new DataFrame from this nested list and set the TC column as the index:

x = pd.DataFrame(nested_list)  # each nested list becomes one row
x['TC'] = df['TC']             # add the TC column
x = x.set_index('TC')          # use TC as the index so it is shown first

print(x)

# Output:
       0    1    2    3    4    5
TC                               
101  189  113  213  201  125  211
102  206  268  446  149  104  166
103  163  103  113  166  800  101

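If the data is headed for an Apriori implementation, the wide table above may not be the shape it wants: libraries such as mlxtend expect one boolean column per item instead. A minimal sketch of that encoding using only pandas, assuming each list in Unique is one transaction (the names items, onehot and support are mine, not from the question):

```python
import pandas as pd

df = pd.DataFrame({'TC': ['101', '102', '103'],
                   'Unique': [[189, 113, 213, 201, 125, 211],
                              [206, 268, 446, 149, 104, 166],
                              [163, 103, 113, 166, 800, 101]]})

# One row per (TC, item) pair, with TC repeated in the index.
items = df.set_index('TC')['Unique'].explode()

# One indicator column per item, then collapse back to one row per TC.
onehot = pd.get_dummies(items).groupby(level=0).max().astype(bool)

# Fraction of transactions containing each item (its support).
support = onehot.mean()
print(onehot.shape)  # (3, 16)
```

Here 113 and 166 each occur in two of the three transactions, so they have the highest support.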
import pandas as pd

df = (pd.DataFrame({'TC': ['101', '102', '103'],
                    'Unique': [[189,113,213,201,125,211],
                               [206,268,446,149,104,166],
                               [163,103,113,166,800,101]]}))


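# Spread each list over integer-named columns 0..n-1.
# Note: the column count is taken from the first row, so this assumes
# every list in Unique has the same length.
df[list(range(len(df.Unique[0])))] = pd.DataFrame(df.Unique.values.tolist(), index=df.index)
df = df.drop('Unique', axis=1)  # the original list column is no longer needed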

Output:

    TC    0    1    2    3    4    5
0  101  189  113  213  201  125  211
1  102  206  268  446  149  104  166
2  103  163  103  113  166  800  101
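The question does not say whether all 245 lists have the same length. If they do not, taking the column count from the first row will not fit. A hedged sketch for that case, using made-up ragged data and letting pandas pad the shorter rows:

```python
import pandas as pd

# Hypothetical data: same layout as the question, but lists of different lengths.
ragged = pd.DataFrame({'TC': ['101', '102', '103'],
                       'Unique': [[189, 113, 213], [206, 268], [163]]})

# Build the wide table straight from the lists; shorter rows are padded with NaN.
wide = pd.DataFrame(ragged['Unique'].tolist(), index=ragged['TC'])

# NaN forces float columns; the nullable Int64 dtype keeps the items as integers.
wide = wide.astype('Int64')
print(wide)
```

The padded cells show up as missing values, so drop them before passing rows to a mining routine.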
