Pandas convert dataframe to array of tuples without None

Question

I'm analysing some data with the Apriori algorithm. This requires converting the DataFrame into an array of tuples, with each tuple corresponding to one row of the DataFrame.

In [1]: data
Out[1]: 
     c1   c2   c3   c4   c5
r1   a    b    c    d    None
r2   a    b    c    None None

I have tried the code below, but the result still contains None values, which I want to remove.

In [2]: data = [tuple(x) for x in data.values]
Out[2]: 
[('a','b','c','d',None),('a','b','c',None,None)]

I expect the data to look like this:

[('a','b','c','d'),('a','b','c')]

Answer 1

Use a nested list comprehension with filtering:

data = [tuple([y for y in x if y is not None]) for x in data.values]
print (data)
[('a', 'b', 'c', 'd'), ('a', 'b', 'c')]
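
The `is not None` test above only removes real None objects. If the missing cells are NaN instead (for example after reading a CSV), a variant based on pd.notna may be more robust. The following is a minimal sketch under that assumption; the sample frame is hypothetical and only mirrors the shape of the one in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: row r1 marks the gap with None, row r2 with NaN.
data = pd.DataFrame(
    [["a", "b", "c", "d", None],
     ["a", "b", "c", np.nan, np.nan]],
    index=["r1", "r2"],
    columns=["c1", "c2", "c3", "c4", "c5"],
)

# pd.notna is False for both None and NaN, so both kinds of gap are dropped.
rows = [tuple(y for y in x if pd.notna(y)) for x in data.to_numpy()]
print(rows)
```

Legitimate values such as 0 or an empty string are kept here, since only None and NaN count as missing.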

A slower alternative for large data: reshape with stack() to remove the None values, then group by the first level of the resulting MultiIndex and aggregate each group into a tuple:

data = data.stack().groupby(level=0).apply(tuple).tolist()
print (data)
[('a', 'b', 'c', 'd'), ('a', 'b', 'c')]

Answer 2

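We can also use filter inside the comprehension to achieve the desired result. For this to work, the missing values must be actual None objects, not the string 'None'. Note that filter(None, ...) drops every falsy value, so 0, False and empty strings are removed as well.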

data = [tuple(filter(None, x)) for x in data.values]

print(data)
# [('a', 'b', 'c', 'd'), ('a', 'b', 'c')]
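
To illustrate how filter(None, ...) behaves on values other than None, here is a small sketch in plain Python. The rows are hypothetical stand-ins for data.values, chosen so that 0, an empty string and the string 'None' all appear.

```python
# Hypothetical rows: 0 and "" are legitimate values, "None" is a string.
rows = [("a", 0, "", None), ("b", 1, "None", None)]

# filter(None, ...) keeps only truthy items, so 0 and "" vanish too.
falsy_dropped = [tuple(filter(None, r)) for r in rows]

# An explicit identity test removes only real None objects.
none_dropped = [tuple(v for v in r if v is not None) for r in rows]

print(falsy_dropped)  # [('a',), ('b', 1, 'None')]
print(none_dropped)   # [('a', 0, ''), ('b', 1, 'None')]
```

In both cases the string 'None' survives, because it is an ordinary non-empty string.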

Answer 3

Another way to filter out None is:

data_without_none = [tuple(row[row != None]) for row in data.values]

Answer 4

Another approach is to use transpose + apply() (here df is the DataFrame that the question calls data):

df.T.apply(lambda x: tuple(x.dropna())).tolist()
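
The same per-row dropna() idea can also be written without the transpose, by iterating over the rows directly. This is only a sketch of a possible variant, not part of the original answer; the sample frame is a hypothetical copy of the one in the question.

```python
import pandas as pd

# Hypothetical sample frame shaped like the one in the question.
data = pd.DataFrame(
    [["a", "b", "c", "d", None],
     ["a", "b", "c", None, None]],
    index=["r1", "r2"],
    columns=["c1", "c2", "c3", "c4", "c5"],
)

# iterrows() yields (index label, row as Series); dropna() removes None/NaN.
rows = [tuple(row.dropna()) for _, row in data.iterrows()]
print(rows)
```

Because the result is built with a plain list comprehension, it is always a list of tuples regardless of how many values each row keeps.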

4 answers

solution1
4 ACCEPTED 2019-03-30 11:11:46

solution2
4 2019-03-30 11:32:31

solution3
1 2019-03-30 12:06:30

solution4
1 2019-03-30 12:33:47
