
Pandas convert dataframe to array of tuples without None

I'm analysing some data with the Apriori algorithm. This requires me to convert the dataframe into an array of tuples, with each tuple corresponding to a "row" of the dataframe.

In [1]: data
Out[1]: 
     c1   c2   c3   c4   c5
r1   a    b    c    d    None
r2   a    b    c    None None

I have tried the code below, but the result still contains None values, which I want to remove.

In [2]: data = [tuple(x) for x in data.values]
Out[2]: 
[('a','b','c','d',None),('a','b','c',None,None)]

I expect the data to look like this:

[('a','b','c','d'),('a','b','c')]

Use nested list comprehension with filtering:

data = [tuple([y for y in x if y is not None]) for x in data.values]
print(data)
# [('a', 'b', 'c', 'd'), ('a', 'b', 'c')]
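
To try this end to end, here is a minimal sketch that rebuilds the example frame and applies the same kind of filtering. The helper name rows_to_tuples is hypothetical, and using pd.notna instead of "is not None" is an assumption on my part so that NaN cells are skipped as well as None:

```python
import pandas as pd

# Rebuild the example frame from the question.
data = pd.DataFrame(
    [["a", "b", "c", "d", None],
     ["a", "b", "c", None, None]],
    index=["r1", "r2"],
    columns=["c1", "c2", "c3", "c4", "c5"],
)

def rows_to_tuples(frame):
    """Hypothetical helper: one tuple per row, skipping None/NaN cells."""
    return [tuple(v for v in row if pd.notna(v)) for row in frame.to_numpy()]

result = rows_to_tuples(data)
print(result)
```

If the cells can only ever hold None, the original "is not None" test is enough; pd.notna is just the more forgiving choice when missing values may show up as NaN.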

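A slower alternative on large data: reshape with stack(), which drops the None values by default (newer pandas releases may require an explicit dropna()), then group by the first level of the resulting MultiIndex and aggregate each group into a tuple: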

data = data.stack().groupby(level=0).apply(tuple).tolist()
print(data)
# [('a', 'b', 'c', 'd'), ('a', 'b', 'c')]
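
Since how stack() treats missing values can depend on the pandas version, the sketch below adds an explicit dropna() after stacking. That extra call is my assumption for portability, not part of the answer above; it should be harmless where stack() has already dropped the missing cells.

```python
import pandas as pd

# Rebuild the example frame from the question.
data = pd.DataFrame(
    [["a", "b", "c", "d", None],
     ["a", "b", "c", None, None]],
    index=["r1", "r2"],
    columns=["c1", "c2", "c3", "c4", "c5"],
)

# stack() reshapes the frame into a Series indexed by (row label, column label).
# The explicit dropna() removes any None/NaN entries that are still present.
stacked = data.stack().dropna()

# Group by the row label (level 0) and collect each group's values into a tuple.
result = stacked.groupby(level=0).apply(tuple).tolist()
print(result)
```

One thing to keep in mind with this approach: a row made up entirely of missing values leaves no entries after stacking, so it would not appear in the result at all rather than as an empty tuple.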

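We can also use filter() inside the comprehension to achieve the desired result. This only works if the None values are actual None objects rather than the string 'None'. Note that filter(None, ...) also drops every other falsy value, such as 0 or an empty string.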

data = [tuple(filter(None, x)) for x in data.values]

print(data)
# [('a', 'b', 'c', 'd'), ('a', 'b', 'c')]
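
To make the behaviour of filter(None, ...) concrete, here is a small sketch on a made-up row (the values in row are purely illustrative and do not come from the question). It compares the truthiness-based filter with an explicit "is not None" predicate:

```python
# Illustrative row mixing a real None, falsy values, and the string "None".
row = ("a", 0, "", None, "None")

# filter(None, ...) keeps only truthy items, so 0 and "" vanish too,
# while the string "None" stays because it is a non-empty string.
truthy_only = tuple(filter(None, row))

# An explicit predicate removes only the None object itself.
not_none = tuple(filter(lambda v: v is not None, row))

print(truthy_only)  # ('a', 'None')
print(not_none)     # ('a', 0, '', 'None')
```

For the all-strings data in the question both variants give the same result; the explicit predicate is the safer choice once the frame may contain zeros or empty strings.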

Another way to filter out None is:

data_without_none = [tuple(row[row != None]) for row in data.values]

Another approach is to transpose and use apply() (here df is the DataFrame called data above):

df.T.apply(lambda x: tuple(x.dropna())).tolist()
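
As a variation on the same idea, apply() can also walk the rows directly with axis=1, which avoids the transpose. This is a sketch under the assumption that the frame looks like the one in the question:

```python
import pandas as pd

# Rebuild the example frame from the question.
data = pd.DataFrame(
    [["a", "b", "c", "d", None],
     ["a", "b", "c", None, None]],
    index=["r1", "r2"],
    columns=["c1", "c2", "c3", "c4", "c5"],
)

# axis=1 passes each row to the lambda as a Series;
# dropna() removes the None/NaN cells before the tuple is built.
result = data.apply(lambda row: tuple(row.dropna()), axis=1).tolist()
print(result)
```

Like the other apply-based approaches, this calls a Python function once per row, so the plain list comprehension is likely to be faster on large frames.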


 