I'm analysing some data with the Apriori algorithm. This requires me to convert the DataFrame into a list of tuples, with each tuple corresponding to one row of the DataFrame.
In [1]: data
Out[1]:
   c1 c2 c3    c4    c5
r1  a  b  c     d  None
r2  a  b  c  None  None
I have tried the code below, but the result still contains None values, and I want to remove them.
In [2]: [tuple(x) for x in data.values]
Out[2]:
[('a','b','c','d',None),('a','b','c',None,None)]
I expect the result to look like this:
[('a','b','c','d'),('a','b','c')]
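To reproduce the problem, here is a minimal sketch that rebuilds the sample frame. It assumes the empty cells are genuine Python None objects in object-dtype columns (hence the explicit dtype=object); frames loaded with read_csv usually hold NaN instead, which changes how some of the answers below behave.

```python
import pandas as pd

# Sample frame from the question (assumed contents); dtype=object keeps
# the empty cells as Python None instead of converting them to NaN.
data = pd.DataFrame(
    [["a", "b", "c", "d", None],
     ["a", "b", "c", None, None]],
    index=["r1", "r2"],
    columns=["c1", "c2", "c3", "c4", "c5"],
    dtype=object,
)

# Naive conversion: one tuple per row, missing cells included.
naive = [tuple(x) for x in data.values]
print(naive)  # [('a', 'b', 'c', 'd', None), ('a', 'b', 'c', None, None)]
```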
Use a nested list comprehension with filtering:
data = [tuple([y for y in x if y is not None]) for x in data.values]
print(data)
[('a', 'b', 'c', 'd'), ('a', 'b', 'c')]
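The comprehension above only tests for None. If the missing cells are NaN (for example after read_csv), `y is not None` lets them through. A sketch of a variant using pd.notna, which treats both None and NaN as missing; the NaN placed in r2 is an assumption added only to show the difference.

```python
import pandas as pd

# Same frame as in the question, except r2/c5 holds NaN instead of None
# (an assumption, to show the difference between the two checks).
data = pd.DataFrame(
    [["a", "b", "c", "d", None],
     ["a", "b", "c", None, float("nan")]],
    index=["r1", "r2"],
    columns=["c1", "c2", "c3", "c4", "c5"],
    dtype=object,
)

# `is not None` keeps the NaN in r2.
only_none = [tuple(y for y in x if y is not None) for x in data.values]

# pd.notna rejects both None and NaN.
transactions = [tuple(y for y in x if pd.notna(y)) for x in data.values]
print(transactions)  # [('a', 'b', 'c', 'd'), ('a', 'b', 'c')]
```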
A slower alternative if the data is large: reshape with stack() to remove the None values, then aggregate by the first level of the MultiIndex into tuples:
data = data.stack().groupby(level=0).apply(tuple).tolist()
print(data)
[('a', 'b', 'c', 'd'), ('a', 'b', 'c')]
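A self-contained sketch of the stack/groupby route. The explicit dropna() after stack() is an addition, not part of the answer: as far as I know, older pandas versions drop missing cells inside stack() itself, while newer versions are moving to a stack() implementation that keeps them, so dropping them explicitly should behave the same on either.

```python
import pandas as pd

data = pd.DataFrame(
    [["a", "b", "c", "d", None],
     ["a", "b", "c", None, None]],
    index=["r1", "r2"],
    columns=["c1", "c2", "c3", "c4", "c5"],
    dtype=object,
)

# stack() produces a Series indexed by (row label, column label);
# dropna() removes the missing cells whichever stack() implementation is used.
stacked = data.stack().dropna()

# Group by the row label (level 0) and turn each group's values into a tuple.
transactions = stacked.groupby(level=0).apply(tuple).tolist()
print(transactions)  # [('a', 'b', 'c', 'd'), ('a', 'b', 'c')]
```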
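We can also use filter inside the comprehension to achieve the desired result. Just make sure that your None values are actual None objects, not the string 'None', for this to work. Keep in mind that filter(None, x) drops every falsy value (such as '' or 0), not only None.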
data = [tuple(filter(None, x)) for x in data.values]
print(data)
# [('a', 'b', 'c', 'd'), ('a', 'b', 'c')]
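filter(None, x) keeps only truthy items, so it can remove more, or less, than intended. A small pure-Python illustration using a hypothetical row:

```python
# Hypothetical row mixing None with other values that are falsy or NaN.
row = ("a", "", 0, None, float("nan"))

# filter(None, ...) drops every falsy item ('' and 0 as well as None),
# yet keeps NaN, because bool(float("nan")) is True.
filtered = tuple(filter(None, row))
print(filtered)  # ('a', nan)

# An explicit identity check removes only None.
explicit = tuple(y for y in row if y is not None)
print(explicit)  # ('a', '', 0, nan)
```

If the data can contain empty strings or zeros that must survive, the explicit `is not None` check is the safer choice.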
Another way to filter out None is:
data_without_none = [tuple(row[row != None]) for row in data.values]
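Why `row != None` works here: data.values is a 2-D NumPy object array, so each row is a 1-D ndarray and the comparison is evaluated element by element, producing a boolean mask. Linters flag `!= None`, but `row is not None` would test the array object itself rather than its elements. A sketch that also shows pd.notna as a mask that additionally treats NaN as missing:

```python
import pandas as pd

data = pd.DataFrame(
    [["a", "b", "c", "d", None],
     ["a", "b", "c", None, None]],
    index=["r1", "r2"],
    columns=["c1", "c2", "c3", "c4", "c5"],
    dtype=object,
)

# Each row of data.values is a 1-D object ndarray; `row != None` compares
# element by element and returns a boolean mask used for indexing.
data_without_none = [tuple(row[row != None]) for row in data.values]  # noqa: E711

# pd.notna(row) builds an equivalent mask that also treats NaN as missing.
data_without_missing = [tuple(row[pd.notna(row)]) for row in data.values]
print(data_without_none)  # [('a', 'b', 'c', 'd'), ('a', 'b', 'c')]
```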
Another way is to transpose and use apply():
data.T.apply(lambda x: tuple(x.dropna())).tolist()
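A variant without the transpose: apply with axis=1 passes each row to the lambda as a Series. The pandas documentation for DataFrame.apply states that, with axis=1 and the default result_type, list-like return values come back as a Series, so .tolist() yields the tuples. With the transposed (axis=0) form, if every row happened to keep the same number of values, pandas may instead assemble the tuples into a DataFrame, which has no .tolist(). The second frame below is a hypothetical example of that equal-length case.

```python
import pandas as pd

data = pd.DataFrame(
    [["a", "b", "c", "d", None],
     ["a", "b", "c", None, None]],
    index=["r1", "r2"],
    columns=["c1", "c2", "c3", "c4", "c5"],
    dtype=object,
)

# axis=1: the lambda receives one row (a Series) at a time; dropna() removes
# None/NaN and the remaining values become a tuple.
transactions = data.apply(lambda row: tuple(row.dropna()), axis=1).tolist()
print(transactions)  # [('a', 'b', 'c', 'd'), ('a', 'b', 'c')]

# Hypothetical frame whose rows keep the same number of values:
# with axis=1 the result is still a Series of tuples.
same_len = pd.DataFrame([["a", "b"], ["x", "y"]], columns=["c1", "c2"], dtype=object)
same_len_rows = same_len.apply(lambda row: tuple(row.dropna()), axis=1).tolist()
print(same_len_rows)
```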