Convert pandas dataframe to a list
I have a pandas DataFrame:
apple  banana  carrot  diet coke
    1       1       1          0
    0       1       0          0
    1       0       0          0
    1       0       1          1
    0       1       1          0
    0       1       1          0
I would like to convert it to the following:
[['apple', 'banana', 'carrot'],
['banana'],
['apple'],
['apple', 'carrot', 'diet coke'],
['banana', 'carrot'],
['banana', 'carrot']]
How can I do this? Many thanks.
Because life is short, I might do something straightforward like:
>>> fruit = [df.columns[row.astype(bool)].tolist() for row in df.values]
>>> pprint.pprint(fruit)
[['apple', 'banana', 'carrot'],
['banana'],
['apple'],
['apple', 'carrot', 'diet coke'],
['banana', 'carrot'],
['banana', 'carrot']]
This works because we can use a boolean array (row.astype(bool)) to select only the elements of df.columns where the row is True.
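To make the masking step concrete, here is a minimal sketch that rebuilds the frame from the question and applies the mask to a single row before running the full comprehension. The variable names `first_row`, `mask`, and `selected` are only illustrative and do not come from the answer.

```python
import pandas as pd

# Rebuild the example frame from the question (column names copied from it).
df = pd.DataFrame(
    [[1, 1, 1, 0],
     [0, 1, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 1, 1],
     [0, 1, 1, 0],
     [0, 1, 1, 0]],
    columns=["apple", "banana", "carrot", "diet coke"],
)

first_row = df.values[0]       # NumPy array holding the first row: 1, 1, 1, 0
mask = first_row.astype(bool)  # True where the cell is non-zero

# Indexing df.columns with a boolean array keeps only the True positions.
selected = df.columns[mask].tolist()
print(selected)  # ['apple', 'banana', 'carrot']

# The same step applied to every row gives the requested list of lists.
fruit = [df.columns[row.astype(bool)].tolist() for row in df.values]
print(fruit)
```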
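@DSM's solution is great, but it only works when your values are 1 or 0. If you need to compare against other values, you can try the following (the original answer used the .ix indexer, which was removed in pandas 1.0; .iloc is the positional equivalent here):
In [156]: [df.columns[df.iloc[i, :] == 1].tolist() for i in range(len(df.index))]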
Out[156]:
[['apple', 'banana', 'carrot'],
['banana'],
['apple'],
['apple', 'carrot', 'dietcoke'],
['banana', 'carrot'],
['banana', 'carrot']]
Edit
You can also slightly modify @DSM's solution:
In [177]: [df.columns[row == 1].tolist() for row in df.values]
Out[177]:
[['apple', 'banana', 'carrot'],
['banana'],
['apple'],
['apple', 'carrot', 'dietcoke'],
['banana', 'carrot'],
['banana', 'carrot']]
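The difference between the two variants only shows up when the cells hold something other than 0 and 1. A small sketch, using a hypothetical `counts` frame that is not part of the question:

```python
import pandas as pd

# Hypothetical frame where cells hold counts rather than 0/1 flags.
counts = pd.DataFrame(
    [[2, 1, 0],
     [0, 3, 1]],
    columns=["apple", "banana", "carrot"],
)

# astype(bool) treats every non-zero value as True.
truthy = [counts.columns[row.astype(bool)].tolist() for row in counts.values]

# An explicit comparison keeps only the cells equal to 1.
equal_one = [counts.columns[row == 1].tolist() for row in counts.values]

print(truthy)     # [['apple', 'banana'], ['banana', 'carrot']]
print(equal_one)  # [['banana'], ['carrot']]
```

Which one you want depends on whether "present" means "non-zero" or "exactly 1" in your data.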
Some performance tests:
In [179]: %timeit [df.columns[row == 1].tolist() for row in df.values]
The slowest run took 4.03 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 212 us per loop
In [180]: %timeit [df.columns[row.astype(bool)].tolist() for row in df.values]
10000 loops, best of 3: 186 us per loop
In [181]: %timeit [df.columns[df.ix[i,:]==1].tolist() for i in range(len(df.index))]
100 loops, best of 3: 2.4 ms per loop
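If you want to repeat the comparison outside IPython, a rough sketch with the standard `timeit` module could look like this. The function names are made up for illustration, the row-lookup variant uses `.iloc` in place of `.ix` (which no longer exists in current pandas), and the numbers will differ from the ones above depending on machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 0],
     [0, 1, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 1, 1],
     [0, 1, 1, 0],
     [0, 1, 1, 0]],
    columns=["apple", "banana", "carrot", "dietcoke"],
)


def by_equality():
    # Compare each NumPy row against 1.
    return [df.columns[row == 1].tolist() for row in df.values]


def by_astype_bool():
    # Treat every non-zero cell as True.
    return [df.columns[row.astype(bool)].tolist() for row in df.values]


def by_row_lookup():
    # Fetch each row as a Series first; .iloc stands in for the removed .ix.
    return [
        df.columns[(df.iloc[i, :] == 1).to_numpy()].tolist()
        for i in range(len(df.index))
    ]


calls = 200
for fn in (by_equality, by_astype_bool, by_row_lookup):
    # Best of three runs, reported as time per single call.
    best = min(timeit.repeat(fn, number=calls, repeat=3)) / calls
    print(f"{fn.__name__}: {best * 1e6:.1f} us per call")
```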
In [24]: import pandas as pd
In [25]: import io
In [26]: data = """
apple banana carrot dietcoke
1 1 1 0
0 1 0 0
1 0 0 0
1 0 1 1
0 1 1 0
0 1 1 0
"""
In [27]: df = pd.read_csv(io.StringIO(data), delimiter=r'\s+')
In [28]: df
Out[28]:
   apple  banana  carrot  dietcoke
0      1       1       1         0
1      0       1       0         0
2      1       0       0         0
3      1       0       1         1
4      0       1       1         0
5      0       1       1         0
In [29]: [[df.columns[i] for i,field in enumerate(record) if field == 1] for j,*record in df.itertuples()]
Out[29]:
[['apple', 'banana', 'carrot'],
['banana'],
['apple'],
['apple', 'carrot', 'dietcoke'],
['banana', 'carrot'],
['banana', 'carrot']]
A solution without the list comprehension and extended tuple unpacking looks like this:
In [32]: result = []
In [33]: for record in df.itertuples():
   ....:     row = []
   ....:     for i, field in enumerate(record[1:]):
   ....:         if field == 1:
   ....:             row.append(df.columns[i])
   ....:     result.append(row)
   ....:
In [34]: result
Out[34]:
[['apple', 'banana', 'carrot'],
['banana'],
['apple'],
['apple', 'carrot', 'dietcoke'],
['banana', 'carrot'],
['banana', 'carrot']]
You can iterate and build the lists as Pedro mentioned, or just use stack() and groupby() to produce them:
df
Out[14]:
   apple  banana  carrot  diet_coke
0      1       1       1          0
1      0       1       0          0
2      1       0       0          0
3      1       0       1          1
4      0       1       1          0
5      0       1       1          0
df.stack()
Out[15]:
0  apple        1
   banana       1
   carrot       1
   diet_coke    0
1  apple        0
   banana       1
   carrot       0
   diet_coke    0
2  apple        1
   banana       0
   carrot       0
   diet_coke    0
3  apple        1
   banana       0
   carrot       1
   diet_coke    1
4  apple        0
   banana       1
   carrot       1
   diet_coke    0
5  apple        0
   banana       1
   carrot       1
   diet_coke    0
dtype: int64
df.stack()[df.stack().values ==1].reset_index()
Out[20]:
    level_0    level_1  0
0         0      apple  1
1         0     banana  1
2         0     carrot  1
3         1     banana  1
4         2      apple  1
5         3      apple  1
6         3     carrot  1
7         3  diet_coke  1
8         4     banana  1
9         4     carrot  1
10        5     banana  1
11        5     carrot  1
newdf = df.stack()[df.stack().values == 1].reset_index()
newdf.groupby(['level_0'])['level_1'].apply(list)
Out[27]:
level_0
0 [apple, banana, carrot]
1 [banana]
2 [apple]
3 [apple, carrot, diet_coke]
4 [banana, carrot]
5 [banana, carrot]
Name: level_1, dtype: object
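Put together as one script, the stack()/groupby() steps might look like the sketch below. The final `.tolist()` call is an addition here, to turn the grouped Series into the plain list of lists the question asks for, and the name `stacked` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 0],
     [0, 1, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 1, 1],
     [0, 1, 1, 0],
     [0, 1, 1, 0]],
    columns=["apple", "banana", "carrot", "diet_coke"],
)

# One value per (row label, column name) pair.
stacked = df.stack()

# Keep only the cells equal to 1, then turn the two index levels into
# ordinary columns; pandas names them level_0 and level_1 by default.
newdf = stacked[stacked == 1].reset_index()

# Collect the column names for each original row.
lists = newdf.groupby("level_0")["level_1"].apply(list)

result = lists.tolist()
print(result)
```

One thing to keep in mind with this approach: a row that contains no 1 at all is filtered out before the groupby, so it will not appear as an empty list in the result.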