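Create a new column in a Pandas DataFrame from existing column names
I want to deconstruct a Pandas DataFrame, using the column headers as a new data column, and build a list containing every combination of row index and column. It is easier to show than to explain: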
import pandas as pd
index_col = ["store1", "store2", "store3"]
cols = ["January", "February", "March"]
values = [[2,3,4],[5,6,7],[8,9,10]]
df = pd.DataFrame(values, index=index_col, columns=cols)
From this DataFrame I would like to get the following list:
[['store1', 'January', 2],
['store1', 'February', 3],
['store1', 'March', 4],
['store2', 'January', 5],
['store2', 'February', 6],
['store2', 'March', 7],
['store3', 'January', 8],
['store3', 'February', 9],
['store3', 'March', 10]]
Is there a convenient way to do this?
df.unstack().swaplevel().reset_index().values.tolist()
#OR
df.reset_index().melt(id_vars="index").values.tolist()
# [['store1', 'January', 2],
# ['store2', 'January', 5],
# ['store3', 'January', 8],
# ['store1', 'February', 3],
# ['store2', 'February', 6],
# ['store3', 'February', 9],
# ['store1', 'March', 4],
# ['store2', 'March', 7],
# ['store3', 'March', 10]]
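Both one-liners return the rows grouped by month rather than by store. A small sketch of the intermediate steps shows why, assuming the same df as in the question (the names s, first_key, swapped and rows are only illustrative):

```python
import pandas as pd

index_col = ["store1", "store2", "store3"]
cols = ["January", "February", "March"]
values = [[2, 3, 4], [5, 6, 7], [8, 9, 10]]
df = pd.DataFrame(values, index=index_col, columns=cols)

# unstack() on a frame with a single-level index returns a Series
# whose MultiIndex keys are (column label, row label)
s = df.unstack()
first_key = s.index[0]

# swaplevel() flips each key to (row label, column label)
# but leaves the rows in the same, column-by-column order
swapped = s.swaplevel()
rows = swapped.reset_index().values.tolist()
```

Because unstack() walks the frame column by column, the swap only changes the order of the key parts, not the order of the rows.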
With the following, the order of the elements matches the output in the question.
df.transpose().unstack().reset_index().values.tolist()
# [['store1', 'January', 2],
# ['store1', 'February', 3],
# ['store1', 'March', 4],
# ['store2', 'January', 5],
# ['store2', 'February', 6],
# ['store2', 'March', 7],
# ['store3', 'January', 8],
# ['store3', 'February', 9],
# ['store3', 'March', 10]]
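The transpose is what changes the grouping: unstack() walks the columns, and after transposing, the columns are the stores. As a quick check, assuming the same df as in the question, this sketch compares the result with stack(), which reaches the same store-by-store order directly (variable names are illustrative):

```python
import pandas as pd

index_col = ["store1", "store2", "store3"]
cols = ["January", "February", "March"]
values = [[2, 3, 4], [5, 6, 7], [8, 9, 10]]
df = pd.DataFrame(values, index=index_col, columns=cols)

# Transpose first, so unstack() iterates store by store
via_transpose = df.transpose().unstack().reset_index().values.tolist()

# stack() moves the column labels into the index, row by row
via_stack = df.stack().reset_index().values.tolist()

same_order = via_transpose == via_stack
```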
True pandas style:
lst = [[*k, v] for k, v in df.unstack().swaplevel().to_dict().items()]
I prefer stacking over unstacking and then swapping levels:
>>> df.stack().reset_index().to_numpy()
array([['store1', 'January', 2],
['store1', 'February', 3],
['store1', 'March', 4],
['store2', 'January', 5],
['store2', 'February', 6],
['store2', 'March', 7],
['store3', 'January', 8],
['store3', 'February', 9],
['store3', 'March', 10]], dtype=object)
>>>
Or use melt with ignore_index=False:
>>> df.melt(ignore_index=False).reset_index().to_numpy()
array([['store1', 'January', 2],
['store2', 'January', 5],
['store3', 'January', 8],
['store1', 'February', 3],
['store2', 'February', 6],
['store3', 'February', 9],
['store1', 'March', 4],
['store2', 'March', 7],
['store3', 'March', 10]], dtype=object)
>>>
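If a labelled long-format table is more useful than a bare array, the stack-based approach can carry names along. This is a minimal sketch, assuming the same df as in the question; the labels "store", "month" and "sales" are hypothetical and not part of the original data:

```python
import pandas as pd

index_col = ["store1", "store2", "store3"]
cols = ["January", "February", "March"]
values = [[2, 3, 4], [5, 6, 7], [8, 9, 10]]
df = pd.DataFrame(values, index=index_col, columns=cols)

long_df = (
    df.rename_axis(index="store", columns="month")  # hypothetical axis names
      .stack()                                      # column labels become an index level
      .reset_index(name="sales")                    # hypothetical name for the values
)

# Same nested-list shape as requested in the question
rows = long_df.values.tolist()
```

Naming the axes before stacking avoids the default level_0 / level_1 / 0 column names that reset_index() would otherwise produce.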
The structure you want the data in is quite messy, so given the output you want, this may be the best approach.
# Results
res = []
# Nested loop: outer over the index labels, inner over the columns
for i in range(len(index_col)):
    for j in range(len(cols)):
        # Each entry is [row label, column label, value]
        res.append([index_col[i], cols[j], values[i][j]])
# Show the results
print(res)
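The nested loop above works on the raw lists and needs no pandas at all. As a sketch under the same assumptions (index_col, cols and values as defined in the question), the indexing can be replaced by zip in a list comprehension; the name rows is illustrative:

```python
index_col = ["store1", "store2", "store3"]
cols = ["January", "February", "March"]
values = [[2, 3, 4], [5, 6, 7], [8, 9, 10]]

# Pair each store with its row of values, then each month with its value
rows = [
    [store, month, value]
    for store, row in zip(index_col, values)
    for month, value in zip(cols, row)
]
```

The outer loop runs over stores, so the result comes out in the store-by-store order requested in the question.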
You can use:
data = []
for col, row in df.items():
    for ind, val in row.reset_index().values:
        data.append([ind, col, val])
data
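You can avoid the second loop, which sacrifices the order of your requested output, because the following is a complete breakdown of the structure as it starts out: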
temp = df.stack()
[[*ent, val] for ent, val in zip(temp.index, temp)]
[['store1', 'January', 2],
['store1', 'February', 3],
['store1', 'March', 4],
['store2', 'January', 5],
['store2', 'February', 6],
['store2', 'March', 7],
['store3', 'January', 8],
['store3', 'February', 9],
['store3', 'March', 10]]