How to create sequences out of a dataframe and put them in an array of arrays or a list?
For the input:
df = pd.DataFrame(np.array([[1, "A"],[2, "A"],[3, "B"],[4, "C"],[5, "D" ],[6, "A" ],[7, "B" ],[8, "A" ],[9, "C" ],[10, "D" ],[11,"A" ],
[12, "A"],[13, "B"],[14, "B"],[15, "D" ],[16, "A" ],[17, "B" ],[18, "A" ],[19, "C" ],[20, "D" ],[21,"A" ],
[22, "A"],[23, "A"],[24, "C"],[25, "D" ],[26, "A" ],[27, "C" ],[28, "A" ],[29, "C" ],[30, "D" ] ]),
columns=['No.', 'Value'])
I get the following output:
No. Value
0 1 A
1 2 A
2 3 B
3 4 C
4 5 D
5 6 A
6 7 B
7 8 A
8 9 C
9 10 D
10 11 A
11 12 A
12 13 B
13 14 B
14 15 D
15 16 A
16 17 B
17 18 A
18 19 C
19 20 D
20 21 A
21 22 A
22 23 A
23 24 C
24 25 D
25 26 A
26 27 C
27 28 A
28 29 C
29 30 D
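Now I want to create sequences from the data. A sequence is a run of rows up to and including the next occurrence of the value "D". For example, the first sequence contains the rows from No. 1 to No. 5 (inclusive), the second sequence the rows from No. 6 to No. 10 (inclusive), and so on.
After that I want to encode the values as numbers: A -> 1, B -> 2, C -> 3, D -> 4. If, within a sequence, the value A is followed by another A or by several A's, they should be collapsed into a single 1. The same applies to the other values.
First sequence = A,A,B,C,D; for this I want something like [1,2,3,4].
For the whole output, I want something like this: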
result = list([[1,2,3,4],[1,2,1,3,4],[1,2,4],[1,2,1,3,4],[1,3,4],[1,3,1,3,4]])
Output:
[[1, 2, 3, 4],
[1, 2, 1, 3, 4],
[1, 2, 4],
[1, 2, 1, 3, 4],
[1, 3, 4],
[1, 3, 1, 3, 4]]
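The rule described in the question (close a sequence after each "D", map letters to numbers, collapse consecutive repeats) can be sketched in plain Python without pandas. This is only an illustration of the requirement, not one of the answers; the function name encode_sequences and its terminator parameter are made up for this sketch.

```python
from itertools import groupby

# Mapping taken from the question.
CODES = {"A": 1, "B": 2, "C": 3, "D": 4}

def encode_sequences(values, terminator="D"):
    """Split `values` after each terminator, encode them, and collapse repeats."""
    sequences, current = [], []
    for value in values:
        current.append(CODES[value])
        if value == terminator:
            # groupby yields one key per run of equal consecutive items.
            sequences.append([key for key, _ in groupby(current)])
            current = []
    if current:
        # Trailing values that are not closed by a terminator (an assumption
        # about the desired behaviour; the question's data always ends on "D").
        sequences.append([key for key, _ in groupby(current)])
    return sequences

print(encode_sequences(["A", "A", "B", "C", "D"]))
# [[1, 2, 3, 4]]
```

With the DataFrame from the question, the input would be df["Value"].tolist().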
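Here I use cumsum() to give every element of the same sequence a "sequence ID" (the value increases by 1 after each "D" is encountered). Then I use groupby() to group by that ID and turn each group into a list, which is in turn filtered so that consecutive equal values are merged, as follows: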
import pandas as pd
import numpy as np
from itertools import groupby
from pprint import pprint
df = pd.DataFrame(np.array([[1, "A"],[2, "A"],[3, "B"],[4, "C"],[5, "D" ],[6, "A" ],[7, "B" ],[8, "A" ],[9, "C" ],[10, "D" ],[11,"A" ],
[12, "A"],[13, "B"],[14, "B"],[15, "D" ],[16, "A" ],[17, "B" ],[18, "A" ],[19, "C" ],[20, "D" ],[21,"A" ],
[22, "A"],[23, "A"],[24, "C"],[25, "D" ],[26, "A" ],[27, "C" ],[28, "A" ],[29, "C" ],[30, "D" ] ]),
columns=['No.', 'Value'])
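# Encode the letters as numbers.
df["NumVal"] = df["Value"].map({"A":1,"B":2,"C":3,"D":4})
# A row that follows a "D" starts a new sequence; cumsum() turns those starts into a running ID.
df["SequenceID"] = (df["Value"].shift(1) == "D").cumsum()
# Group by sequence ID, then collapse runs of equal consecutive values with itertools.groupby.
result = [[key for key, _ in groupby(g["NumVal"].tolist())] for _, g in df.groupby("SequenceID")]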
pprint(result)
Output:
[[1, 2, 3, 4],
[1, 2, 1, 3, 4],
[1, 2, 4],
[1, 2, 1, 3, 4],
[1, 3, 4],
[1, 3, 1, 3, 4]]
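To see why shift(1) followed by cumsum() yields a sequence ID, it may help to look at the intermediate values on a shortened input. The sketch below uses only the first ten values from the question; the variable names starts_new and sequence_id are chosen for illustration.

```python
import pandas as pd

# First ten values from the question's data.
values = pd.Series(["A", "A", "B", "C", "D", "A", "B", "A", "C", "D"])

# True on the row immediately after a "D". The first row is compared
# against NaN (nothing precedes it), which evaluates to False.
starts_new = values.shift(1) == "D"

# The running count of True values gives a 0-based ID per sequence.
sequence_id = starts_new.cumsum()

print(sequence_id.tolist())
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Comparing the shifted column rather than the column itself keeps each "D" inside the sequence it closes, instead of making it the first row of the next one.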
Try:
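from itertools import groupby
# Encode the letters as numbers (df is the DataFrame from the question).
values = df['Value'].replace({'A':1, 'B':2, 'C':3, 'D':4}).values
# Index just past each 4 ("D"), i.e. where the next sequence starts.
idx_list = [idx + 1 for idx, val in enumerate(values) if val == 4]
# Slice between consecutive boundaries; add a final slice if the data does not end on a 4.
result = [values[i: j] for i, j in zip([0] + idx_list, idx_list + ([len(values)] if idx_list[-1] != len(values) else []))]
# Collapse runs of equal consecutive values.
result = [[key for key, _ in groupby(chunk)] for chunk in result]
print(result)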
[[1, 2, 3, 4],
[1, 2, 1, 3, 4],
[1, 2, 4],
[1, 2, 1, 3, 4],
[1, 3, 4],
[1, 3, 1, 3, 4]]
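The slicing step in this answer is fairly dense, so here is a small sketch of how the boundaries are built. It uses a hand-written list of already encoded values whose last element is not closed by a 4, to show when the extra end index is appended; the names starts, ends and chunks are introduced only for this illustration.

```python
# Encoded values; the trailing 1 is not followed by a 4 (the code for "D").
values = [1, 1, 2, 3, 4, 1, 2, 1, 3, 4, 1]

# Position just past each 4.
idx_list = [idx + 1 for idx, val in enumerate(values) if val == 4]

starts = [0] + idx_list
# Append len(values) only when the data does not already end on a 4.
# When it does, zip() simply drops the unmatched last start.
ends = idx_list + ([len(values)] if idx_list[-1] != len(values) else [])

chunks = [values[i:j] for i, j in zip(starts, ends)]
print(chunks)
# [[1, 1, 2, 3, 4], [1, 2, 1, 3, 4], [1]]
```

Note that idx_list[-1] raises an IndexError when the data contains no 4 at all, so this approach assumes at least one "D" is present.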