How to create sequences out of a dataframe and put them in an array of arrays or a list?

For the input of:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, "A"], [2, "A"], [3, "B"], [4, "C"], [5, "D"], [6, "A"], [7, "B"], [8, "A"], [9, "C"], [10, "D"],
                            [11, "A"], [12, "A"], [13, "B"], [14, "B"], [15, "D"], [16, "A"], [17, "B"], [18, "A"], [19, "C"], [20, "D"],
                            [21, "A"], [22, "A"], [23, "A"], [24, "C"], [25, "D"], [26, "A"], [27, "C"], [28, "A"], [29, "C"], [30, "D"]]),
                  columns=["No.", "Value"])

This gives the following DataFrame:

    No. Value
0   1   A
1   2   A
2   3   B
3   4   C
4   5   D
5   6   A
6   7   B
7   8   A
8   9   C
9   10  D
10  11  A
11  12  A
12  13  B
13  14  B
14  15  D
15  16  A
16  17  B
17  18  A
18  19  C
19  20  D
20  21  A
21  22  A
22  23  A
23  24  C
24  25  D
25  26  A
26  27  C
27  28  A
28  29  C
29  30  D

Now I want to create sequences from the data. A sequence is a region of rows that runs until the value "D" appears. For example, the first sequence contains the rows from No. 1 to No. 5 (inclusive), the second from No. 6 to No. 10 (inclusive), and so on.
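As a rough illustration of this splitting rule, independent of pandas, here is a minimal sketch that walks a plain list of labels and closes a sequence every time it sees the terminator. The function name `split_at_terminator` is hypothetical, and treating leftover rows after the last "D" as a final, partial sequence is an assumption, since the question's data always ends with "D".

```python
def split_at_terminator(labels, terminator="D"):
    """Split labels into sequences, each ending with (and including) the terminator."""
    sequences, current = [], []
    for label in labels:
        current.append(label)
        if label == terminator:
            # The terminator closes the current sequence
            sequences.append(current)
            current = []
    if current:
        # Assumption: rows after the last terminator form a partial sequence
        sequences.append(current)
    return sequences


# Rows No. 1-10 of the question's data
print(split_at_terminator(list("AABCDABACD")))
# [['A', 'A', 'B', 'C', 'D'], ['A', 'B', 'A', 'C', 'D']]
```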

After that I want to encode the values as numbers: A -> 1, B -> 2, C -> 3, D -> 4. If, within a sequence, an A is followed by one or more further A's, the whole run should be reduced to a single 1. The same applies to the other values.

The first sequence is A, A, B, C, D, so for it I want [1, 2, 3, 4].
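The encoding and run-collapsing rule for a single sequence can be sketched with a dictionary lookup plus `itertools.groupby`, which groups only *adjacent* equal items. That matches the requirement: A, B, A keeps both A's, while A, A becomes one 1. The names `CODES` and `encode_and_collapse` are hypothetical.

```python
from itertools import groupby

# Hypothetical mapping table taken from the question's A -> 1 ... D -> 4 rule
CODES = {"A": 1, "B": 2, "C": 3, "D": 4}


def encode_and_collapse(sequence):
    """Map each label to its code, then keep one code per run of adjacent repeats."""
    codes = [CODES[label] for label in sequence]
    # groupby yields (key, run) pairs for consecutive equal items; keep only the key
    return [code for code, _run in groupby(codes)]


print(encode_and_collapse(["A", "A", "B", "C", "D"]))  # [1, 2, 3, 4]
print(encode_and_collapse(list("ACACD")))              # [1, 3, 1, 3, 4]
```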

For the whole DataFrame I want something like this:

result = list([[1,2,3,4],[1,2,1,3,4],[1,2,4],[1,2,1,3,4],[1,3,4],[1,3,1,3,4]])

Output:

[[1, 2, 3, 4],
 [1, 2, 1, 3, 4],
 [1, 2, 4],
 [1, 2, 1, 3, 4],
 [1, 3, 4],
 [1, 3, 1, 3, 4]]

Here I'm using cumsum() to give every element of the same sequence a "Sequence ID" (the value goes up by 1 on the row right after each "D", so each "D" stays in the sequence it ends).
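To see why the comparison is made against the *shifted* column, here is a small sketch on the first ten values. `shift(1)` moves every value down one row, so the flag lands on the row after each "D"; `cumsum()` over the boolean flags then counts how many sequences have already been closed.

```python
import pandas as pd

values = pd.Series(list("AABCDABACD"))  # rows No. 1-10 of the question's data

# Flag the row *after* each "D"; the first shifted value is NaN, which is not equal to "D"
sequence_id = (values.shift(1) == "D").cumsum()
print(sequence_id.tolist())  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Without the shift, each "D" would open the next sequence instead of closing its own
naive_id = (values == "D").cumsum()
print(naive_id.tolist())     # [0, 0, 0, 0, 1, 1, 1, 1, 1, 2]
```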

Then I use groupby() to group the rows by sequence ID and turn each group into a list, which is then filtered with itertools.groupby so that consecutive equal values collapse into one, like this:

import pandas as pd
import numpy as np
from itertools import groupby
from pprint import pprint

df = pd.DataFrame(np.array([[1,  "A"],[2, "A"],[3, "B"],[4, "C"],[5, "D" ],[6, "A" ],[7, "B" ],[8, "A" ],[9, "C" ],[10, "D" ],[11,"A" ],
                           [12,  "A"],[13, "B"],[14, "B"],[15, "D" ],[16, "A" ],[17, "B" ],[18, "A" ],[19, "C" ],[20, "D" ],[21,"A" ],
                           [22,  "A"],[23, "A"],[24, "C"],[25, "D" ],[26, "A" ],[27, "C" ],[28, "A" ],[29, "C" ],[30, "D" ] ]),
                            columns=['No.',  'Value'])

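# Encode the letters as numbers
df["NumVal"] = df["Value"].map({"A": 1, "B": 2, "C": 3, "D": 4})
# Start a new sequence ID on the row right after each "D"
df["SequenceID"] = (df["Value"].shift(1) == "D").cumsum()

# itertools.groupby yields (key, run) pairs; keeping only the key collapses consecutive repeats
result = [[key for key, _ in groupby(g["NumVal"].tolist())] for _, g in df.groupby("SequenceID")]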

pprint(result)

Output:

[[1, 2, 3, 4],
 [1, 2, 1, 3, 4],
 [1, 2, 4],
 [1, 2, 1, 3, 4],
 [1, 3, 4],
 [1, 3, 1, 3, 4]]
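If you prefer to stay in pandas for the run-collapsing step, one possible variant (a swapped-in technique, not what the answer above does) is to drop repeated rows before grouping. A row is kept if its code differs from the previous row's or if it starts a new sequence. The second condition is an assumption added for safety: it keeps two adjacent equal codes that belong to different sequences, which the question's data never exercises. The DataFrame here is rebuilt from a string only to keep the sketch self-contained.

```python
import pandas as pd

# Same 30 values as the question, one character per row
labels = list("AABCDABACDAABBDABACDAAACDACACD")
df = pd.DataFrame({"No.": range(1, len(labels) + 1), "Value": labels})

df["NumVal"] = df["Value"].map({"A": 1, "B": 2, "C": 3, "D": 4})
df["SequenceID"] = (df["Value"].shift(1) == "D").cumsum()

# Keep the first row of every run of equal codes, and the first row of every sequence
new_run = (df["NumVal"] != df["NumVal"].shift()) | (df["SequenceID"] != df["SequenceID"].shift())

# Iterating a SeriesGroupBy yields (key, sub-Series) pairs
result = [g.tolist() for _, g in df.loc[new_run].groupby("SequenceID")["NumVal"]]
print(result)
# [[1, 2, 3, 4], [1, 2, 1, 3, 4], [1, 2, 4], [1, 2, 1, 3, 4], [1, 3, 4], [1, 3, 1, 3, 4]]
```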

Try:

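from itertools import groupby

# Encode the letters; .tolist() yields plain Python ints
values = df['Value'].map({'A': 1, 'B': 2, 'C': 3, 'D': 4}).tolist()
ends = [idx + 1 for idx, val in enumerate(values) if val == 4]  # each sequence ends just after a 4
if not ends or ends[-1] != len(values):  # keep a trailing sequence that has no closing "D"
    ends.append(len(values))
starts = [0] + ends[:-1]
# Slice out each sequence, then collapse runs of equal consecutive codes
result = [[key for key, _ in groupby(values[i:j])] for i, j in zip(starts, ends)]
print(result)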

[[1, 2, 3, 4], 
 [1, 2, 1, 3, 4], 
 [1, 2, 4], 
 [1, 2, 1, 3, 4], 
 [1, 3, 4], 
 [1, 3, 1, 3, 4]]
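The same slicing idea can also be sketched with NumPy: `np.flatnonzero` finds the positions of every 4, and `np.split` cuts the array just after them. The function name `split_and_collapse` is hypothetical. Because `np.split` leaves an empty trailing piece when the data ends with a 4, empty chunks are filtered out.

```python
from itertools import groupby

import numpy as np


def split_and_collapse(codes, terminator=4):
    """Split a 1-D array after every terminator, then collapse repeated adjacent codes."""
    codes = np.asarray(codes)
    # Indices just past each terminator; np.split cuts the array at these points
    cut_points = np.flatnonzero(codes == terminator) + 1
    # Drop the empty trailing piece produced when the data ends with the terminator
    chunks = [chunk for chunk in np.split(codes, cut_points) if chunk.size]
    # groupby over a NumPy array yields NumPy scalars; int() turns them into plain ints
    return [[int(key) for key, _ in groupby(chunk)] for chunk in chunks]


print(split_and_collapse([1, 1, 2, 3, 4, 1, 2, 1, 3, 4]))  # [[1, 2, 3, 4], [1, 2, 1, 3, 4]]
print(split_and_collapse([1, 1, 2, 3, 4, 1, 3]))           # [[1, 2, 3, 4], [1, 3]]
```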
