Create a data frame in Python containing a combination of all values from three lists

So I have two lists: gender = ['Male', 'Female'] and subject = ['Math3_Exam_Mark', 'Math6_Exam_Mark', 'Math9_Exam_Mark', 'ELA3_Exam_Mark', 'ELA6_Exam_Mark', 'ELA9_Exam_Mark'], plus an ndarray birthMonthYear that contains a list of dates pulled from a CSV file.

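I want to create a new data frame with three columns: gender, subject, and birthMonthYear. There should be one row for every combination of gender, subject, and birthMonthYear.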

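Is there a simple way to do this, perhaps with pandas? I figured I could write nested for loops over the lists to build the data frame, but if there is an easier way, I would like to try it.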

Thank you for your help!

Setup

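import pandas as pd

gender = ['Male', 'Female']
subject = ['Math3_Exam_Mark', 'Math6_Exam_Mark', 'Math9_Exam_Mark',
           'ELA3_Exam_Mark', 'ELA6_Exam_Mark', 'ELA9_Exam_Mark']
# Two month-end dates stand in for the dates read from the CSV file.
# pandas 2.2 and later prefer freq='ME' over the older 'M' alias.
birthMonthYear = pd.date_range('2010-01-31', periods=2, freq='M')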

Option 1
itertools.product

from itertools import product

pd.DataFrame(
    list(product(gender, subject, birthMonthYear)),
    columns=['Gender', 'Subject', 'BirthMonthYear']
)

    Gender          Subject BirthMonthYear
0     Male  Math3_Exam_Mark     2010-01-31
1     Male  Math3_Exam_Mark     2010-02-28
2     Male  Math6_Exam_Mark     2010-01-31
3     Male  Math6_Exam_Mark     2010-02-28
4     Male  Math9_Exam_Mark     2010-01-31
5     Male  Math9_Exam_Mark     2010-02-28
6     Male   ELA3_Exam_Mark     2010-01-31
7     Male   ELA3_Exam_Mark     2010-02-28
8     Male   ELA6_Exam_Mark     2010-01-31
9     Male   ELA6_Exam_Mark     2010-02-28
10    Male   ELA9_Exam_Mark     2010-01-31
11    Male   ELA9_Exam_Mark     2010-02-28
12  Female  Math3_Exam_Mark     2010-01-31
13  Female  Math3_Exam_Mark     2010-02-28
14  Female  Math6_Exam_Mark     2010-01-31
15  Female  Math6_Exam_Mark     2010-02-28
16  Female  Math9_Exam_Mark     2010-01-31
17  Female  Math9_Exam_Mark     2010-02-28
18  Female   ELA3_Exam_Mark     2010-01-31
19  Female   ELA3_Exam_Mark     2010-02-28
20  Female   ELA6_Exam_Mark     2010-01-31
21  Female   ELA6_Exam_Mark     2010-02-28
22  Female   ELA9_Exam_Mark     2010-01-31
23  Female   ELA9_Exam_Mark     2010-02-28
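
For comparison, here is a minimal sketch of the nested loops the question mentions, written as a list comprehension. It is not part of the original answer. The subject list is shortened and the dates are hard-coded to keep the example small, and the names rows and df_loops are illustrative only. It builds the same tuples, in the same order, as itertools.product.

```python
from itertools import product

import pandas as pd

gender = ['Male', 'Female']
subject = ['Math3_Exam_Mark', 'ELA3_Exam_Mark']  # shortened for the sketch
birthMonthYear = pd.to_datetime(['2010-01-31', '2010-02-28'])

# Three nested loops: the rightmost loop varies fastest, as in product().
rows = [
    (g, s, d)
    for g in gender
    for s in subject
    for d in birthMonthYear
]

df_loops = pd.DataFrame(rows, columns=['Gender', 'Subject', 'BirthMonthYear'])

# The hand-written loops and product() yield identical tuples.
same_as_product = rows == list(product(gender, subject, birthMonthYear))
```

product saves writing one loop per list and extends to any number of lists without changing the shape of the code.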

Option 2
pd.MultiIndex.from_product

idx = pd.MultiIndex.from_product(
    [gender, subject, birthMonthYear],
    names=['Gender', 'Subject', 'BirthMonthYear']
)

pd.DataFrame(index=idx).reset_index()

    Gender          Subject BirthMonthYear
0     Male  Math3_Exam_Mark     2010-01-31
1     Male  Math3_Exam_Mark     2010-02-28
2     Male  Math6_Exam_Mark     2010-01-31
3     Male  Math6_Exam_Mark     2010-02-28
4     Male  Math9_Exam_Mark     2010-01-31
5     Male  Math9_Exam_Mark     2010-02-28
6     Male   ELA3_Exam_Mark     2010-01-31
7     Male   ELA3_Exam_Mark     2010-02-28
8     Male   ELA6_Exam_Mark     2010-01-31
9     Male   ELA6_Exam_Mark     2010-02-28
10    Male   ELA9_Exam_Mark     2010-01-31
11    Male   ELA9_Exam_Mark     2010-02-28
12  Female  Math3_Exam_Mark     2010-01-31
13  Female  Math3_Exam_Mark     2010-02-28
14  Female  Math6_Exam_Mark     2010-01-31
15  Female  Math6_Exam_Mark     2010-02-28
16  Female  Math9_Exam_Mark     2010-01-31
17  Female  Math9_Exam_Mark     2010-02-28
18  Female   ELA3_Exam_Mark     2010-01-31
19  Female   ELA3_Exam_Mark     2010-02-28
20  Female   ELA6_Exam_Mark     2010-01-31
21  Female   ELA6_Exam_Mark     2010-02-28
22  Female   ELA9_Exam_Mark     2010-01-31
23  Female   ELA9_Exam_Mark     2010-02-28
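
A third possibility, not part of the original answer, is a cross join. This sketch assumes pandas 1.2 or later, where merge accepts how='cross'. The name df_cross is illustrative, and the dates are hard-coded rather than generated with date_range.

```python
import pandas as pd

gender = ['Male', 'Female']
subject = ['Math3_Exam_Mark', 'Math6_Exam_Mark', 'Math9_Exam_Mark',
           'ELA3_Exam_Mark', 'ELA6_Exam_Mark', 'ELA9_Exam_Mark']
birthMonthYear = pd.to_datetime(['2010-01-31', '2010-02-28'])

# Wrap each list in a one-column frame, then chain cross joins.
# Every row on the left is paired with every row on the right.
df_cross = (
    pd.DataFrame({'Gender': gender})
    .merge(pd.DataFrame({'Subject': subject}), how='cross')
    .merge(pd.DataFrame({'BirthMonthYear': birthMonthYear}), how='cross')
)

print(df_cross.shape)
```

This form may be convenient when the values already live in separate data frames. For plain lists, the two options above are shorter.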
