Results from for loop to dataframe, then to csv

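I have a list of schools and the classes they offer. I also have a list of unique classes, of which each school offers some but not others. I created a loop that prints the missing classes for each school along with the school's name, but I cannot get the full output of the for loop into a csv.

I have been able to write the classes for one school to a csv, but I cannot write the entire result of the for loop, covering all schools, to a csv.

I know I need to insert the results of the for loop into a dataframe. The next step would be to loop through the dataframe and send the results to a csv row by row, but first I need to get the results from the for loop into a dataframe.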

Read in the dataframes

import pandas as pd

schools = {'School': ['School A', 'School A', 'School A', 'School B', 'School B', 'School B', 'School C','School C', 'School D'], 'Class': ['Math', 'Chemistry', 'English', 'Math', 'Chemistry', 'English', 'Math', 'Chemistry', 'Physics']}
dfSchool = pd.DataFrame(data=schools)
dfSchool


classes = {'Class': ['Math', 'Chemistry', 'English', 'History', 'Physics']}
dfClasses = pd.DataFrame(data=classes)
dfClasses

For loop

grouped = dfSchool.groupby('School')

for name, group in grouped:
    print(name)
    print(dfClasses[~(dfClasses.Class.isin(group["Class"]))])

Put the results of the for loop into a dataframe (this code does not work)

listFinal = []
for name, group in grouped:
    print(name)
    print(dfClasses[~(dfClasses.Class.isin(group["Class"]))])
    listFinal.append(name)
    listFinal.append(dfClasses[~(dfClasses.Class.isin(group["Class"]))])

dfOutput = pd.DataFrame(listFinal)
dfOutput.to_csv('SchoolClasses.csv', index=True)

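Actual result: the console shows the output below, but when writing to csv I only get School A in the file. I would like all of the output below (all schools) to be written to the csv file.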

School A
     Class
3  History
4  Physics
School B
     Class
3  History
4  Physics
School C
     Class
2  English
3  History
4  Physics
School D
       Class
0       Math
1  Chemistry
2    English
3    History

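Desired result: the output above, but in a single csv file. Bonus points if you can put the school name on every row of its corresponding classes, instead of only having the school name as a header.

When trying to put the results of the for loop into a dataframe, I get: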

listFinal

['School A',      Class
 3  History
 4  Physics, 'School B',      Class
 3  History
 4  Physics, 'School C',      Class
 2  English
 3  History
 4  Physics, 'School D',        Class
 0       Math
 1  Chemistry
 2    English
 3    History]

Create the schools dataframe:

import pandas as pd

schools = {
    "School": [
        "School A",
        "School A",
        "School A",
        "School B",
        "School B",
        "School B",
        "School C",
        "School C",
        "School D",
    ],
    "Class": [
        "Math",
        "Chemistry",
        "English",
        "Math",
        "Chemistry",
        "English",
        "Math",
        "Chemistry",
        "Physics",
    ],
}
dfSchool = pd.DataFrame(data=schools)
print(dfSchool)

     School      Class
0  School A       Math
1  School A  Chemistry
2  School A    English
3  School B       Math
4  School B  Chemistry
5  School B    English
6  School C       Math
7  School C  Chemistry
8  School D    Physics

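Create a dataframe representing the case where every school offers every class. Call it df_tot.

# c must be defined before s, because s uses len(c)
c = ['Math', 'Chemistry', 'English', 'History', 'Physics']
s = ['School A'] * len(c) + ['School B']* len(c) + ['School C']* len(c) + ['School D']* len(c)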

df_tot = pd.DataFrame([s, c*4], index=['School','Class']).T

print(df_tot)

      School      Class
0   School A       Math
1   School A  Chemistry
2   School A    English
3   School A    History
4   School A    Physics
5   School B       Math
6   School B  Chemistry
7   School B    English
8   School B    History
9   School B    Physics
10  School C       Math
11  School C  Chemistry
12  School C    English
13  School C    History
14  School C    Physics
15  School D       Math
16  School D  Chemistry
17  School D    English
18  School D    History
19  School D    Physics

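Merge df_tot with dfSchool using indicator=True, then keep only the rows where _merge == 'left_only'. A left merge is used here rather than an outer merge: it yields the same rows (dfSchool has no pairs that are absent from df_tot) and keeps df_tot's row order, so the index in the output below still lines up.

merged = df_tot.merge(dfSchool, how='left', indicator=True)
df_tot = merged.loc[merged['_merge'] == 'left_only', ['School', 'Class']]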

print(df_tot)

      School      Class
3   School A    History
4   School A    Physics
8   School B    History
9   School B    Physics
12  School C    English
13  School C    History
14  School C    Physics
15  School D       Math
16  School D  Chemistry
17  School D    English
18  School D    History

Save to csv ...

df_tot.to_csv('anyfile.csv')
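
As a variation on the steps above, the "every school offers every class" frame can be generated from the data instead of typing the school names by hand. The following is only a sketch: the names all_classes, df_all, merged and df_missing are made up for illustration, and pd.MultiIndex.from_product is swapped in for the hand-built lists.

```python
import pandas as pd

# Same sample data as in the question.
dfSchool = pd.DataFrame({
    "School": ["School A"] * 3 + ["School B"] * 3 + ["School C"] * 2 + ["School D"],
    "Class": ["Math", "Chemistry", "English"] * 2 + ["Math", "Chemistry", "Physics"],
})
all_classes = ["Math", "Chemistry", "English", "History", "Physics"]

# Every (school, class) pair, built from the schools that appear in the data.
full_index = pd.MultiIndex.from_product(
    [dfSchool["School"].unique(), all_classes], names=["School", "Class"]
)
df_all = full_index.to_frame(index=False)

# A left merge keeps df_all's row order; '_merge' records where each row was found.
merged = df_all.merge(dfSchool, on=["School", "Class"], how="left", indicator=True)
df_missing = merged.loc[merged["_merge"] == "left_only", ["School", "Class"]]

# With no path, to_csv returns the CSV text; pass a file name to write a file.
csv_text = df_missing.to_csv(index=False)
print(csv_text)
```

Passing index=False leaves the row numbers out of the file, so each line is just the school followed by one missing class.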

An alternative answer to dataframes

I wonder whether it would simply be easier to use dictionaries and json?

School = [
    "School A",
    "School A",
    "School A",
    "School B",
    "School B",
    "School B",
    "School C",
    "School C",
    "School D",
]

Class = [
    "Math",
    "Chemistry",
    "English",
    "Math",
    "Chemistry",
    "English",
    "Math",
    "Chemistry",
    "Physics",
]

List the classes that exist at each school.

A = list(zip(School, Class))

for item in A:
    print(item)

('School A', 'Math')
('School A', 'Chemistry')
('School A', 'English')
('School B', 'Math')
('School B', 'Chemistry')
('School B', 'English')
('School C', 'Math')
('School C', 'Chemistry')
('School D', 'Physics')

Put it in a dictionary:

d1 = {}
for item in A:
    d1.setdefault(item[0], []).append(item[1])

print(d1)

{'School A': ['Math', 'Chemistry', 'English'],
 'School B': ['Math', 'Chemistry', 'English'],
 'School C': ['Math', 'Chemistry'],
 'School D': ['Physics']}

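Build a new dictionary with the items that are not in d1. Note that set(Class) only contains classes offered by at least one school, so History, which no school offers, never shows up in d2; loop over the full class list if it should be included. Because sets are unordered, the order of the output can also vary between runs.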

d2 = {}
for s in set(School):  
    for c in set(Class):
        if c in d1[s]:
            continue
        else:
            d2.setdefault(s,[]).append(c)


print(d2)

{'School C': ['Physics', 'English'],
 'School A': ['Physics'],
 'School B': ['Physics'],
 'School D': ['Math', 'Chemistry', 'English']}

Then I would consider using a json file:

import json

with open('data.json', 'w') as fp:
    json.dump(d2, fp)
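
If a csv file is still wanted, the same dictionary idea can be written out with the standard csv module. This is a minimal sketch under a few assumptions: the full class list from the question (including History) is used as all_classes, the names offered, missing and buffer are illustrative, and an io.StringIO buffer stands in for a real file.

```python
import csv
import io

School = ["School A", "School A", "School A", "School B", "School B",
          "School B", "School C", "School C", "School D"]
Class = ["Math", "Chemistry", "English", "Math", "Chemistry",
         "English", "Math", "Chemistry", "Physics"]
all_classes = ["Math", "Chemistry", "English", "History", "Physics"]

# Classes each school already offers.
offered = {}
for school, cls in zip(School, Class):
    offered.setdefault(school, set()).add(cls)

# Missing classes per school, listed in the order of all_classes.
missing = {
    school: [c for c in all_classes if c not in have]
    for school, have in sorted(offered.items())
}

# One csv row per (school, missing class).
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["School", "Class"])
for school, classes in missing.items():
    for cls in classes:
        writer.writerow([school, cls])

print(buffer.getvalue())
```

To write an actual file, replace the buffer with open('missing.csv', 'w', newline='') in a with block; the file name here is only an example.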

The following code collects all missing classes for each school into a set.

import pandas as pd

schools = {'School': ['School A', 'School A', 'School A', 'School B', 'School B', 'School B', 'School C','School C', 'School D'], 'Class': ['Math', 'Chemistry', 'English', 'Math', 'Chemistry', 'English', 'Math', 'Chemistry', 'Physics']}
dfSchool = pd.DataFrame(schools)

classes = {'Class': ['Math', 'Chemistry', 'English', 'History', 'Physics']}

set_classes = set(classes["Class"])
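# Select the 'Class' column so the result is a Series holding one set per school
df = dfSchool.groupby('School')['Class'].agg(lambda c: set_classes.difference(c))
# Assigning df.name does not label the output column; rename() does
df = df.rename("MissingClasses")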
df.to_csv("SchoolClasses.csv")
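
A set per school ends up as a single csv cell. If one row per school and missing class is preferred, Series.explode can unpack the collections. The sketch below assumes the same sample data; sorted lists are used instead of sets so the order is reproducible, and the names all_classes, missing_lists and long_form are made up for illustration.

```python
import pandas as pd

dfSchool = pd.DataFrame({
    "School": ["School A"] * 3 + ["School B"] * 3 + ["School C"] * 2 + ["School D"],
    "Class": ["Math", "Chemistry", "English"] * 2 + ["Math", "Chemistry", "Physics"],
})
all_classes = ["Math", "Chemistry", "English", "History", "Physics"]

# One sorted list of missing classes per school.
missing_lists = (
    dfSchool.groupby("School")["Class"]
    .apply(lambda offered: sorted(set(all_classes) - set(offered)))
    .rename("MissingClass")
)

# explode() gives each list element its own row and repeats the school name.
long_form = missing_lists.explode().reset_index()
print(long_form)
```

long_form.to_csv("SchoolClasses.csv", index=False) would then write the long format to disk.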

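This is just a direct answer to the question of how to output what you already print to a csv file. So I kept your algorithm and only slightly changed the contents of the listFinal list: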

listFinal = []
for name, group in grouped:
    print(name)
    print(dfClasses[~(dfClasses.Class.isin(group["Class"]))])
    # add a new column with the school name to the dataframe appended to the list
    listFinal.append(dfClasses[~(dfClasses.Class.isin(group["Class"]))]
                     .assign(School=name))

Then we can use a simple pd.concat to output everything to the csv file:

dfOutput = pd.concat(listFinal)
dfOutput.to_csv('SchoolClasses.csv', index=True)
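
For reference, here is the loop-and-concat approach as one self-contained script. It is a sketch rather than the only way to do it: the School column is moved to the front and the index is dropped, both of which are optional choices not taken from the answer above, and the names frames and csv_text are illustrative.

```python
import pandas as pd

dfSchool = pd.DataFrame({
    "School": ["School A"] * 3 + ["School B"] * 3 + ["School C"] * 2 + ["School D"],
    "Class": ["Math", "Chemistry", "English"] * 2 + ["Math", "Chemistry", "Physics"],
})
dfClasses = pd.DataFrame(
    {"Class": ["Math", "Chemistry", "English", "History", "Physics"]}
)

frames = []
for name, group in dfSchool.groupby("School"):
    # Classes from the full list that this school does not offer.
    missing = dfClasses[~dfClasses["Class"].isin(group["Class"])]
    # Tag every row with the school it belongs to.
    frames.append(missing.assign(School=name))

# Stack the per-school frames and put the School column first.
dfOutput = pd.concat(frames, ignore_index=True)[["School", "Class"]]

# With no path, to_csv returns the CSV text; pass a file name to write a file.
csv_text = dfOutput.to_csv(index=False)
print(csv_text)
```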

One option is to use pandas.DataFrame.groupby together with apply:

import pandas as pd


schools = {'School': ['School A', 'School A', 'School A', 
                      'School B', 'School B', 'School B',
                      'School C', 'School C', 'School D'],
           'Class': ['Math', 'Chemistry', 'English',
                     'Math', 'Chemistry', 'English',
                     'Math', 'Chemistry', 'Physics']
           }

classes = {'Class': ['Math', 'Chemistry', 'English', 'History', 'Physics']}

df_school = pd.DataFrame(data=schools)
df_classes = pd.DataFrame(data=classes)

missing = (df_school.groupby('School')
                    .apply(lambda group: df_classes[~(df_classes["Class"].isin(group["Class"]))])
                    .droplevel(-1)
                    )
missing.to_csv("missing_classes.csv")

Result:

>>> missing
              Class
School             
School A    History
School A    Physics
School B    History
School B    Physics
School C    English
School C    History
School C    Physics
School D       Math
School D  Chemistry
School D    English
School D    History

missing_classes.csv

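School,Class
School A,History
School A,Physics
School B,History
School B,Physics
School C,English
School C,History
School C,Physics
School D,Math
School D,Chemistry
School D,English
School D,History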

