
xls to JSON using python3 xlrd

I have to convert an xls file directly to a JSON document using python3 and xlrd.

The table is here.

It's divided into three main categories (PUBLICATION, CONTENU, CONCLUSION) whose names are in column one (the first column is zero), and the number of rows per category can vary. Each row has three key values (INDICATEURS, EVALUATION, PROPOSITION) in columns 3, 5 and 7. There can be empty lines or missing values.

I have to convert that table to the following JSON data, which I wrote by hand as a reference. It's valid.

{
"EVALUATION": {
    "PUBLICATION": [
        {
            "INDICATEUR": "Page de garde",
            "EVALUATION": "Inexistante ou non conforme",
            "PROPOSITION D'AMELIORATION": "Consulter l'example sur CANVAS"
        },
        {
            "INDICATEUR": "Page de garde",
            "EVALUATION": "Titre du TFE non conforme",
            "PROPOSITION D'AMELIORATION": "Utilisez le titre avalisé par le conseil des études"
        },
        {
            "INDICATEUR": "Orthographe et grammaire",
            "EVALUATION": "Nombreuses fautes",
            "PROPOSITION D'AMELIORATION": "Faire relire le document"
        },
        {
            "INDICATEUR": "Nombre de page",
            "EVALUATION": "Nombre de pages grandement différent à la norme",
            "PROPOSITION D'AMELIORATION": ""
        }
    ],
    "CONTENU": [
        {
            "INDICATEUR": "Développement du sujet",
            "EVALUATION": "Présentation de l'entreprise",
            "PROPOSITION D'AMELIORATION": ""
        },
        {
            "INDICATEUR": "Développement du sujet",
            "EVALUATION": "Plan de localisation inutile",
            "PROPOSITION D'AMELIORATION": "Supprimer le plan de localisation"
        },
        {
            "INDICATEUR": "Figures et capture d'écran",
            "EVALUATION": "Captures d'écran excessives",
            "PROPOSITION D'AMELIORATION": "Pour chaque figure et capture d'écran se poser la question 'Qu'est-ce que cela apporte à mon sujet ?'"
        },
        {
            "INDICATEUR": "Figures et capture d'écran",
            "EVALUATION": "Captures d'écran Inutiles",
            "PROPOSITION D'AMELIORATION": "Pour chaque figure et capture d'écran se poser la question 'Qu'est-ce que cela apporte à mon sujet ?'"
        },
        {
            "INDICATEUR": "Figures et capture d'écran",
            "EVALUATION": "Captures d'écran illisibles",
            "PROPOSITION D'AMELIORATION": "Pour chaque figure et capture d'écran se poser la question 'Qu'est-ce que cela apporte à mon sujet ?'"
        },
        {
            "INDICATEUR": "Conclusion",
            "EVALUATION": "Conclusion inexistante",
            "PROPOSITION D'AMELIORATION": ""
        },
        {
            "INDICATEUR": "Bibliographie",
            "EVALUATION": "Inexistante",
            "PROPOSITION D'AMELIORATION": ""
        },
        {
            "INDICATEUR": "Bibliographie",
            "EVALUATION": "Non normalisée",
            "PROPOSITION D'AMELIORATION": "Ecrire la bibliographie selon la norme APA"
        }
    ],
    "CONCLUSION": [
        {
            "INDICATEUR": "",
            "EVALUATION": "Grave manquement sur le plan de la présentation",
            "PROPOSITION D'AMELIORATION": "Lire le document 'Conseil de publication' disponible sur CANVAS"
        },
        {
            "INDICATEUR": "",
            "EVALUATION": "Risque de refus du document par le conseil des études",
            "PROPOSITION D'AMELIORATION": ""
        }
    ]
}
}

My intention is to loop through the rows, check rows[1] to identify the category, and sub-loop to add the data as a dictionary to a list for each category.

Here is my code so far:

import xlrd
file = '/home/eh/Documents/Base de Programmation/Feedback/EvaluationEI.xls'
wb = xlrd.open_workbook(file)
sheet = wb.sheet_by_index(0)
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]


def readRows():
    for rownum in range(2,sheet.nrows):
        rows = sheet.row_values(rownum)
        indicateur = rows[3]
        evaluation = rows[5]
        amelioration = rows[7]
        publication = []
        contenu = []
        conclusion = []

        if rows[1] == "PUBLICATION":

            if rows[3] == '' and rows[5] == '' and rows[7] == '':
                continue
            else:
                publication.append("INDICATEUR : " + indicateur , "EVALUATION : " + evaluation , "PROPOSITION D'AMELIORATION : " + amelioration)

        if rows[1] == "CONTENU":

            if rows[3] == '' and rows[5] == '' and rows[7] == '':
                continue
            else:
                contenu.append("INDICATEUR : " + indicateur , "EVALUATION : " + evaluation  , "PROPOSITION D'AMELIORATION : " + amelioration)

        if rows[1] == "CONCLUSION":

            if rows[3] == '' and rows[5] == '' and rows[7] == '':
                continue
            else:
                conclusion.append("INDICATEUR : " + indicateur , "EVALUATION : " + evaluation , "PROPOSITION D'AMELIORATION : " + amelioration)

    print (publication)
    print (contenu)
    print (conclusion)




readRows()

I am having a hard time figuring out how to sub-loop over the right number of rows to separate the data by category.

Any help would be welcome.

Thank you in advance.

Is pandas not an option? I would add this as a comment but don't have the rep.

From the documentation:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_json.html

import pandas

df = pandas.read_excel('path_to_file.xls')
df.to_json(path_or_buf='output_path.json', orient='table')

Using the json package and OrderedDict (to preserve key order), I think this gets you what you're expecting. I've modified it slightly so that we're not building a string literal, but rather a dict containing the data, which we can then convert with json.dumps.

As Ron noted above, your previous attempt was skipping the lines where rows[1] was not equal to one of your three key values.

This should read every line, appending to the last non-empty key:

import xlrd
from collections import OrderedDict

def readRows(file, s_index=0):
    """
    file:    path to xls file
    s_index: sheet_index for the xls file
    returns a dict of OrderedDict of list of OrderedDict which can be serialized to JSON
    """
    d = {"EVALUATION": OrderedDict()}  # this will be the main dict for our JSON object
    wb = xlrd.open_workbook(file)
    sheet = wb.sheet_by_index(s_index)
    # getting the data from the worksheet
    data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
    categorie = None  # no category seen yet
    # fill the dict with data, skipping the first three rows:
    for row in data[3:]:
        if row[1]:  # a value in column 1 names the category for this and the following rows
            categorie = row[1]
            d["EVALUATION"].setdefault(categorie, [])  # keep rows already collected for it
        if categorie:
            i, e, a = row[3::2][:3]  # columns 3, 5 and 7
            if i or e or a:  # as long as there's any data in this row, we write the child element
                val = OrderedDict([("INDICATEUR", i), ("EVALUATION", e), ("PROPOSITION D'AMELIORATION", a)])
                d["EVALUATION"][categorie].append(val)
    return d
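
The "append to the last non-empty key" idea can also be tried without a workbook. The following is only a minimal sketch under assumptions: group_rows and the sample rows are hypothetical names and data made up for illustration, shaped like the lists that sheet.row_values() returns (category in column 1, values in columns 3, 5 and 7).

```python
from collections import OrderedDict


def group_rows(rows, cat_col=1, value_cols=(3, 5, 7)):
    """Group rows under the most recent non-empty category cell (hypothetical helper)."""
    keys = ("INDICATEUR", "EVALUATION", "PROPOSITION D'AMELIORATION")
    grouped = OrderedDict()
    current = None  # no category seen yet
    for row in rows:
        if row[cat_col]:
            # a non-empty category cell switches the current category
            current = row[cat_col]
            grouped.setdefault(current, [])
        values = [row[c] for c in value_cols]
        # skip rows before the first category and rows with no data at all
        if current is None or not any(values):
            continue
        grouped[current].append(OrderedDict(zip(keys, values)))
    return {"EVALUATION": grouped}


# Hypothetical sample rows with 8 columns each, not read from the real file
sample = [
    ["", "PUBLICATION", "", "Page de garde", "", "Inexistante ou non conforme", "", "Consulter l'example sur CANVAS"],
    ["", "", "", "Orthographe et grammaire", "", "Nombreuses fautes", "", "Faire relire le document"],
    ["", "", "", "", "", "", "", ""],
    ["", "CONCLUSION", "", "", "", "Risque de refus du document par le conseil des études", "", ""],
]
result = group_rows(sample)
```

Here the second row has an empty category cell, so it is filed under PUBLICATION, and the fully empty third row is dropped.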

This returns a dict that can easily be serialized to JSON. Screenshot of some output:

[image]

Write to file if needed:

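import io  # only needed for Python 2
import json

d = readRows(file, 0)
# raw string, so the backslashes in the Windows path are not treated as escapes
with io.open(r'c:\debug\output.json', 'w', encoding='utf8') as out:
    out.write(json.dumps(d, indent=2, ensure_ascii=False))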

Note: in Python 3 you don't need io.open; the built-in open accepts the encoding argument, and io.open is just an alias for it.
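
For Python 3, the write step might look like the sketch below. It is not taken from the answer: the dict d is a hypothetical stand-in for what readRows() returns, and the output path in the temp directory is an assumption chosen so the snippet runs anywhere.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the dict returned by readRows()
d = {
    "EVALUATION": {
        "CONCLUSION": [
            {
                "INDICATEUR": "",
                "EVALUATION": "Risque de refus du document par le conseil des études",
                "PROPOSITION D'AMELIORATION": "",
            }
        ]
    }
}

# Assumed output location; replace with the real target path
out_path = os.path.join(tempfile.gettempdir(), "output.json")

# The built-in open takes encoding directly in Python 3
with open(out_path, "w", encoding="utf8") as out:
    # ensure_ascii=False keeps accented characters readable in the file
    json.dump(d, out, indent=2, ensure_ascii=False)

# Read the file back to check that it round-trips
with open(out_path, encoding="utf8") as src:
    reloaded = json.load(src)
```

json.dump writes straight to the file object, so the intermediate string from json.dumps is not needed.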
