Converting xls to JSON using python3 and xlrd

Question

I have to convert an xls file directly to a JSON document using python3 and xlrd.

Table is here.

It is divided into three main categories (PUBLICATION, CONTENU, CONCLUSION) whose names are in column one (the first column is column zero); the number of rows per category can vary. Each row has three key values (INDICATEURS, EVALUATION, PROPOSITION) in columns 3, 5 and 7. There can be empty lines or missing values.

I have to convert that table to the following JSON data, which I wrote by hand as a reference. It is valid.

{
"EVALUATION": {
    "PUBLICATION": [
        {
            "INDICATEUR": "Page de garde",
            "EVALUATION": "Inexistante ou non conforme",
            "PROPOSITION D'AMELIORATION": "Consulter l'example sur CANVAS"
        },
        {
            "INDICATEUR": "Page de garde",
            "EVALUATION": "Titre du TFE non conforme",
            "PROPOSITION D'AMELIORATION": "Utilisez le titre avalisé par le conseil des études"
        },
        {
            "INDICATEUR": "Orthographe et grammaire",
            "EVALUATION": "Nombreuses fautes",
            "PROPOSITION D'AMELIORATION": "Faire relire le document"
        },
        {
            "INDICATEUR": "Nombre de page",
            "EVALUATION": "Nombre de pages grandement différent à la norme",
            "PROPOSITION D'AMELIORATION": ""
        }
    ],
    "CONTENU": [
        {
            "INDICATEUR": "Développement du sujet",
            "EVALUATION": "Présentation de l'entreprise",
            "PROPOSITION D'AMELIORATION": ""
        },
        {
            "INDICATEUR": "Développement du sujet",
            "EVALUATION": "Plan de localisation inutile",
            "PROPOSITION D'AMELIORATION": "Supprimer le plan de localisation"
        },
        {
            "INDICATEUR": "Figures et capture d'écran",
            "EVALUATION": "Captures d'écran excessives",
            "PROPOSITION D'AMELIORATION": "Pour chaque figure et capture d'écran se poser la question 'Qu'est-ce que cela apporte à mon sujet ?'"
        },
        {
            "INDICATEUR": "Figures et capture d'écran",
            "EVALUATION": "Captures d'écran Inutiles",
            "PROPOSITION D'AMELIORATION": "Pour chaque figure et capture d'écran se poser la question 'Qu'est-ce que cela apporte à mon sujet ?'"
        },
        {
            "INDICATEUR": "Figures et capture d'écran",
            "EVALUATION": "Captures d'écran illisibles",
            "PROPOSITION D'AMELIORATION": "Pour chaque figure et capture d'écran se poser la question 'Qu'est-ce que cela apporte à mon sujet ?'"
        },
        {
            "INDICATEUR": "Conclusion",
            "EVALUATION": "Conclusion inexistante",
            "PROPOSITION D'AMELIORATION": ""
        },
        {
            "INDICATEUR": "Bibliographie",
            "EVALUATION": "Inexistante",
            "PROPOSITION D'AMELIORATION": ""
        },
        {
            "INDICATEUR": "Bibliographie",
            "EVALUATION": "Non normalisée",
            "PROPOSITION D'AMELIORATION": "Ecrire la bibliographie selon la norme APA"
        }
    ],
    "CONCLUSION": [
        {
            "INDICATEUR": "",
            "EVALUATION": "Grave manquement sur le plan de la présentation",
            "PROPOSITION D'AMELIORATION": "Lire le document 'Conseil de publication' disponible sur CANVAS"
        },
        {
            "INDICATEUR": "",
            "EVALUATION": "Risque de refus du document par le conseil des études",
            "PROPOSITION D'AMELIORATION": ""
        }
    ]
}

}

My intention is to loop through the lines, check rows[1] to identify the category, and sub-loop to add the data as dictionaries in a list per category.

Here is my code so far:

import xlrd
file = '/home/eh/Documents/Base de Programmation/Feedback/EvaluationEI.xls'
wb = xlrd.open_workbook(file)
sheet = wb.sheet_by_index(0)
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]


def readRows():
    for rownum in range(2,sheet.nrows):
        rows = sheet.row_values(rownum)
        indicateur = rows[3]
        evaluation = rows[5]
        amelioration = rows[7]
        publication = []
        contenu = []
        conclusion = []

        if rows[1] == "PUBLICATION":

            if rows[3] == '' and rows[5] == '' and rows[7] == '':
                continue
            else:
                publication.append("INDICATEUR : " + indicateur , "EVALUATION : " + evaluation , "PROPOSITION D'AMELIORATION : " + amelioration)

        if rows[1] == "CONTENU":

            if rows[3] == '' and rows[5] == '' and rows[7] == '':
                continue
            else:
                contenu.append("INDICATEUR : " + indicateur , "EVALUATION : " + evaluation  , "PROPOSITION D'AMELIORATION : " + amelioration)

        if rows[1] == "CONCLUSION":

            if rows[3] == '' and rows[5] == '' and rows[7] == '':
                continue
            else:
                conclusion.append("INDICATEUR : " + indicateur , "EVALUATION : " + evaluation , "PROPOSITION D'AMELIORATION : " + amelioration)

    print (publication)
    print (contenu)
    print (conclusion)




readRows()

I am having a hard time figuring out how to sub-loop over the right number of rows to separate the data by category.

Any help would be welcome.

Thank you in advance.

Answer 1

Is pandas not an option? I would add this as a comment, but I don't have the rep.

From the documentation:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_json.html

import pandas

df = pandas.read_excel('path_to_file.xls')
df.to_json(path_or_buf='output_path.json', orient='table')

Answer 2

Using the json package and OrderedDict (to preserve key order), I think this gets to what you're expecting. I've modified it slightly so that we're not building a string literal, but rather a dict containing the data, which we can then convert with json.dumps.

As Ron noted above, your previous attempt was skipping the lines where rows[1] was not equal to one of your three key values.

This should read every line, appending to the last non-empty key:

import xlrd
from collections import OrderedDict


def readRows(file, s_index=0):
    """
    file:    path to xls file
    s_index: sheet_index for the xls file
    returns a dict of OrderedDict of list of OrderedDict which can be serialized to JSON
    """
    d = {"EVALUATION": OrderedDict()}  # this will be the main dict for our JSON object
    wb = xlrd.open_workbook(file)
    sheet = wb.sheet_by_index(s_index)
    # getting the data from the worksheet
    data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
    categorie = None  # no category seen yet; avoids a NameError if the first rows have none
    # fill the dict with data:
    for row in data[3:]:
        if row[1]:  # if there's a value, then this is a new categorie element
            categorie = row[1]
            d["EVALUATION"][categorie] = []
        if categorie:
            i, e, a = row[3::2][:3]  # columns 3, 5 and 7
            if i or e or a:  # as long as there's any data in this row, we write the child element
                val = OrderedDict([("INDICATEUR", i), ("EVALUATION", e), ("PROPOSITION D'AMELIORATION", a)])
                d["EVALUATION"][categorie].append(val)
    return d

This returns a dict which can easily be serialized to JSON.

[Screenshot of some output]
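To illustrate the grouping logic without an actual xls file, here is a minimal sketch that applies the same idea to rows held in memory. The function name group_rows and the sample rows are made up for illustration; the column positions (1, 3, 5, 7) follow the question. Unlike readRows, it uses setdefault, so a category name that shows up twice would not wipe the entries collected earlier.

```python
from collections import OrderedDict
import json


def group_rows(rows):
    """Group rows under the most recent non-empty category found in column 1."""
    result = {"EVALUATION": OrderedDict()}
    categorie = None  # no category seen yet
    for row in rows:
        if row[1]:  # a non-empty cell in column 1 starts (or resumes) a category
            categorie = row[1]
            result["EVALUATION"].setdefault(categorie, [])
        if categorie is None:
            continue  # rows before the first category are ignored
        i, e, a = row[3], row[5], row[7]
        if i or e or a:  # skip rows where all three values are empty
            result["EVALUATION"][categorie].append(OrderedDict([
                ("INDICATEUR", i),
                ("EVALUATION", e),
                ("PROPOSITION D'AMELIORATION", a),
            ]))
    return result


# Hypothetical sample rows (8 columns each), mimicking the sheet layout.
sample_rows = [
    ["", "PUBLICATION", "", "Page de garde", "", "Inexistante ou non conforme", "", "Consulter l'example sur CANVAS"],
    ["", "", "", "Orthographe et grammaire", "", "Nombreuses fautes", "", "Faire relire le document"],
    ["", "", "", "", "", "", "", ""],
    ["", "CONCLUSION", "", "", "", "Risque de refus du document par le conseil des études", "", ""],
]

grouped = group_rows(sample_rows)
print(json.dumps(grouped, indent=2, ensure_ascii=False))
```

The second sample row has no category of its own, so it is attached to PUBLICATION; the fully empty third row is dropped.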

Write to file if needed:

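import io  # for python 2
import json

d = readRows(file, 0)  # file: path to the xls file
# raw string, so the backslashes in the Windows path are not treated as escapes
with io.open(r'c:\debug\output.json', 'w', encoding='utf8') as out:
    out.write(json.dumps(d, indent=2, ensure_ascii=False))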

Note: in Python 3, you don't need io.open; the built-in open accepts the encoding argument directly.
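A Python 3 version of the write step might look like the sketch below. The dict d here is a small hypothetical stand-in for the result of readRows, and the output goes to the system temp directory only so that the example runs anywhere; both are assumptions made for illustration.

```python
import json
import os
import tempfile

# Hypothetical data standing in for the dict returned by readRows().
d = {
    "EVALUATION": {
        "CONCLUSION": [
            {
                "INDICATEUR": "",
                "EVALUATION": "Risque de refus du document par le conseil des études",
                "PROPOSITION D'AMELIORATION": "",
            }
        ]
    }
}

# Write with the built-in open; ensure_ascii=False keeps accented characters readable.
out_path = os.path.join(tempfile.gettempdir(), "output.json")
with open(out_path, "w", encoding="utf8") as out:
    json.dump(d, out, indent=2, ensure_ascii=False)

# Read the file back to check that it round-trips.
with open(out_path, encoding="utf8") as f:
    reloaded = json.load(f)
```

json.dump writes straight to the file object, which avoids building the whole string in memory first with json.dumps.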
