XlsxWriter doesn't create the file for some inputs from a DataFrame and only prints gibberish to the terminal
I have a rather interesting issue. When creating files with xlsxwriter, the files are created correctly in 95% of the cases (both before and after the failing one, so there is no crash). But when I try to write the data of one specific BSI Baustein (a German IT security baseline requirement), saving the file fails without an error message. I just get some gibberish in the terminal, as seen below. Has anyone experienced something similar? I would appreciate any pointers, as I don't know what else to look for.
with pd.ExcelWriter(
    Path("output", "empty_templates", f"{module_title}.xlsx"),
    engine="xlsxwriter",
) as writer:
    log.info(f"Writing {module_title} data...")
    create_front_page(writer, "Deckblatt")
    write_file(writer, "Anforderungen", df_requirements)
    log.info("We returned, after writing data, closing and saving file...")
log.info("File saved!")
INFO : [comp_converter] : get_gs_bausteine : ['CON.2.A1 Umsetzung Standard-Datenschutzmodell (B)']
INFO : [comp_converter] : get_gs_bausteine : Writing CON.2 Datenschutz data...
INFO : [comp_converter] : write_file : Nr. of entries: 3
INFO : [comp_converter] : write_file : Data written into excel.
INFO : [comp_converter] : write_file : Done writing the file
INFO : [comp_converter] : get_gs_bausteine : We returned, after writing data, closing and saving file...
�6�§♂����V�♠l�^xk�5��U�gհJ���)D��/���↕3T���▲�[a◄"x�T!5��5MeTz�� �{��'�S�!�����Q�����$ ∟��~-lY�P1:�§q].��A�7��;;
h♥↓0ydKHda�y/[�♦��o���>��t↕i�O▲§c☻e�♠��§k�♥d�� ��������#���s☼ۮE��?�߉Qv��S⌂☼��?t�J`^)�=p�����!��d� P�J�!Q☺�♦PK♥?♂_rels/.rels���J♥1►���►��Ͷ��4�K◄z‼�☼0&�?�&‼�Q�o¶�3�⌂X܁ʂ��ā♀∟(æ��Z?фR�r?Ĭ`�
%d♥�H��:۞<�#���r�(�:↔юؑ^���N?↓�∟1��↓H;�♦�?D���m;Xڲ}�¶�D�_↓���#10O�����<V♣
����r���Ԟ♦↔
4▼PP|N��L☻PK♥?→xl/_rels/workbook.xml.rels���j�0►D����{-;-��ȹ�B�m�☺BZ[&�$�ۤ�����♫�ЃObF�̃���{��▲‼u�+��↕♦z‼l�[♣�۷�↨►��[��
F$X��w�w�5�!r]$�S<)p��UJ2♫ME���O‼Ҡ9��ʨ�N�(↨e�,�4♥�L��
��V �c��d���♀���→��
iG♫�s�N-���E��TEN♣y↔f1'♀�Y�♥9ʓy��qN♠��C����Y�Nh?8�s�RL�_↑yqq�☼P��i��☻PK♥?↑xl/worksheets/sheet1.xml���n�0►��♣�♫☻�V��↔��ݤi���rf��E�↕♣���>}IJ���I|◄�|��sfHj~��Rk☼\►�-�;p��!��<��d�h�_�E�H&→]gꏇ�ڑ��♀wBy�S"�P!�↕^%q��_*����\(◄�↕A%◄t]x�J♀K�a%�w]�.1*%Fu(.�↑�↕�z!��bRJL�z↨���↕�Z�҅�Ω����ޙ‼��RS�+,�r����f�Bm��♂{b�YOީ!G)�8♥���S"
u��3��>�G���)▼U�zw���:%�KL��M�p�&��A�&��A�&��A�&� ��m☼2l"w=Ȩ�|�A�M���k��T�����↓☺�L�§כ.�
� �3�§�u�↑����]�o��↕�̮���VJ�]�?.�~���.�D��VA�C�*�]���>ۚx'نP ��K�.��▬�c�%������=9P�→/�#�V��i���N�D�♣�=§��1y��ӡ��X�♥P���ɕ☻PK♥?☼xl/workbook.xml��_O�0¶��'�↔�U_��QU��
���►CZ�Gd��±#�Yʷ�:%↔}�↔>���!�9∟�P��2-%m�H♠����H�"ts�>��{����T▲�§���;♀��\�↔�¶��B`(☻�‼‼��$YN~O�U�^�(��p%��p:�'♂`U��%���[Ç ��Xl�♫P�ާ'�☺�☻PK♥?¶xl/sharedStrings.xml}S�n�►
�W�;�|J$�@���B���#♣��↨v�Wxw�θ4<K�^�↑��b↔��JM��;���͌;W�m♠�0��→Э�6.�Fw���w◄►+�U�↔v�����U��Aj↔u��y�!�i��U��[t↕Y�`§�↓���☺��¶�m▬��ͷ�U�E���n�~‼A���∟�������!��po��P'�^'.∟G�
�@,+eC♠♥���a)�p�k♀���▼@��b9�D�T�♥�L09Υ♠��x��)8�0M���xl��&�z.1⌂2�%���♠♥��\
}♫*a♀�4���☼♠Ov��_u�ݺ☺�p☻�0�Y�y�A9��PL����↕�c@�|&���
xl/styles.xml�Tۊ�0♀}/�▼��t;�%�R▬♠�▼P1♦�?♥☻♦♦PK♥?¶
m)�,�Չ����`k�I��r�d2�lK/y�%Y:G�d↨w�V�,��֔4�I)◄��\���������x`�3e�(�(<��^�(<�J<∟��►ƗпO↕�∟�f�����Nk�f���↕�;��☼AZ%y��↕ͤ�U�Z♥�4�d���l�
S�`��♣�m�R☼a���W�CK�I�5|��p��I/"&4�◄&*☺⌂�▬�7��⌂♦K�v���h��^�▼��▬K�lQí♣ل↓kP§qȆ�⌂R~w�?�☺���,[va˟e��H4k↑�����t��߫��kY�/�Q�$§H�D�◄�☼�6O��j|ѮX►����¶∟�͒^�ς˓~�z}�g♂��E�¶�<�↓
\���'P#�♣�☻☻|♣PK♥?‼xl/theme/theme1.xml�YM��D↑�#�▼F���‼;ͮ��6٤�v��nZ��ę�ӌ=��d�������(�♂↕7��J\ʯY(�"�/��#�x3�f�E¶�9$���~⌂�↔���◄C�DH���\�Y��>▼�8h[��♂-♂I��◄f<&mkF�ue��☼.�§
�� ���&n[�RɦmK▼����‼↕ý1↨◄V�¶�=↕�F̮�jM;�4�P�#`{k<�>A����5g�c�§+�n�L∟��D�"Î&N�#g��♦:Ĭm��◄?→���B♀K♣7�V-�X��e{A�T♣�F��>♣]A0��3:◄♀↨�N�ݸ���_����z�^��,�e��`���u�-�3穁��U�ݚW
ǁN���/���S�ד�^<�ʌ�:���>���/�@�♥�⌂������⌂���?<2��♣▲��☺��D7�◄��◄�f►@��l¶�►�↕♣♫☺i�TX☻ޜaf�uH�yw♦4‼���^I׃PL§5��Q˗+[�,��9�pa4�z*K7g→f�b���1>4�mo�@&S‼�nHJj�1�6♫HL¶J�� !♠������K}�%▼+t��♫�F�♀�P����23�.�f�♫�pfb�C♫�H�L, +��*�*∟↓5�◄ӑ7�
MJ▲̄_r�T►�0�z#"��斘�Խ��‼↓þ�fQ↓)¶���70�:r�O�!�↕��4♫u�Gr☻)��▲WF%x�B�5�☺Ǖ�C�:[YߦAhN���T¶]��⌂#→�↓�n��↓����h2���▬\��▼6�↔<�������ヒ}������♠k�sq�/�∟�ǔ�♥5c��Z�♦�G}��▬↓�b&OB�
,ĕp���5↕\}BUx►�♦�8��@▬�♥�↕.�$`U�Ύ�¶�����↓►�X��Q���φ♂6�*���F�`]a�Ko&�ɁkJs<�4�Ti��M�♠�ӓ�Ӭ�!c0#���9�yX�=D2�#R��1→�4�t[��^Ӥm4�L�:A�Ź§�s�Rm%J�j9���BG��W�,��m�a���(☺~2m@�♣q��Ua�+�����tj�♠�D$B�↔,Ü*�5⌂u↕/��{n��1�Ѝ�Ӣ�r�C-쓡%�1�U��rY��SE�A8:BC6§�↑�v��→Q ό�|!�B�"�ʕ_T��W4Eu`����I--�9<�^萭4��
�_Ӕ�9�⽻���♂ckc�↔�`♀►↑�9ڶ�P!�.����♂↑∟2Y�↨��HUB,}��J♫�}+�7� T�4@�B�S� dO§v���Sן�sFE�Y�+��wH♫ ��L�P8�&�#2�ɠ٦�→♠��x�q+&��ǃ� �,���5}�Q��f*��Q[7[\��~�&p�@�↨4n*|��o|▼�����9�wϽ���T������)�⌂w�Z��U◄��∟>5g7*�}���w�g�w����↕���L�Z��☼��↔8(M���ۤ�p����2>��t�▼P�%��♣SK♥?◄docProps/core.xml��_K�0¶��♣�C�{��Sq��@eO♫♦'‼�Br�♣�?$Ѯ�ۺ
�,G hn�ԛ
�r�T2t▬�Z����y9→۶���`��g�\��☼��R����Rp�↔�`\]��"▲�a>,��↕�S↨�+o�E�} �↑���▲����e9Gu�↨�4'iA�dJ��h>��G^���0���#`���‼�⌂P�A��'☺Q☻PK♥?►docProps/app.xml�R�N�0►���?D�S�§B�r�P♂��R♂{�:�ƪc[�i����jHaO�ӛ7O�/3#�☼��Z�h�+�t��♀���qۂ�o���X��\��wP�# ��W��*��♀`�,∟▬�&
@(�@�p�↔cs#�� �K!▼�$|↓qc�☻�U+§�▼����}♠6�����ˇ▼�cH~R<�`�V��R�→↔=���ǃ♠+��)��→�>→:�\�q)�ZYX$cY)� �↨!�AuC[)‼Q���-h�1C��6c�▼���)X��Q��Iv*zl♥R��}�a
J�{�♣�#���oO,|‼�Ks�♥z1n��a㗊�<�KR�k§�L�↑�>►�9ŋ��/j���g��Fw♥▼�C���$O_��3'��I˿P�=��⌂☺↑♥PK☺☻?�J�!Q☺�♦‼��[Content_Types].xmlPK☺☻?P|N��L☻♂���☺_rels/.relsPK☺☻?��i��☻→���☻x���rels/workbook.xml.relsPK☺☻?���ɕ☻↑���♥xl/worksheets/sheet1.xmlPK☺☻?�ާ'�☺�☻☼���♠xl/workbook.xmlPK☺☻?1♦�?♥☻♦♦¶��xl/sharedStrings.xmlPK☺☻?#�♣�☻☻|♣
xl/theme/theme1.xmlPK☺☻?�A��'☺Q☻◄��♦‼docProps/core.xmlPK☺☻?�=��⌂☺↑♥►��j¶docProps/app.xmlPK♣♠
�☻'▬
BSI Kompendium XML edition: save the content as "Kompendium2022.xml" in the directory where comp_converter.py is located. I know it's an awful format, but unfortunately that is what is given.
I use VSCode with Python 3.9.4. Additional packages beyond the imports, as far as I remember: colorlog, xlsxwriter.
YAML config
<<comp_converter.yaml>>
SKIP_NR_COMPENDIUM_CHAPTERS: 7  # We have that number of chapters without Bausteine
INPUT_BSI_COMPENDIUM: "Kompendium2022.xml"
INPUT_PREV_MAPPING_COLUMNS:
  - "Kategorie"
  - "CIA"
  - "Anf. Nr."
  - "Titel"
  - "Verantwortung"
  - "Umsetzungsbeschreibung"
  - "Referenziertes Dokument"
  - "Status"
  - "Risiko"
  - "Risikobeschreibung"
  - "Gefährdungszuordnung"
ABBREV_REPLACEMENTS:
  "usw.": "usw|"
  "zB": "zB|"
  "z. B.": "z| B|"
  "bzw.": "bzw|"
  "ggf.": "ggf|"
  "idR.": "idR|"
  "idR.": "idR|"
  "engl.": "engl|"
  "inkl.": "inkl|"
  "Absch.": "Absch|"
  "o.Ä.": "o.Ä|"
  "dh": "dh|"
  "etc.": "etc|"
  "bspw.": "bspw|"
  "va": "va|"
  "vA": "vA|"
Logger config
<<logging.conf>>
[loggers]
keys=root
[handlers]
keys=consoleHandler,fileHandler
[formatters]
keys=color_console,file
[logger_root]
level=DEBUG
handlers=consoleHandler,fileHandler
[handler_consoleHandler]
class=StreamHandler
level=INFO
formatter=color_console
args=(sys.stdout,)
[handler_fileHandler]
class=FileHandler
level=DEBUG
formatter=file
kwargs={"filename": "bsi_compendium_debug.log" ,"mode": "w"}
[formatter_file]
format=%(asctime)s %(levelname)-7s : [%(module)-12s] : %(funcName)-12s : %(message)s
datefmt=%Y-%m-%d %H:%M:%S
[formatter_color_console]
class=colorlog.ColoredFormatter
format=%(log_color)s %(levelname)-7s : [%(module)-12s] : %(funcName)-12s : %(message)s
import logging
import re
from logging import config
from pathlib import Path

import pandas as pd
import yaml
from bs4 import BeautifulSoup

config.fileConfig(fname="logging.conf")
log = logging.getLogger(__name__)

with open("comp_converter.yaml", "r") as f:
    comp_conv_config = yaml.load(f, Loader=yaml.FullLoader)


def main():
    with open(comp_conv_config["INPUT_BSI_COMPENDIUM"], "r", encoding="utf-8") as f:
        compendium = f.read()
    compendium_tree = BeautifulSoup(compendium, "xml")
    get_gs_bausteine(compendium_tree)


def get_gs_bausteine(compendium_soup: BeautifulSoup) -> None:
    # Each BSI Baustein is grouped into a chapter
    bausteine = compendium_soup.find_all("chapter")
    # We get rid of unneeded chapters at the beginning, so we start only with the first Baustein
    for _ in range(comp_conv_config["SKIP_NR_COMPENDIUM_CHAPTERS"]):
        bausteine.pop(0)
    log.info(
        f"We found {len(bausteine)} Bausteine to work with: {[baustein.title.text for baustein in bausteine]} "
    )
    # We loop through each Baustein to get the necessary data
    for baustein in bausteine:
        module_titles = []
        modules = []
        log.info(f'Starting to process "{baustein.title.text}" Baustein')
        # first_module_title = baustein.find_next("section").title.text
        first_module = baustein.find_next("section")
        first_module_title = first_module.title.text
        module_titles.append(first_module_title)
        modules.append(first_module)
        # Append any potential additional module titles to the list of module titles
        # and modules to the list of modules
        for sibling in baustein.section.find_next_siblings("section"):
            module_titles.append(sibling.title.text)
            modules.append(sibling)
        log.info(
            f"We have {len(module_titles)} modules in the Baustein: {module_titles}"
        )
        for module, module_title in zip(modules, module_titles):
            requirements = []
            module_title_prefix = module_title.split(" ", 1)[0]
            req_title_matcher = rf"^{module_title_prefix}\.(\d+\.)*A\d+\s{{1}}"
            log.debug(req_title_matcher)
            requirements = module.find_all("title", text=re.compile(req_title_matcher))
            log.info(f"We have {len(requirements)} requirements in the module.")
            log.info([req.text for req in requirements])
            mod_requirements = get_requirements(requirements)
            df_requirements = pd.DataFrame.from_dict(
                mod_requirements,
                orient="index",
                columns=["Titel", "Verantwortung", "Kategorie"],
            )
            # Here we get the Anf. Nr. data from the index into a column
            df_requirements["Anf. Nr."] = df_requirements.index.values
            all_headers = comp_conv_config["INPUT_PREV_MAPPING_COLUMNS"]
            missing_headers = set(all_headers) - set(df_requirements.columns)
            for header in missing_headers:
                df_requirements[header] = ""
            # Final data format
            df_requirements = df_requirements[all_headers]
            # We drop the rows where the subcriterion only states that the main criterion no longer exists;
            # overall we will compare against the ID of the main criterion.
            df_requirements.drop(
                df_requirements[
                    df_requirements["Titel"] == "Diese Anforderung ist entfallen."
                ].index,
                inplace=True,
            )
            # Create comparison with prev. edition file
            # TODO
            # Create empty template
            with pd.ExcelWriter(
                Path("output", "empty_templates", f"{module_title}.xlsx"),
                engine="xlsxwriter",
            ) as writer:
                log.info(f"Writing {module_title} data...")
                create_front_page(writer, "Deckblatt")
                write_file(writer, "Anforderungen", df_requirements)
                log.info("We returned, after writing data, closing and saving file...")
            log.info("File saved!")


def get_requirements(requirements: BeautifulSoup) -> dict[str, list[str]]:
    req_rows = {}
    for requirement in requirements:
        req_info = []
        # Here we want to prepare the data for the final dataframe format,
        # where each entry will be a row in the dataframe
        # Index will be the Anf.Nr. e.g. ISMS.1.A01 or ISMS.1.A01-1
        req_index, req_title = requirement.text.split(" ", 1)
        log.debug(f"Preparing data for {requirement.text}")
        req_info.append(req_title)
        # Here we make the single digit IDs two digit IDs
        if "." in req_index[-3]:
            req_index = f"{req_index[:-1]}0{req_index[-1:]}"
            # print(req_index)
        req_owner = get_req_owner(requirement)
        req_info.append(req_owner)
        req_category = get_req_category(requirement)
        req_info.append(req_category)
        # Here we are done with the high level Baustein
        req_rows[req_index] = req_info
        # Here we add the low level Baustein from the sentences
        req_sentences = get_req_paragraph_text(requirement)
        for ctr, sub_req in enumerate(req_sentences):
            sub_req_info = []
            sub_req_info.append(sub_req)
            sub_req_info.append(req_owner)
            sub_req_info.append(req_category)
            req_rows[f"{req_index}-{ctr+1}"] = sub_req_info
    # print(req_rows)
    return req_rows


def get_req_paragraph_text(requirement: BeautifulSoup) -> list[str]:
    # Here we get rid of the module title first, as it is included in the text
    sentences = re.split(r"\n+", requirement.parent.text.strip(), maxsplit=1)[1]
    # Here we take care of the abbreviations, so they don't cause trouble when we look for sentences
    mapping = comp_conv_config["ABBREV_REPLACEMENTS"]
    for key, value in mapping.items():
        sentences = sentences.replace(key, value)
    # Here we try to separate sentences from each other
    sentences = re.split(r"(?<=[^A-Z].[.?!])([\n\t]* |[\n\t]*)+(?=[A-Z])", sentences)
    # From the regex we get empty results, so we delete them
    while "" in sentences:
        sentences.remove("")
    # Here we get rid of unnecessary new lines to shorten the text
    sentences = [" ".join(sentence.split()) for sentence in sentences]
    # Here we reverse the dots for the abbreviations
    for key, value in mapping.items():
        sentences = [sentence.replace(value, key) for sentence in sentences]
    # print(requirement.parent.text)
    # print(sentences)
    return sentences


def get_req_category(requirement: BeautifulSoup) -> str:
    try:
        # We search for the parent node of the parent node of the requirement title node for the category
        req_category = requirement.parent.parent.find_next("title").text.split("-")[0]
    except AttributeError:
        log.error("We did not find the correct category, setting it to [DEFAULT]...")
        req_category = "[DEFAULT]"
    if " " in req_category:
        # Here we alter the category for elevated category as there is nothing to split there
        req_category = "erhöht"
    # print(req_category)
    return req_category


def get_req_owner(requirement: BeautifulSoup) -> str:
    # print(f"requirement parent: {requirement.parent.parent}")
    try:
        req_owner = (
            requirement.find_previous("informaltable")
            .tbody.find("para", text="Grundsätzlich zuständig")
            .find_next("para")
            .text
            # .parent.find_next_sibling()
            # .para.text
        )
    except AttributeError:
        log.debug(
            "We have the text in an emphasis in the XML-Format, so we search for the emphasis text instead of the paragraph."
        )
        try:
            req_owner = (
                requirement.find_previous("informaltable")
                .tbody.find("emphasis", text="Grundsätzlich zuständig")
                .find_next("emphasis")
                .text
                # .parent.parent.find_next_sibling()
                # .para.emphasis.text
            )
        except AttributeError:
            log.error(
                "We have something strange ongoing and can not find the responsible person, so we use [DEFAULT], set it manually later!"
            )
            req_owner = "[DEFAULT]"
    log.debug(f"owner: {req_owner} for {requirement.text}")
    return req_owner


def create_front_page(writer: pd.ExcelWriter, sheetname: str) -> None:
    # wb = writer.book
    # ws = writer.add_worksheet(sheetname)
    pass


def write_file(writer: pd.ExcelWriter, sheetname: str, data: pd.DataFrame) -> None:
    log.info(f"Nr. of entries: {data['Anf. Nr.'].size}")
    # Row 0 is left free for the manually formatted header written below
    data.to_excel(writer, sheet_name=sheetname, startrow=1, header=False, index=False)
    log.info("Data written into excel.")
    wb = writer.book
    ws = writer.sheets[sheetname]
    multiline_style = wb.add_format({"text_wrap": True})
    singleline_style = wb.add_format({"valign": "vcenter"})
    ws.set_column("A:A", 15, singleline_style)  # Kategorie
    ws.set_column("B:B", 5, singleline_style)  # CIA
    ws.set_column("C:C", 20, singleline_style)  # Anf. Nr.
    ws.set_column("D:D", 45, multiline_style)  # Titel
    ws.set_column("E:E", 30, multiline_style)  # Verantwortung
    ws.set_column("F:F", 50, multiline_style)  # Umsetzungsbeschreibung
    ws.set_column("G:G", 40, singleline_style)  # Referenziertes Dokument
    ws.set_column("H:H", 12, multiline_style)  # Status
    ws.set_column("I:I", 10, multiline_style)  # Risiko
    ws.set_column(
        "J:K", 25, multiline_style
    )  # Risikobeschreibung | Gefährdungszuordnung
    header_format = wb.add_format(
        {"bg_color": "#002060", "bold": True, "font_color": "white"}
    )
    for col_num, value in enumerate(data.columns.values):
        ws.write(0, col_num, value, header_format)
    ws.autofilter("A1:K1")
    ws.freeze_panes(1, 0)
    log.info("Done writing the file")


def create_threats_page(writer, sheetname: str) -> None:
    pass


def read_prev_version(filename: str) -> pd.DataFrame:
    """
    Reads in the previous version of the Grundschutz Baustein to check for any changes in the current version
    :param filename: Excel file containing the Grundschutz Baustein that we have to read in
    :return: Returns a dataframe with the relevant info from the file
    """
    pass


if __name__ == "__main__":
    main()
CON is a reserved device name on Windows (see the Microsoft Forum entry), which is why the OS won't create a regular file named CON or CON followed by an extension. That is how it treats "CON.2 Datenschutz.xlsx", because the part before the first dot is CON. Instead, in this case it writes the content of the file to the console, which is the gibberish seen in the terminal.
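One possible workaround, sketched here rather than taken from the original script, is to check the module title against the Windows device names before building the output path. The helper name safe_filename, the WINDOWS_RESERVED set and the underscore prefix are all assumptions made for illustration; any other renaming scheme would work just as well.

```python
from pathlib import Path

# Device names that Windows reserves, with or without an extension.
WINDOWS_RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{device}{n}" for device in ("COM", "LPT") for n in range(1, 10)
}


def safe_filename(name: str, prefix: str = "_") -> str:
    """Prefix `name` if the part before its first dot is a reserved device name.

    Hypothetical helper: it is not part of pandas or xlsxwriter.
    """
    base = name.split(".", 1)[0].strip().upper()
    if base in WINDOWS_RESERVED:
        return f"{prefix}{name}"
    return name


module_title = "CON.2 Datenschutz"
target = Path("output", "empty_templates", f"{safe_filename(module_title)}.xlsx")
print(target.name)  # _CON.2 Datenschutz.xlsx
```

In the script above, this would mean wrapping module_title in safe_filename(...) where the path for pd.ExcelWriter is built. Titles such as "ISMS.1 ..." pass through unchanged, so only the affected Bausteine get renamed.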
@jmcnamara That explains why you had no issues on Mac. Again, thank you very much for your time and effort in checking for bugs, and a HUGE thank you for the great package you maintain!