
pickle a list as UTF-8

I want to import all files from one directory into MySQL. But I have to make the same changes to each original.htb file first. The problems with the original files are:

  1. I don't want to import the column headers or the 2nd line, because it is blank.

  2. I need to change \t\t\t\n to just \n so MySQL knows where fields and lines end.

  3. I need to remove -----\n because it has only 1 column, which doesn't match my table (4 columns).

Here's what the original.htb file looks like:

    Beschreibung\t Kurzbeschreibung\t Einheit\t Wert\t\t\t\n
    \n
    Hub\t Hub\t mm\t 150.000000000000\t\t\t\n
    Bohrung\t Bohru\t mm\t 135.000000000000\t\t\t\n
    -----\n

So far I have managed to create a list of all files. My next step would be to write that list to a single file which I can then edit. The problem is that I get a format issue when I save the list to a file. I want the final file to be UTF-8 encoded. This is what I want my file to look like:

Hub Hub mm  150.000000000000            
Bohrung Bohru   mm  135.000000000000            

But what I get at the moment is:

”ŒHub   Hub mm  150.000000000000            
”Œ%Bohrung  Bohru   mm  135.000000000000        

Here's my code:

import os
import pickle

folderpath = r"C:/Users/l-reh/Desktop/HTB" 
filepaths  = [os.path.join("C:/Users/l-reh/Desktop/HTB/", name) for name in os.listdir(folderpath)]
all_files = []

for path in filepaths:
    with open(path, 'r') as f:
        file = f.readlines()
        all_files.append(file)

with open("C:/Users/l-reh/Desktop/Bachelorarbeit/DB Testdatensatz/HTB.htb", 'wb') as f:
    pickle.dump(all_files, f)

pickle produces a binary format, which includes per-field "header" bytes (describing type, length, and, for some pickle protocols, framing data) that are going to look like garbage if you view the output as text. You can't say "I want it to be pickle, but not have these bytes", because those bytes are part of the pickle serialization format. If you don't want those bytes, you need to choose a different serialization format (presumably using a custom serializer that matches this HTB format). This has nothing to do with UTF-8 encoding or the lack of it (your input is ASCII). The problem is that you are demanding a result that's literally impossible within the limits of your design.
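As an illustration of what such a custom serializer could look like, here is a minimal sketch that skips pickle entirely and writes the cleaned rows as plain UTF-8 text. The function names clean_htb_lines and merge_htb_files are made up for this example, and it assumes every file in the folder has the layout shown in the question (header row, blank second line, rows ending in three tabs, a ----- separator) and can be decoded as UTF-8.

```python
import os


def clean_htb_lines(lines):
    """Apply the three changes from the question to one file's lines.

    Hypothetical helper; assumes the first two lines are the header row
    and a blank line, as in the sample shown in the question.
    """
    cleaned = []
    for line in lines[2:]:  # skip the header row and the blank 2nd line
        text = line.rstrip("\n")
        if text == "-----":  # drop the one-column separator row
            continue
        if text.endswith("\t\t\t"):  # turn "\t\t\t\n" into "\n"
            text = text[:-3]
        cleaned.append(text + "\n")
    return cleaned


def merge_htb_files(folder, out_path):
    """Write the cleaned rows of every file in `folder` to one text file.

    Assumes `out_path` is outside `folder`, so the output is not read
    back in as input.
    """
    # Text mode with an explicit encoding gives plain UTF-8 output;
    # newline="\n" keeps "\n" line endings even on Windows.
    with open(out_path, "w", encoding="utf-8", newline="\n") as out:
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            if not os.path.isfile(path):
                continue
            with open(path, "r", encoding="utf-8") as src:
                out.writelines(clean_htb_lines(src.readlines()))
```

With the paths from the question, the call would be merge_htb_files("C:/Users/l-reh/Desktop/HTB", "C:/Users/l-reh/Desktop/Bachelorarbeit/DB Testdatensatz/HTB.htb"). Because the output is ordinary text, it contains only the rows themselves and no pickle header bytes.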


 