
Python CSV to dictionary with multiple row entries per item

I am using Python to turn a CSV file into a dictionary, where the CSV file has multiple values for the same column.

The following works for a simple CSV without multiple values. It uses the CSV headers (first line) as the keys and turns each row into a dictionary:

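import pandas as pd


def main():
    content = csvArray(".../Csv.csv")
    print(content)


def csvArray(path):
    # The first line of the CSV supplies the column names.
    df = pd.read_csv(path)
    # One dictionary per row, keyed by column name.
    records = df.to_dict(orient='records')
    return records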

However, I now have an issue. There is an Image column in the CSV, and in many cases it holds multiple entries for one item, formatted like:

SKU      ImageData
12345    1st Image Data
         2nd Image Data
         3rd Image Data
12346    1st Image Data
         2nd Image Data

etc.

There can be up to 8 images for one SKU.

My csvArray function does not work with a CSV formatted like this, and the export does not allow changing the CSV format.

How could I concatenate all the image data into the first row? Or is there any alternative that would work for turning the CSV into a dictionary?

Data from your comment to your question:

s = '''Internal Reference;Name;Extra Product Media/Image TGTLI20018;20V Grass Trimmer - Body only;1st Image base64 data ;;2nd Image base64 data ;;3rd Image base64 data ;;4th Image base64 data ;;5th Image base64 data TGTLI20019;25V Grass Trimmer;1st Image base64 data ;;2nd Image base64 data'''

If you can determine a pattern that delineates records and will not occur in the base64 image data, like ...

pattern = ' TGTLI'
  • find all the indices of this pattern in the data - (49, 208) in this case

  • iterate over the indices in overlapping pairs and use them to slice the data; the last record runs from the final index to the end of the data

    record = s[49:208]
  • split the record on semicolons

>>> s[49:208].split(';')
[' TGTLI20018', '20V Grass Trimmer - Body only', '1st Image base64 data ', '', '2nd Image base64 data ', '', '3rd Image base64 data ', '', '4th Image base64 data ', '', '5th Image base64 data']
  • extract the fields and make the dictionary.
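
The steps above can be put together roughly as follows. This is only a sketch, not code from the original answer: the helper names find_all and parse_records are hypothetical, and it assumes that the header is everything before the first pattern match, that the last column is the image column, and that surrounding whitespace can be stripped from every field.

```python
# Sample data from the question's comment, written as one string.
s = (
    "Internal Reference;Name;Extra Product Media/Image"
    " TGTLI20018;20V Grass Trimmer - Body only;1st Image base64 data"
    " ;;2nd Image base64 data"
    " ;;3rd Image base64 data"
    " ;;4th Image base64 data"
    " ;;5th Image base64 data"
    " TGTLI20019;25V Grass Trimmer;1st Image base64 data"
    " ;;2nd Image base64 data"
)
pattern = ' TGTLI'


def find_all(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    indices = []
    start = text.find(pattern)
    while start != -1:
        indices.append(start)
        start = text.find(pattern, start + 1)
    return indices


def parse_records(text, pattern, sep=';'):
    """Slice text at each pattern match and build one dict per record."""
    starts = find_all(text, pattern)
    if not starts:
        return []
    # Assumption: the header is everything before the first record.
    header = [name.strip() for name in text[:starts[0]].split(sep)]
    n_fixed = len(header) - 1  # every column except the image column
    # Append len(text) so the last record is sliced to the end of the data.
    bounds = starts + [len(text)]
    records = []
    for begin, end in zip(bounds, bounds[1:]):  # overlapping pairs
        fields = [field.strip() for field in text[begin:end].split(sep)]
        record = dict(zip(header[:n_fixed], fields[:n_fixed]))
        # The remaining non-empty fields are the images of this record.
        record[header[-1]] = [field for field in fields[n_fixed:] if field]
        records.append(record)
    return records


records = parse_records(s, pattern)
```

Each resulting dictionary keeps the key columns as strings and collects the images into a list under the last header name, so a record with 8 images and a record with 1 image have the same shape.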

How to find all occurrences of a substring?
Iterate a list as pair (current, next) in Python

Many more such examples and Q&As can be found by searching here on SO.
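
A different approach from the pattern slicing described above, in case no safe pattern exists: if the export puts each image on its own physical line and leaves the key columns empty on continuation lines, the standard csv module can fold those lines into the preceding record. That layout is an assumption about the file (the sample data with line breaks restored), and the function name read_grouped is hypothetical.

```python
import csv
import io


def read_grouped(lines, delimiter=';'):
    """Fold continuation rows (empty key columns) into the previous record."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    key_names, image_name = header[:-1], header[-1]
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        keys, image = row[:-1], row[-1]
        if any(field.strip() for field in keys):
            # A row with key values starts a new record.
            record = dict(zip(key_names, keys))
            record[image_name] = [image] if image else []
            records.append(record)
        elif records and image:
            # A row with empty keys belongs to the record above it.
            records[-1][image_name].append(image)
    return records


# Assumed layout: the sample data with one line per image.
sample = io.StringIO(
    "Internal Reference;Name;Extra Product Media/Image\n"
    "TGTLI20018;20V Grass Trimmer - Body only;1st Image base64 data\n"
    ";;2nd Image base64 data\n"
    ";;3rd Image base64 data\n"
    "TGTLI20019;25V Grass Trimmer;1st Image base64 data\n"
    ";;2nd Image base64 data\n"
)
grouped = read_grouped(sample)
```

For a real file, pass a file object opened with newline='' instead of the StringIO. Real base64 fields may exceed the csv module's default field size limit, which csv.field_size_limit can raise.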
