Split one column in CSV file into multiple columns while grouping the data in Python (without Pandas)

I am currently learning Python and would like some help with a question. I have a ";"-separated file (shown below) that I am trying to clean up so that I can extract some data from it in Excel and CSV format.

My raw CSV file:

    COUNTRY  COUNTRY_TIME             COUNTRY_REF     PRODUCT
    FRANCE   FRANCE20180222.16.30.00  FRANCE20180221  APPLE%BOX%LYON%022018
    FRANCE   FRANCE20180222.16.30.00  FRANCE20180221  APPLE%BOX%LYON%032018
    FRANCE   FRANCE20180222.16.30.00  FRANCE20180221  APPLE%BOX%LYON%052018
    FRANCE   FRANCE20180222.16.30.00  FRANCE20180221  APPLE%BOX%LYON%062018
    FRANCE   FRANCE20180222.16.30.00  FRANCE20180221  APPLE%BOX%NICE%032018
    FRANCE   FRANCE20180222.16.30.00  FRANCE20180221  APPLE%BOX%LILLE%022018
    FRANCE   FRANCE20180222.16.30.00  FRANCE20180221  APPLE%BOX%NEM%022018
    FRANCE   FRANCE20180222.16.30.00  FRANCE20180221  APPLE%COVER%CWF%022018
    FRANCE   FRANCE20180222.16.30.00  FRANCE20180221  APPLE%COVER%FZF%022018
    FRANCE   FRANCE20180222.16.30.00  FRANCE20180221  APPLE%COVER%MX1%022018
    FRANCE   FRANCE20180222.16.30.00  FRANCE20180221  APPLE%BIGBOX%DIJON%022018
    SWEDEN   SWEDEN20180223.02.11.00  SWEDEN20180222  APPLE%SMALLBOX%BODEN%012019
    SWEDEN   SWEDEN20180223.02.11.00  SWEDEN20180222  APPLE%SMALLBOX%BODEN%022019
    SWEDEN   SWEDEN20180223.02.11.00  SWEDEN20180222  APPLE%SMALLBOX%BODEN%032018
    SWEDEN   SWEDEN20180223.02.11.00  SWEDEN20180222  APPLE%SMALLBOX%BODEN%042018
    SWEDEN   SWEDEN20180223.02.11.00  SWEDEN20180222  APPLE%SMALLBOX%BODEN%052018
    SWEDEN   SWEDEN20180223.02.11.00  SWEDEN20180222  APPLE%SMALLBOX%BODEN%062018
    SWEDEN   SWEDEN20180223.02.11.00  SWEDEN20180222  APPLE%SMALLBOX%FLEN%012019
    SWEDEN   SWEDEN20180223.02.11.00  SWEDEN20180222  APPLE%SMALLBOX%FLEN%032018
    SWEDEN   SWEDEN20180223.02.11.00  SWEDEN20180222  APPLE%SMALLBOX%FLEN%042018
    SWEDEN   SWEDEN20180223.02.11.00  SWEDEN20180222  APPLE%SMALLBOX%FLEN%052018
    SWEDEN   SWEDEN20180223.02.11.00  SWEDEN20180222  APPLE%SMALLBOX%FLEN%062018

My final expected output should look like this:

COUNTRY EXCHANGE_CODE   TOWN_CODE   MONTH_CODE
FRANCE  BOX              LYON       022018;032018;052018;062018
FRANCE  BOX              NICE       032018
FRANCE  BOX              LILLE      022018
FRANCE  BOX              NEM        022018
FRANCE  COVER            CWF        022018
FRANCE  COVER            FZF        022018
FRANCE  COVER            MX1        022018
FRANCE  BIGBOX           DIJON      022018
SWEDEN  SMALLBOX         BODEN      012019;022019;032018;042018;052018;062018
SWEDEN  SMALLBOX         FLEN       012019;032018;042018;052018;062018

I have written the script below, but I was only able to get as far as the table shown after it.

import csv
import os
from collections import defaultdict, OrderedDict
import itertools
from operator import itemgetter

in_path = os.path.expanduser("~/Desktop/FUTURES.csv")
out_path = os.path.expanduser("~/Desktop/Finalresult.csv")

with open(in_path, 'r') as f_in, open(out_path, 'w', newline='') as f_out:
    csv_reader = csv.reader(f_in, delimiter=';')
    writer = csv.writer(f_out)

    all = []
    row = next(csv_reader)
    row.append('LFU')
    row.append('EXCHANGE_CODE')
    row.append('TOWN_CODE')
    row.append('MONTH_CODE')
    all.append(row)

    for row in csv_reader:
        if row[0] in ['FRANCE', 'SWEDEN']:

            row.append(row[3].split('%')[0])
            row.append(row[3].split('%')[1])
            row.append(row[3].split('%')[2])
            row.append(row[3].split('%')[3])
            all.append(row)

    writer.writerows(map(itemgetter(0, 5, 6, 7), all))

My current result:

COUNTRY FRUIT   EXCHANGE_CODE   TOWN_CODE   MONTH_CODE
FRANCE  APPLE   BOX                 LYON    022018
FRANCE  APPLE   BOX                 LYON    032018
FRANCE  APPLE   BOX                 LYON    052018
FRANCE  APPLE   BOX                 LYON    062018
FRANCE  APPLE   BOX                 NICE    032018
FRANCE  APPLE   BOX                 LILLE   022018
FRANCE  APPLE   BOX                 NEM     022018
FRANCE  APPLE   COVER               CWF     022018
FRANCE  APPLE   COVER               FZF     022018
FRANCE  APPLE   COVER               MX1     022018
FRANCE  APPLE   BIGBOX              DIJON   022018
SWEDEN  APPLE   SMALLBOX            BODEN   012019
SWEDEN  APPLE   SMALLBOX            BODEN   022019
SWEDEN  APPLE   SMALLBOX            BODEN   032018
SWEDEN  APPLE   SMALLBOX            BODEN   042018
SWEDEN  APPLE   SMALLBOX            BODEN   052018
SWEDEN  APPLE   SMALLBOX            BODEN   062018
SWEDEN  APPLE   SMALLBOX            FLEN    012019
SWEDEN  APPLE   SMALLBOX            FLEN    032018
SWEDEN  APPLE   SMALLBOX            FLEN    042018
SWEDEN  APPLE   SMALLBOX            FLEN    052018
SWEDEN  APPLE   SMALLBOX            FLEN    062018

I would really appreciate any help I can get.

PS: I don't want to use Pandas or NumPy.

If you want to avoid libraries altogether, here is a solution without any imports:

with open('smntg.csv') as fin, open('smntg_else.csv', 'w') as fout:
    header = ['COUNTRY', 'EXCHANGE_CODE', 'TOWN_CODE', 'MONTH_CODE']
    data = fin.readlines()
    # Strip the newlines and skip the header row
    needed = list(map(str.strip, data))[1:]
    dealtWith = []
    for line in needed:
        apart = line.split(';')
        country = apart[0]
        # PRODUCT looks like APPLE%BOX%LYON%022018; drop the leading fruit
        exchange, town, month = apart[-1].split('%')[1:]
        dealtWith.append([country, exchange, town, month])
    # Collect the months for each (country, exchange, town) key;
    # starting from an empty dict also handles a file with no data rows
    packed = {}
    for item in dealtWith:
        key = tuple(item[:3])
        value = item[3]
        if key in packed:
            packed[key].append(value)
        else:
            packed[key] = [value]
    # Join each group's months with ';' and turn the groups back into rows
    joined = {k: ';'.join(v) for k, v in packed.items()}
    finalized = [list(i) + [j] for i, j in joined.items()]
    # Sort the rows alphabetically (this does not keep the input order)
    finalized.sort()
    commaDelimited = [','.join(fline) + '\n' for fline in finalized]
    fout.write(','.join(header) + '\n')
    fout.writelines(commaDelimited)
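
Because the code above sorts the grouped rows, BIGBOX would come before BOX in the output, which differs from the order in the expected table. If the order of first appearance matters, one possible variant is sketched below. It uses only the standard-library csv module and relies on dicts keeping insertion order (Python 3.7+). The function name group_months and the inline sample data are made up for illustration; they are not part of the question or the answer.

```python
import csv
import io


def group_months(lines, delimiter=";"):
    """Group MONTH_CODE values by (COUNTRY, EXCHANGE_CODE, TOWN_CODE).

    `lines` is any iterable of text lines, for example an open file.
    Hypothetical helper, written only to illustrate the idea.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader)  # skip the header row
    grouped = {}  # plain dicts keep insertion order on Python 3.7+
    for row in reader:
        if not row:
            continue  # ignore blank lines
        country, product = row[0], row[-1]
        # PRODUCT is assumed to look like APPLE%BOX%LYON%022018
        _fruit, exchange, town, month = product.split("%")
        grouped.setdefault((country, exchange, town), []).append(month)
    # One output row per key, months joined with ';'
    return [[*key, ";".join(months)] for key, months in grouped.items()]


# Small made-up sample in the same layout as the question's file
sample = """COUNTRY;COUNTRY_TIME;COUNTRY_REF;PRODUCT
FRANCE;FRANCE20180222.16.30.00;FRANCE20180221;APPLE%BOX%LYON%022018
FRANCE;FRANCE20180222.16.30.00;FRANCE20180221;APPLE%BOX%NICE%032018
FRANCE;FRANCE20180222.16.30.00;FRANCE20180221;APPLE%BOX%LYON%032018
"""
rows = group_months(io.StringIO(sample))
```

With a real file you would pass the open file object instead of io.StringIO and write the returned rows with csv.writer. Unlike the sorted-input approach, this does not need rows of the same group to be next to each other.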

You cannot write one output line for each line you read, because a single output line can be composed from several input lines. But if you can assume that the input file is sorted by COUNTRY, EXCHANGE_CODE and TOWN, you can simply append the new month to the end of the previous line whenever COUNTRY, EXCHANGE_CODE and TOWN are the same.

Your code could become:

...
with open(in_path, 'r') as f_in, open(out_path, 'w', newline='') as f_out:
    csv_reader = csv.reader(f_in, delimiter=';')
    writer = csv.writer(f_out)

    all = []
    row = next(csv_reader)
    row.append('LFU')
    row.append('EXCHANGE_CODE')
    row.append('TOWN_CODE')
    row.append('MONTH_CODE')

    old = row                       # just remember it

    for row in csv_reader:
        if row[0] in ['FRANCE', 'SWEDEN']:

            row.append(row[3].split('%')[0])
            row.append(row[3].split('%')[1])
            row.append(row[3].split('%')[2])
            row.append(row[3].split('%')[3])
            if row[0] == old[0] and row[5] == old[5] and row[6] == old[6]:
                old[7] += ';' + row[7]
            else:
                all.append(old)                      # write down previous row
                old = row
    all.append(old)                                  # do not forget last row

    writer.writerows(map(itemgetter(0, 5, 6, 7), all))
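
Under the same assumption (rows of one group are adjacent in the input), the merging step can also be sketched with itertools.groupby, which starts a new group every time the key changes. This is only an alternative illustration, not a replacement for the code above; the function name merge_sorted_rows and the sample data are hypothetical.

```python
import csv
import io
from itertools import groupby


def merge_sorted_rows(lines, delimiter=";"):
    """Merge consecutive rows sharing COUNTRY, EXCHANGE_CODE and TOWN_CODE.

    Assumes rows of the same group are adjacent in the input, because
    groupby only groups consecutive items with equal keys.
    Hypothetical helper, written only to illustrate the idea.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader)  # skip the header row
    data_rows = (row for row in reader if row)  # drop blank lines

    def key(row):
        # PRODUCT is assumed to look like APPLE%BOX%LYON%022018
        _fruit, exchange, town, _month = row[3].split("%")
        return (row[0], exchange, town)

    merged = []
    for (country, exchange, town), group in groupby(data_rows, key=key):
        months = ";".join(row[3].split("%")[3] for row in group)
        merged.append([country, exchange, town, months])
    return merged


# Small made-up sample, already sorted by country, exchange and town
sample = """COUNTRY;COUNTRY_TIME;COUNTRY_REF;PRODUCT
FRANCE;FRANCE20180222.16.30.00;FRANCE20180221;APPLE%BOX%LYON%022018
FRANCE;FRANCE20180222.16.30.00;FRANCE20180221;APPLE%BOX%LYON%032018
FRANCE;FRANCE20180222.16.30.00;FRANCE20180221;APPLE%BOX%NICE%032018
SWEDEN;SWEDEN20180223.02.11.00;SWEDEN20180222;APPLE%SMALLBOX%BODEN%012019
"""
merged_rows = merge_sorted_rows(io.StringIO(sample))
```

If the input is not grouped this way, the same key would show up in several output rows, so either sort the rows first or collect them in a dict instead.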
