Split one column in CSV file into multiple columns while grouping the data in Python (without Pandas)
I am currently learning Python and would like some help with one of my questions. I have a ";"-separated file (shown below) that I am trying to clean up so I can extract some data from it in Excel and CSV format.
My raw CSV file:
COUNTRY COUNTRY_TIME COUNTRY_REF PRODUCT
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%BOX%LYON%022018
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%BOX%LYON%032018
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%BOX%LYON%052018
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%BOX%LYON%062018
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%BOX%NICE%032018
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%BOX%LILLE%022018
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%BOX%NEM%022018
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%COVER%CWF%022018
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%COVER%FZF%022018
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%COVER%MX1%022018
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%BIGBOX%DIJON%022018
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%BODEN%012019
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%BODEN%022019
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%BODEN%032018
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%BODEN%042018
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%BODEN%052018
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%BODEN%062018
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%FLEN%012019
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%FLEN%032018
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%FLEN%042018
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%FLEN%052018
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%FLEN%062018
My final expected data should look like this:
COUNTRY EXCHANGE_CODE TOWN_CODE MONTH_CODE
FRANCE BOX LYON 022018;032018;052018;062018
FRANCE BOX NICE 032018
FRANCE BOX LILLE 022018
FRANCE BOX NEM 022018
FRANCE COVER CWF 022018
FRANCE COVER FZF 022018
FRANCE COVER MX1 022018
FRANCE BIGBOX DIJON 022018
SWEDEN SMALLBOX BODEN 012019;022019;032018;042018;052018;062018
SWEDEN SMALLBOX FLEN 012019;032018;042018;052018;062018
I have written the script below, but it only gets me as far as the table shown after it.
import csv
import os
from collections import defaultdict, OrderedDict
import itertools
from operator import itemgetter

in_path = os.path.expanduser("~/Desktop/FUTURES.csv")
out_path = os.path.expanduser("~/Desktop/Finalresult.csv")

with open(in_path, 'r') as f_in, open(out_path, 'w', newline='') as f_out:
    csv_reader = csv.reader(f_in, delimiter=';')
    writer = csv.writer(f_out)
    all = []
    row = next(csv_reader)
    row.append('LFU')
    row.append('EXCHANGE_CODE')
    row.append('TOWN_CODE')
    row.append('MONTH_CODE')
    all.append(row)
    for row in csv_reader:
        if row[0] in ['FRANCE', 'SWEDEN']:
            row.append(row[3].split('%')[0])
            row.append(row[3].split('%')[1])
            row.append(row[3].split('%')[2])
            row.append(row[3].split('%')[3])
            all.append(row)
    writer.writerows(map(itemgetter(0, 5, 6, 7), all))
My current result:
COUNTRY FRUIT EXCHANGE_CODE TOWN_CODE MONTH_CODE
FRANCE APPLE BOX LYON 022018
FRANCE APPLE BOX LYON 032018
FRANCE APPLE BOX LYON 052018
FRANCE APPLE BOX LYON 062018
FRANCE APPLE BOX NICE 032018
FRANCE APPLE BOX LILLE 022018
FRANCE APPLE BOX NEM 022018
FRANCE APPLE COVER CWF 022018
FRANCE APPLE COVER FZF 022018
FRANCE APPLE COVER MX1 022018
FRANCE APPLE BIGBOX DIJON 022018
SWEDEN APPLE SMALLBOX BODEN 012019
SWEDEN APPLE SMALLBOX BODEN 022019
SWEDEN APPLE SMALLBOX BODEN 032018
SWEDEN APPLE SMALLBOX BODEN 042018
SWEDEN APPLE SMALLBOX BODEN 052018
SWEDEN APPLE SMALLBOX BODEN 062018
SWEDEN APPLE SMALLBOX FLEN 012019
SWEDEN APPLE SMALLBOX FLEN 032018
SWEDEN APPLE SMALLBOX FLEN 042018
SWEDEN APPLE SMALLBOX FLEN 052018
SWEDEN APPLE SMALLBOX FLEN 062018
I would really appreciate any help I can get.
PS: I don't want to use Pandas or NumPy.
If you want to avoid libraries altogether, here is a solution without any imports:
with open('smntg.csv') as fin, open('smntg_else.csv', 'w') as fout:
    header = ['COUNTRY', 'EXCHANGE_CODE', 'TOWN_CODE', 'MONTH_CODE']
    data = fin.readlines()
    # strip newlines and drop the header row
    needed = list(map(str.strip, data))[1:]
    dealtWith = []
    for line in needed:
        apart = line.split(';')
        country = apart[0]
        # PRODUCT is FRUIT%EXCHANGE%TOWN%MONTH; the fruit is not needed
        exchange, town, month = apart[-1].split('%')[1:]
        dealtWith.append([country, exchange, town, month])
    # collect the months under each (country, exchange, town) key
    packed = {tuple(dealtWith[0][:3]): [dealtWith[0][3]]}
    for item in dealtWith[1:]:
        key = tuple(item[:3])
        value = item[3]
        if key in packed:
            packed[key].append(value)
        else:
            packed[key] = [value]
    joined = {k: ';'.join(v) for k, v in packed.items()}
    finalized = [list(i) + [j] for i, j in joined.items()]
    # note: sorting orders rows alphabetically, not in input order
    finalized.sort()
    commaDelimited = [','.join(fline) + '\n' for fline in finalized]
    fout.write(','.join(header) + '\n')
    fout.writelines(commaDelimited)
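The same grouping idea can also be sketched with the standard `csv` module and a plain dict. This is only a sketch under a few assumptions: the function name `group_months`, its `wanted` parameter and the in-memory `sample` are made up for illustration, and it relies on dicts keeping insertion order (Python 3.7+), so rows come out in first-seen order rather than sorted.

```python
import csv
import io


def group_months(lines, wanted=("FRANCE", "SWEDEN")):
    """Group MONTH_CODE values by (COUNTRY, EXCHANGE_CODE, TOWN_CODE)."""
    reader = csv.reader(lines, delimiter=";")
    next(reader)  # skip the header row
    grouped = {}  # insertion-ordered, so output follows first appearance
    for row in reader:
        if not row or row[0] not in wanted:
            continue
        # PRODUCT looks like APPLE%BOX%LYON%022018; the fruit is discarded
        _fruit, exchange, town, month = row[3].split("%")
        grouped.setdefault((row[0], exchange, town), []).append(month)
    return [list(key) + [";".join(months)] for key, months in grouped.items()]


# A shortened sample in the question's format (illustrative only).
sample = io.StringIO(
    "COUNTRY;COUNTRY_TIME;COUNTRY_REF;PRODUCT\n"
    "FRANCE;FRANCE20180222.16.30.00;FRANCE20180221;APPLE%BOX%LYON%022018\n"
    "FRANCE;FRANCE20180222.16.30.00;FRANCE20180221;APPLE%BOX%LYON%032018\n"
    "FRANCE;FRANCE20180222.16.30.00;FRANCE20180221;APPLE%BOX%NICE%032018\n"
    "SWEDEN;SWEDEN20180223.02.11.00;SWEDEN20180222;APPLE%SMALLBOX%BODEN%012019\n"
)
rows = group_months(sample)
for out_row in rows:
    print(",".join(out_row))
```

Because the dict is keyed on the three grouping columns, this version does not need the input to be sorted. To write a file instead of printing, the rows could be passed to `csv.writer(...).writerows(rows)` after writing a header row.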
You cannot write one output line per input line, because a single output line can be built from several input lines. But if you can assume that the input file is sorted by COUNTRY, EXCHANGE_CODE and TOWN, you can simply append the new month to the end of the previous line whenever COUNTRY, EXCHANGE_CODE and TOWN are unchanged.
Your code could become:
...
with open(in_path, 'r') as f_in, open(out_path, 'w', newline='') as f_out:
    csv_reader = csv.reader(f_in, delimiter=';')
    writer = csv.writer(f_out)
    all = []
    row = next(csv_reader)
    row.append('LFU')
    row.append('EXCHANGE_CODE')
    row.append('TOWN_CODE')
    row.append('MONTH_CODE')
    old = row  # just remember it
    for row in csv_reader:
        if row[0] in ['FRANCE', 'SWEDEN']:
            row.append(row[3].split('%')[0])
            row.append(row[3].split('%')[1])
            row.append(row[3].split('%')[2])
            row.append(row[3].split('%')[3])
            if row[0] == old[0] and row[5] == old[5] and row[6] == old[6]:
                old[7] += ';' + row[7]
            else:
                all.append(old)  # write down previous row
                old = row
    all.append(old)  # do not forget last row
    writer.writerows(map(itemgetter(0, 5, 6, 7), all))
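Under the same sorted-input assumption, the "merge with the previous row" logic could also be expressed with `itertools.groupby` from the standard library, which groups consecutive items sharing a key. The following is a rough sketch rather than a drop-in replacement: the helper names `parse` and `merge_adjacent` and the in-memory `sample` are hypothetical.

```python
import csv
import io
from itertools import groupby


def parse(row):
    """Return (COUNTRY, EXCHANGE_CODE, TOWN_CODE, MONTH_CODE) for one row."""
    _fruit, exchange, town, month = row[3].split("%")
    return row[0], exchange, town, month


def merge_adjacent(lines):
    """Merge the months of consecutive rows that share the first three fields."""
    reader = csv.reader(lines, delimiter=";")
    next(reader)  # skip the header row
    parsed = (parse(row) for row in reader if row)
    merged = []
    # groupby only merges *consecutive* equal keys, hence the sorted-input assumption
    for key, group in groupby(parsed, key=lambda r: r[:3]):
        merged.append(list(key) + [";".join(r[3] for r in group)])
    return merged


# A shortened, already sorted sample in the question's format (illustrative only).
sample = io.StringIO(
    "COUNTRY;COUNTRY_TIME;COUNTRY_REF;PRODUCT\n"
    "FRANCE;FRANCE20180222.16.30.00;FRANCE20180221;APPLE%BOX%LYON%022018\n"
    "FRANCE;FRANCE20180222.16.30.00;FRANCE20180221;APPLE%BOX%LYON%032018\n"
    "FRANCE;FRANCE20180222.16.30.00;FRANCE20180221;APPLE%BOX%NICE%032018\n"
    "SWEDEN;SWEDEN20180223.02.11.00;SWEDEN20180222;APPLE%SMALLBOX%BODEN%012019\n"
)
merged_rows = merge_adjacent(sample)
for out_row in merged_rows:
    print(",".join(out_row))
```

If rows for the same town are not adjacent in the input, `groupby` emits a separate output row for each run, so the file would have to be sorted on those three fields first.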