Split one column in CSV file into multiple columns while grouping the data in Python (without Pandas)
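I'm currently learning Python and would like some help with one of my problems. I have a ";"-delimited file (shown below), and I'm trying to parse it and extract some of the data into Excel and CSV formats.
My original CSV file: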
COUNTRY COUNTRY_TIME COUNTRY_REF PRODUCT
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%BOX%LYON%022018
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%BOX%LYON%032018
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%BOX%LYON%052018
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%BOX%LYON%062018
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%BOX%NICE%032018
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%BOX%LILLE%022018
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%BOX%NEM%022018
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%COVER%CWF%022018
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%COVER%FZF%022018
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%COVER%MX1%022018
FRANCE FRANCE20180222.16.30.00 FRANCE20180221 APPLE%BIGBOX%DIJON%022018
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%BODEN%012019
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%BODEN%022019
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%BODEN%032018
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%BODEN%042018
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%BODEN%052018
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%BODEN%062018
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%FLEN%012019
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%FLEN%032018
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%FLEN%042018
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%FLEN%052018
SWEDEN SWEDEN20180223.02.11.00 SWEDEN20180222 APPLE%SMALLBOX%FLEN%062018
My final expected data should be:
COUNTRY EXCHANGE_CODE TOWN_CODE MONTH_CODE
FRANCE BOX LYON 022018;032018;052018;062018
FRANCE BOX NICE 032018
FRANCE BOX LILLE 022018
FRANCE BOX NEM 022018
FRANCE COVER CWF 022018
FRANCE COVER FZF 022018
FRANCE COVER MX1 022018
FRANCE BIGBOX DIJON 022018
SWEDEN SMALLBOX BODEN 012019;022019;032018;042018;052018;062018
SWEDEN SMALLBOX FLEN 012019;032018;042018;052018;062018
I wrote the script below, but it only gets me as far as the table that follows it.
import csv
import os
from collections import defaultdict, OrderedDict
import itertools
from operator import itemgetter
in_path = os.path.expanduser("~/Desktop/FUTURES.csv")
out_path = os.path.expanduser("~/Desktop/Finalresult.csv")
with open(in_path, 'r') as f_in, open(out_path, 'w', newline='') as f_out:
    csv_reader = csv.reader(f_in, delimiter=';')
    writer = csv.writer(f_out)
    all = []
    row = next(csv_reader)
    row.append('LFU')
    row.append('EXCHANGE_CODE')
    row.append('TOWN_CODE')
    row.append('MONTH_CODE')
    all.append(row)
    for row in csv_reader:
        if row[0] in ['FRANCE', 'SWEDEN']:
            row.append(row[3].split('%')[0])
            row.append(row[3].split('%')[1])
            row.append(row[3].split('%')[2])
            row.append(row[3].split('%')[3])
            all.append(row)
    writer.writerows(map(itemgetter(0, 5, 6, 7), all))
My current result:
COUNTRY FRUIT EXCHANGE_CODE TOWN_CODE MONTH_CODE
FRANCE APPLE BOX LYON 022018
FRANCE APPLE BOX LYON 032018
FRANCE APPLE BOX LYON 052018
FRANCE APPLE BOX LYON 062018
FRANCE APPLE BOX NICE 032018
FRANCE APPLE BOX LILLE 022018
FRANCE APPLE BOX NEM 022018
FRANCE APPLE COVER CWF 022018
FRANCE APPLE COVER FZF 022018
FRANCE APPLE COVER MX1 022018
FRANCE APPLE BIGBOX DIJON 022018
SWEDEN APPLE SMALLBOX BODEN 012019
SWEDEN APPLE SMALLBOX BODEN 022019
SWEDEN APPLE SMALLBOX BODEN 032018
SWEDEN APPLE SMALLBOX BODEN 042018
SWEDEN APPLE SMALLBOX BODEN 052018
SWEDEN APPLE SMALLBOX BODEN 062018
SWEDEN APPLE SMALLBOX FLEN 012019
SWEDEN APPLE SMALLBOX FLEN 032018
SWEDEN APPLE SMALLBOX FLEN 042018
SWEDEN APPLE SMALLBOX FLEN 052018
SWEDEN APPLE SMALLBOX FLEN 062018
I would really appreciate any help I can get.
PS: I don't want to use Pandas or NumPy.
If you want to avoid libraries altogether, here is a solution with no imports:
with open('smntg.csv') as fin, open('smntg_else.csv', 'w') as fout:
    header = ['COUNTRY', 'EXCHANGE_CODE', 'TOWN_CODE', 'MONTH_CODE']
    data = fin.readlines()
    # strip newlines and skip the header line
    needed = list(map(str.strip, data))[1:]
    dealtWith = []
    for line in needed:
        apart = line.split(';')
        country = apart[0]
        # PRODUCT looks like APPLE%BOX%LYON%022018; drop the first part
        exchange, town, month = apart[-1].split('%')[1:]
        dealtWith.append([country, exchange, town, month])
    # group the months by (country, exchange, town)
    packed = {tuple(dealtWith[0][:3]): [dealtWith[0][3]]}
    for item in dealtWith[1:]:
        key = tuple(item[:3])
        value = item[3]
        if key in packed:
            packed[key].append(value)
        else:
            packed[key] = [value]
    joined = {k: ';'.join(v) for k, v in packed.items()}
    finalized = [list(i) + [j] for i, j in joined.items()]
    finalized.sort()
    # the output file is comma-delimited
    commaDelimited = [','.join(fline) + '\n' for fline in finalized]
    fout.write(','.join(header) + '\n')
    fout.writelines(commaDelimited)
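As a variation on the grouping above, here is a minimal sketch that lets the standard csv module handle the delimiter and relies on a plain dict keeping insertion order (Python 3.7+), so groups come out in first-seen order, as in the expected output, rather than sorted. The function name group_months and the in-memory sample are illustrative assumptions, not part of the original post.

```python
import csv
import io

def group_months(lines, delimiter=';'):
    """Group MONTH_CODE values by (COUNTRY, EXCHANGE_CODE, TOWN_CODE).

    `lines` is any iterable of text lines whose first line is a header.
    Hypothetical helper, written for illustration only.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader)  # skip the header row
    groups = {}   # dicts keep insertion order in Python 3.7+
    for row in reader:
        if not row:
            continue  # ignore blank lines
        country = row[0]
        # PRODUCT looks like APPLE%BOX%LYON%022018
        _fruit, exchange, town, month = row[3].split('%')
        groups.setdefault((country, exchange, town), []).append(month)
    return [list(key) + [';'.join(months)] for key, months in groups.items()]

# Small in-memory sample in the same shape as the question's file.
sample = io.StringIO(
    "COUNTRY;COUNTRY_TIME;COUNTRY_REF;PRODUCT\n"
    "FRANCE;FRANCE20180222.16.30.00;FRANCE20180221;APPLE%BOX%LYON%022018\n"
    "FRANCE;FRANCE20180222.16.30.00;FRANCE20180221;APPLE%BOX%LYON%032018\n"
    "FRANCE;FRANCE20180222.16.30.00;FRANCE20180221;APPLE%BOX%NICE%032018\n"
    "SWEDEN;SWEDEN20180223.02.11.00;SWEDEN20180222;APPLE%SMALLBOX%BODEN%012019\n"
)
rows = group_months(sample)
for r in rows:
    print(','.join(r))
```

Because the grouping is keyed on the dict, this version does not need the input to be sorted.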
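You cannot write one output row for each line you read, because a single output row can be built from many input lines. However, if you can assume that the input file is sorted by COUNTRY, EXCHANGE_CODE and TOWN, then whenever COUNTRY, EXCHANGE_CODE and TOWN are the same as in the previous row, you only need to append the new month to the end of that previous row.
Your code could become: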
...
with open(in_path, 'r') as f_in, open(out_path, 'w', newline='') as f_out:
    csv_reader = csv.reader(f_in, delimiter=';')
    writer = csv.writer(f_out)
    all = []
    row = next(csv_reader)
    row.append('LFU')
    row.append('EXCHANGE_CODE')
    row.append('TOWN_CODE')
    row.append('MONTH_CODE')
    old = row  # just remember it
    for row in csv_reader:
        if row[0] in ['FRANCE', 'SWEDEN']:
            row.append(row[3].split('%')[0])
            row.append(row[3].split('%')[1])
            row.append(row[3].split('%')[2])
            row.append(row[3].split('%')[3])
            if row[0] == old[0] and row[5] == old[5] and row[6] == old[6]:
                old[7] += ';' + row[7]
            else:
                all.append(old)  # write down previous row
                old = row
    all.append(old)  # do not forget last row
    writer.writerows(map(itemgetter(0, 5, 6, 7), all))
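The same idea of merging consecutive rows that share a key can also be sketched with itertools.groupby from the standard library, which groups adjacent items with equal keys. This is only an illustration under the same assumption that the input is already sorted by COUNTRY, EXCHANGE_CODE and TOWN; the helper names split_row and merge_consecutive and the in-memory sample are made up for the example.

```python
import csv
import io
from itertools import groupby

def split_row(row):
    """Return (COUNTRY, EXCHANGE_CODE, TOWN_CODE, MONTH_CODE) for one input row."""
    # PRODUCT looks like APPLE%BOX%LYON%022018
    _fruit, exchange, town, month = row[3].split('%')
    return row[0], exchange, town, month

def merge_consecutive(rows):
    """Yield one output row per run of adjacent rows with the same key.

    Assumes rows with the same (country, exchange, town) are adjacent.
    """
    parsed = (split_row(r) for r in rows)
    for key, group in groupby(parsed, key=lambda r: r[:3]):
        yield list(key) + [';'.join(r[3] for r in group)]

# Small in-memory sample in the same shape as the question's file.
sample = io.StringIO(
    "COUNTRY;COUNTRY_TIME;COUNTRY_REF;PRODUCT\n"
    "FRANCE;FRANCE20180222.16.30.00;FRANCE20180221;APPLE%BOX%LYON%022018\n"
    "FRANCE;FRANCE20180222.16.30.00;FRANCE20180221;APPLE%BOX%LYON%032018\n"
    "FRANCE;FRANCE20180222.16.30.00;FRANCE20180221;APPLE%BOX%NICE%032018\n"
    "SWEDEN;SWEDEN20180223.02.11.00;SWEDEN20180222;APPLE%SMALLBOX%BODEN%012019\n"
    "SWEDEN;SWEDEN20180223.02.11.00;SWEDEN20180222;APPLE%SMALLBOX%BODEN%022019\n"
)
reader = csv.reader(sample, delimiter=';')
next(reader)  # skip the header row
merged = list(merge_consecutive(reader))

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(['COUNTRY', 'EXCHANGE_CODE', 'TOWN_CODE', 'MONTH_CODE'])
writer.writerows(merged)
print(out.getvalue())
```

If the input is not sorted, the same key shows up in several separate runs and produces several output rows, so either sort the rows first or group with a dict instead.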