How to find the minimum value of list elements based on the unique values of the same list, using Python

Question

I have a CSV like the following:

SKU;price;availability;Title;Supplier
SUV500;21,50 €;1;27-03-2019 14:46;supplier1
MZ-76E;5,50 €;1;27-03-2019 14:46;supplier1
SUV500;49,95 €;0;27-03-2019 14:46;supplier2
MZ-76E;71,25 €;0;27-03-2019 14:46;supplier2
SUV500;32,60 €;1;27-03-2019 14:46;supplier3

I am trying to get the following CSV as output:

SKU;price;availability;Title;Supplier
SUV500;21,50 €;1;27-03-2019 14:46;supplier1
MZ-76E;5,50 €;1;27-03-2019 14:46;supplier1

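For each SKU, I want to keep only the record with the lowest price.

I'm completely lost on how to do this. With pandas? With classic if statements? With lists?

Any ideas?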

Answer 1

In pandas, you can do the following:

import pandas as pd

df = pd.read_csv('your file', sep=';')  # the file is semicolon-delimited

As Andy points out below, this returns only the price and SKU columns:

df_reduced= df.groupby('SKU')['price'].min()

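To keep all columns, you can change the groupby to a list of all the columns you want to keep. Note that every distinct combination of those columns then forms its own group, and that price is compared as text unless it is converted to a number first: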

df_reduced= df.groupby(['SKU', 'availability', 'Title', 'Supplier'])['price'].min()
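
One way to keep the whole row of the cheapest offer per SKU is to sort by a numeric price and then drop duplicate SKUs. This is a different technique from the groupby calls above, shown only as a minimal sketch: the `price_num` helper column is a hypothetical name, the sample data is copied from the question, and prices are assumed to use a comma as the decimal separator followed by " €".

```python
import io
import pandas as pd

# Sample data copied from the question
CSV_TEXT = """SKU;price;availability;Title;Supplier
SUV500;21,50 €;1;27-03-2019 14:46;supplier1
MZ-76E;5,50 €;1;27-03-2019 14:46;supplier1
SUV500;49,95 €;0;27-03-2019 14:46;supplier2
MZ-76E;71,25 €;0;27-03-2019 14:46;supplier2
SUV500;32,60 €;1;27-03-2019 14:46;supplier3
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=';')

# Hypothetical helper column: a numeric price, so rows are not compared as text
df['price_num'] = (
    df['price']
    .str.replace('€', '', regex=False)
    .str.strip()
    .str.replace(',', '.', regex=False)
    .astype(float)
)

# Sort ascending by price, then keep the first (cheapest) row of each SKU
cheapest = (
    df.sort_values('price_num')
      .drop_duplicates(subset='SKU', keep='first')
      .drop(columns='price_num')
)

print(cheapest.to_csv(sep=';', index=False))
```

The rows come out ordered by price rather than in their original order; calling `sort_index()` on the result would restore the input order if that matters.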

Answer 2

Edit: removed an earlier, confusing assumption.

After reading from the csv file:

In [8]: df = pd.read_csv(filename, delimiter=';', encoding='utf-8')

In [9]: df
Out[9]:
          SKU    price  availability             Title   Supplier
0      SUV500  21,50 €             1  27-03-2019 14:46  supplier1
1      MZ-76E   5,50 €             1  27-03-2019 14:46  supplier1
2      SUV500  49,95 €             0  27-03-2019 14:46  supplier2
3      MZ-76E  71,25 €             0  27-03-2019 14:46  supplier2
4      SUV500  32,60 €             1  27-03-2019 14:46  supplier3

Add a new column to hold price as a float:

In [12]:  df['f_price'] = df['price'].str.extract(r'([+-]?\d+\,\d+)', expand=False).str.replace(',', '.').astype(float)
# Note: astype(float) only accepts `.` as the decimal separator, regardless of locale, so the `str.replace(',', '.')` step is required.

In [13]: df
Out[13]:
          SKU    price  availability             Title   Supplier  f_price
0      SUV500  21,50 €             1  27-03-2019 14:46  supplier1    21.50
1      MZ-76E   5,50 €             1  27-03-2019 14:46  supplier1     5.50
2      SUV500  49,95 €             0  27-03-2019 14:46  supplier2    49.95
3      MZ-76E  71,25 €             0  27-03-2019 14:46  supplier2    71.25
4      SUV500  32,60 €             1  27-03-2019 14:46  supplier3    32.60
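
The same conversion can also be written as a small plain-Python function, which makes the rule easy to test in isolation. This is only a sketch: `parse_price` is a hypothetical helper name, and it assumes prices use a comma as the decimal separator with no thousands separator.

```python
import re

# Optional sign, digits, then an optional comma followed by more digits
_PRICE_RE = re.compile(r'[+-]?\d+(?:,\d+)?')

def parse_price(text):
    """Return the first comma-decimal number found in `text` as a float, or None."""
    match = _PRICE_RE.search(text)
    if match is None:
        return None
    # float() expects '.' as the decimal separator
    return float(match.group(0).replace(',', '.'))

print(parse_price('21,50 €'))
print(parse_price('5,50 €'))
print(parse_price('n/a'))
```

In a pandas workflow such a function could be applied to the column with `df['price'].map(parse_price)`.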

Get the list of row indices of the minimum f_price in each group from groupby:

In [28]: idxmin_list = df.groupby('SKU')['f_price'].idxmin().tolist()

In [29]: idxmin_list
Out[29]: [1, 0]

Finally, pass idxmin_list to df.loc and drop the f_price column to get the final result:

In [33]: df_final = df.loc[idxmin_list].drop(columns='f_price')

In [34]: df_final
Out[34]:
      SKU    price  availability             Title   Supplier
1  MZ-76E   5,50 €             1  27-03-2019 14:46  supplier1
0  SUV500  21,50 €             1  27-03-2019 14:46  supplier1

Write to a csv file:

In [65]: df_final.to_csv('Sku_min.csv', sep=';', index=False)

The file Sku_min.csv is created in your working folder, with the following content:

SKU;price;availability;Title;Supplier
MZ-76E;5,50 €;1;27-03-2019 14:46;supplier1
SUV500;21,50 €;1;27-03-2019 14:46;supplier1

Answer 3

There is no real need to use pandas here. This may not be the best solution, but it is mine:

import csv

class Product:
    def __init__(self, sku, price, availability, title, supplier):
        self.sku = sku
        self.price = float(price.replace(',', '.')[:-2]) # allows sorting 
        self.availability = availability
        self.title = title
        self.supplier = supplier

unparsed_products = []

with open('name_of_csv.csv', 'r', newline='', encoding='utf-8') as csvfile:
    csv_reader = csv.reader(csvfile, delimiter=';')
    next(csv_reader) # to skip past header line when parsing.
    for row in csv_reader:
        p = Product(*row)
        unparsed_products.append(p)

suv500_products = [i for i in unparsed_products if i.sku == 'SUV500']
lowest_priced_suv500_product = min(suv500_products, key=lambda x: x.price) # product with the lowest price
print(lowest_priced_suv500_product.price)
# prints 21.5

This can easily be extended to other products by changing the value of X in `if i.sku == X`.
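
To cover every SKU in one pass instead of filtering for one value at a time, the rows can be collected in a dictionary keyed by SKU. The following is a minimal sketch rather than the answer's own code: `price_as_float` and `cheapest_by_sku` are hypothetical names, the data is the question's sample read from a string, and prices are assumed to look like "21,50 €".

```python
import csv
import io

# Sample data copied from the question
CSV_TEXT = """SKU;price;availability;Title;Supplier
SUV500;21,50 €;1;27-03-2019 14:46;supplier1
MZ-76E;5,50 €;1;27-03-2019 14:46;supplier1
SUV500;49,95 €;0;27-03-2019 14:46;supplier2
MZ-76E;71,25 €;0;27-03-2019 14:46;supplier2
SUV500;32,60 €;1;27-03-2019 14:46;supplier3
"""

def price_as_float(price):
    # '21,50 €' -> 21.5 (assumes a comma decimal separator and a euro sign)
    return float(price.replace('€', '').strip().replace(',', '.'))

cheapest_by_sku = {}  # SKU -> cheapest row seen so far
reader = csv.DictReader(io.StringIO(CSV_TEXT), delimiter=';')
for row in reader:
    best = cheapest_by_sku.get(row['SKU'])
    if best is None or price_as_float(row['price']) < price_as_float(best['price']):
        cheapest_by_sku[row['SKU']] = row

for sku, row in cheapest_by_sku.items():
    print(sku, row['price'], row['Supplier'])
```

Each row is compared only against the current best for its own SKU, so no sorting or per-SKU list is needed.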

Answer 4

A non-pandas solution that produces the desired output.

Edit: added a csv writer to the solution

Edit: only records with "1" in row[2] are accepted

from collections import defaultdict
import re
from operator import itemgetter
import csv

fin = open('SKU_csv.csv', 'r', encoding="utf8")
csv_reader = csv.reader(fin, delimiter=';')

fout = open('test_out.csv', 'w', newline='', encoding="utf8") # explicit encoding so the euro sign is written correctly
csv_writer = csv.writer(fout, delimiter=';')

csv_writer.writerow(next(csv_reader)) # print header

d = defaultdict(list)

for row in csv_reader:
    if int(row[2]) != 1:
        continue
    key = row[0]
    val = row[1].replace(',', '.')
    price = float(re.search(r'\d+\.\d+', val).group(0))
    d[key].append([row, price])

fin.close()

for arr in d.values():
    minimum, _ = min(arr, key=itemgetter(1)) # minimum price (at arr idx 1)
    csv_writer.writerow(minimum)

fout.close()


'''
*** test_out.csv contents

SKU;price;availability;Title;Supplier
SUV500;21,50 €;1;27-03-2019 14:46;supplier1
MZ-76E;5,50 €;1;27-03-2019 14:46;supplier1
'''
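
The same idea can be arranged with `with` blocks so both files are closed even if a row fails to parse. This is a sketch under assumptions, not a replacement for the code above: `write_cheapest_available` is a hypothetical function name, the demo writes the question's sample data to a temporary directory, and prices are assumed to look like "21,50 €".

```python
import csv
import os
import tempfile

def write_cheapest_available(src_path, dst_path):
    """Copy the cheapest row with availability 1 for each SKU to dst_path."""
    best = {}  # SKU -> (price, row)
    with open(src_path, newline='', encoding='utf8') as fin:
        reader = csv.reader(fin, delimiter=';')
        header = next(reader)
        for row in reader:
            if row[2] != '1':  # keep only available products
                continue
            price = float(row[1].replace('€', '').strip().replace(',', '.'))
            if row[0] not in best or price < best[row[0]][0]:
                best[row[0]] = (price, row)
    with open(dst_path, 'w', newline='', encoding='utf8') as fout:
        writer = csv.writer(fout, delimiter=';')
        writer.writerow(header)
        writer.writerows(row for _, row in best.values())

# Demo on a temporary copy of the question's data
SAMPLE = (
    "SKU;price;availability;Title;Supplier\n"
    "SUV500;21,50 €;1;27-03-2019 14:46;supplier1\n"
    "MZ-76E;5,50 €;1;27-03-2019 14:46;supplier1\n"
    "SUV500;49,95 €;0;27-03-2019 14:46;supplier2\n"
    "MZ-76E;71,25 €;0;27-03-2019 14:46;supplier2\n"
    "SUV500;32,60 €;1;27-03-2019 14:46;supplier3\n"
)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'SKU_csv.csv')
    dst = os.path.join(tmp, 'test_out.csv')
    with open(src, 'w', newline='', encoding='utf8') as f:
        f.write(SAMPLE)
    write_cheapest_available(src, dst)
    with open(dst, newline='', encoding='utf8') as f:
        result = f.read()

print(result)
```

Keeping only the current best row per SKU avoids building a list of every candidate before taking the minimum.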

4 answers

Answer 1
1 2019-03-27 16:57:29

Answer 2
1 2019-03-27 23:41:15

Answer 3
0 2019-03-27 17:07:28

Answer 4
0 accepted 2019-03-27 22:35:14
