How to link hierarchical data in pandas from Excel?

I have this Excel sheet with hierarchies like this:

Item             Category  Price
**Electronics**  1
Laptop           1         1000
Kindle           1         200
Mobile           1         500
**HouseItems**   2
VacuumCleaner    2         200
Clock            2         50

How could I get the items by category? For example, get the electronics such as the laptop together with their prices, and get the house items in a separate list. The Excel sheet has more categories; this is just a snippet.


import pandas as pd

# Passing a list of sheet names makes read_excel return a dict of DataFrames
df = pd.read_excel('items.xlsx',
                   ['itemSheet'], engine='openpyxl')
item_list = ['Electronics', 'HouseItems']
# Collect the entries of the Item column that are category names
items = [item for item in df['itemSheet']['Item'] if item in item_list]

print(items)

How could I link the item category (Electronics) to the laptop, Kindle and mobile and to their respective prices, and do the same for the house items?

Isn't the category already in your df?

Use df[df['Category'] == 1] to get the rows where the category equals 1, i.e. 'Electronics' (this includes the 'Electronics' header row itself).

You could also do something like:

# Category numbers as they appear in the sheet
categories = {'Electronics': 1, 'HouseItems': 2}
dfs = {}
for category_name, category_number in categories.items():
    # Keep only the rows that belong to this category
    dfs[category_name] = df[df['Category'] == category_number]

to get multiple DataFrames, each containing only one category.
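If the category numbers should not be typed in by hand, the same split can probably be done with groupby. The following is only a sketch: the DataFrame is rebuilt inline from the snippet in the question (the column names and NaN prices for the header rows are assumptions about how read_excel would load the sheet), and `dfs_by_number` is a name made up for illustration.

```python
import pandas as pd

nan = float('nan')
# Sample data copied from the question; header rows have no price
df = pd.DataFrame({
    'Item': ['Electronics', 'Laptop', 'Kindle', 'Mobile',
             'HouseItems', 'VacuumCleaner', 'Clock'],
    'Category': [1, 1, 1, 1, 2, 2, 2],
    'Price': [nan, 1000, 200, 500, nan, 200, 50],
})

# One DataFrame per category number, keyed by that number
dfs_by_number = {number: group for number, group in df.groupby('Category')}

print(dfs_by_number[2])
```

Each value still contains the category's own header row, so it may need to be filtered out before the prices are used.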

To extract the categories from the DataFrame, you could check for a NaN value in the price:

categories = {}
for index, row in df.iterrows():
    # Category header rows are the only rows with a missing value (no price)
    if row.isnull().any():
        categories[row['Item']] = row['Category']

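Putting the two ideas together, the header rows found through the missing price can be used to label every item with its category name. This is a rough sketch under the same assumptions as the question's snippet (inline sample data, header rows have a NaN price); `names`, `items` and `prices_by_category` are illustrative names, not part of the original code.

```python
import pandas as pd

nan = float('nan')
# Sample data copied from the question; header rows have no price
df = pd.DataFrame({
    'Item': ['Electronics', 'Laptop', 'Kindle', 'Mobile',
             'HouseItems', 'VacuumCleaner', 'Clock'],
    'Category': [1, 1, 1, 1, 2, 2, 2],
    'Price': [nan, 1000, 200, 500, nan, 200, 50],
})

# Header rows are the ones without a price
is_header = df['Price'].isna()
# Map category number -> category name
names = df[is_header].set_index('Category')['Item'].to_dict()

# Keep only the real items and label each with its category name
items = df[~is_header].copy()
items['CategoryName'] = items['Category'].map(names)

# One {item: price} dict per category name
prices_by_category = {
    name: dict(zip(group['Item'], group['Price']))
    for name, group in items.groupby('CategoryName')
}

print(prices_by_category['Electronics'])
```

This avoids the explicit Python loop and keeps item, category and price linked in a single DataFrame as well as in the per-category dicts.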
Iterating over the rows with iterrows would be my quick and dirty solution. Alternatively, you could go through the Excel sheet with openpyxl and check for bold text:

from openpyxl import load_workbook

path = 'items.xlsx'  # the workbook from the question
wb = load_workbook(path, data_only=True)
sh = wb[wb.sheetnames[0]]  # first sheet
categories = {}
for i in range(sh.min_row, sh.max_row + 1):  # go through all rows, including the last
    cell = sh['A{}'.format(i)]
    if cell.font.b:  # font.b is True if the cell text is bold
        # Map the category name (column A) to its number (column B)
        categories[cell.value] = sh['B{}'.format(i)].value

There are probably 'better' solutions, but it works.
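The bold-text check can also be used to group items directly while walking the sheet, without relying on the Category column. The sketch below builds a small in-memory workbook that mimics the question's snippet (the layout and the bold header cells are assumptions), so it runs without items.xlsx; `grouped` and `current` are names introduced only for this example.

```python
from openpyxl import Workbook
from openpyxl.styles import Font

# Build an in-memory sheet shaped like the question's snippet (assumed layout)
wb = Workbook()
ws = wb.active
rows = [
    ('Item', 'Category', 'Price'),
    ('Electronics', 1, None),
    ('Laptop', 1, 1000),
    ('Kindle', 1, 200),
    ('Mobile', 1, 500),
    ('HouseItems', 2, None),
    ('VacuumCleaner', 2, 200),
    ('Clock', 2, 50),
]
for row in rows:
    ws.append(row)
for ref in ('A2', 'A6'):  # category header cells are bold
    ws[ref].font = Font(bold=True)

grouped = {}
current = None
for item_cell, _, price_cell in ws.iter_rows(min_row=2):  # skip the column titles
    if item_cell.font.b:
        # A bold cell starts a new category
        current = item_cell.value
        grouped[current] = {}
    elif current is not None:
        # A normal cell belongs to the most recent bold category
        grouped[current][item_cell.value] = price_cell.value

print(grouped)
```

With a real file, the workbook would come from load_workbook instead of being built in memory; the grouping loop stays the same as long as the category rows are the only bold ones below the column titles.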
