Python Pandas: creating a DataFrame from a text file

Question

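I am trying to create a DataFrame from a raw text file using Pandas. The file contains 3 categories, and each category name is followed by the items that belong to it. I was able to create a Series from the categories, but I don't know how to associate each item with its category and build a DataFrame from that. Below are my initial code and the desired DataFrame output. Could you point me in the right direction?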

import pandas as pd

category = ['Fruits', 'Vegetables', 'Meats']

items='''Fruits
apple
orange
pear
Vegetables
broccoli
squash
carrot
Meats
chicken
beef
lamb'''

# Series.set_value no longer exists in current pandas; .loc assignment is used instead.
Category = pd.Series(dtype=object)

i = 0
for item in items.splitlines():
    if item in category:
        Category.loc[i] = item
        i += 1
df = pd.DataFrame(Category)
print(df)

Desired DataFrame output:

Category    Item
Fruits      apple
            orange
            pear
Vegetables  broccoli
            squash
            carrot
Meats       chicken
            beef
            lamb

Answer 1

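Use:

isin to create a mask that flags the category rows
insert to add a new column built with where and ffill (fillna with method ffill)
boolean indexing to remove rows where both columns hold the same value, and finally reset_index for a unique, monotonic default index.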

import pandas as pd

category = ['Fruits', 'Vegetables', 'Meats']

items='''Fruits
apple
orange
pear
Vegetables
broccoli
squash
carrot
Meats
chicken
beef
lamb'''

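# One row per line of text; the column is named 'Fruit' but holds every item type.
df = pd.DataFrame({'Fruit': items.splitlines()})

# True for rows that are category headings.
mask = df['Fruit'].isin(category)
# Keep only the headings, then forward-fill them down over their items.
df.insert(0, 'Category', df['Fruit'].where(mask).ffill())
# Drop the heading rows themselves (where Category equals Fruit).
df = df[df['Category'] != df['Fruit']].reset_index(drop=True)
print(df)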
     Category     Fruit
0      Fruits     apple
1      Fruits    orange
2      Fruits      pear
3  Vegetables  broccoli
4  Vegetables    squash
5  Vegetables    carrot
6       Meats   chicken
7       Meats      beef
8       Meats      lamb
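
To make the mask, where and ffill steps easier to follow, here is a minimal sketch that prints the intermediate values. The shortened `lines` Series and the variable names `headings_only` and `filled` are assumptions made for illustration and are not part of the answer.

```python
import pandas as pd

category = ['Fruits', 'Vegetables']
lines = pd.Series(['Fruits', 'apple', 'pear', 'Vegetables', 'carrot'])

# True where the line is a category heading.
mask = lines.isin(category)

# Keep the headings and replace every other row with a missing value.
headings_only = lines.where(mask)

# Carry each heading forward over the missing rows below it.
filled = headings_only.ffill()

print(mask.tolist())    # [True, False, False, True, False]
print(headings_only)    # headings at positions 0 and 3, missing elsewhere
print(filled.tolist())  # ['Fruits', 'Fruits', 'Fruits', 'Vegetables', 'Vegetables']
```

After the forward fill, the heading rows are the only ones where the filled value equals the original line, which is why the final comparison removes exactly those rows.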

If needed, finally count each Category and Fruit pair with groupby and size (see also: What is the difference between size and count in pandas?):

df1 = df.groupby(['Category','Fruit']).size()
print(df1)
Category    Fruit   
Fruits      apple       1
            orange      1
            pear        1
Meats       beef        1
            chicken     1
            lamb        1
Vegetables  broccoli    1
            carrot      1
            squash      1
dtype: int64
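
On the size versus count question: the two only differ when a group contains missing values. The following is a small sketch on made-up data (the `demo` frame is an assumption, not taken from the question) that shows the difference.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing item in the 'Fruits' group.
demo = pd.DataFrame({
    'Category': ['Fruits', 'Fruits', 'Meats'],
    'Item': ['apple', np.nan, 'beef'],
})

grouped = demo.groupby('Category')['Item']
sizes = grouped.size()    # counts every row in the group, including missing values
counts = grouped.count()  # counts only the non-missing values

print(sizes)   # Fruits 2, Meats 1
print(counts)  # Fruits 1, Meats 1
```

With the data in this question there are no missing values, so size and count return the same numbers.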

Answer 2

Here is a solution that uses pandas without a loop.

import pandas as pd
category = ['Fruits', 'Vegetables', 'Meats']

items='''Fruits
apple
orange
pear
Vegetables
broccoli
squash
carrot
Meats
chicken
beef
lamb'''

in_df = pd.DataFrame(items.splitlines())

Create groups based on whether the row is in category.

# Select column 0 explicitly so that a Series, not a one-column DataFrame, is assigned.
in_df = in_df.assign(group=in_df[0].isin(category).cumsum())
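
The grouping trick relies on the cumulative sum of a boolean column increasing by one at every category heading. A minimal sketch, using a shortened `lines` Series that is assumed here purely for illustration:

```python
import pandas as pd

category = ['Fruits', 'Vegetables']
lines = pd.Series(['Fruits', 'apple', 'pear', 'Vegetables', 'carrot'])

# True counts as 1 and False as 0, so the running total steps up at each heading.
is_heading = lines.isin(category)
group = is_heading.cumsum()

print(group.tolist())  # [1, 1, 1, 2, 2]
```

Every heading and the items below it therefore share one group number.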

Create a DataFrame from the first row of each group.

cat_df = in_df.groupby('group').first()

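Merge every row after the first in each group back onto that group's first row to create the category-to-fruit relationship.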

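# cumcount() > 0 keeps every row after the first in each group. It is used here instead of
# apply(lambda x: x[1:]), which recent pandas versions may run without the 'group' column.
df_out = in_df[in_df.groupby('group').cumcount() > 0].merge(cat_df, left_on='group', right_index=True).reset_index(drop=True)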

Drop the grouping key and rename the columns.

df_out = df_out.drop('group',axis=1).rename(columns={'0_x':'Fruit','0_y':'Category'})
print(df_out)

Output:

      Fruit    Category
0     apple      Fruits
1    orange      Fruits
2      pear      Fruits
3  broccoli  Vegetables
4    squash  Vegetables
5    carrot  Vegetables
6   chicken       Meats
7      beef       Meats
8      lamb       Meats

Answer 3

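Consider iteratively appending to a dictionary of lists instead of a Series, then casting the dict to a DataFrame. The key column below is used to produce your desired output, since this kind of grouping needs a numeric column to aggregate: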

from io import StringIO
import pandas as pd

txtobj = StringIO('''Fruits
apple
orange
pear
Vegetables
broccoli
squash
carrot
Meats
chicken
beef
lamb''')

items = {'Category':[], 'Item':[]}

for line in txtobj:
    curr_line = line.strip()
    if curr_line in ['Fruits', 'Vegetables', 'Meats']:
        # A heading line: remember it as the current category.
        curr_category = curr_line
    else:
        # An item line: pair it with the most recent category.
        items['Category'].append(curr_category)
        items['Item'].append(curr_line)

df = pd.DataFrame(items).assign(key=1)
print(df)
#      Category      Item  key
# 0      Fruits     apple    1
# 1      Fruits    orange    1
# 2      Fruits      pear    1
# 3  Vegetables  broccoli    1
# 4  Vegetables    squash    1
# 5  Vegetables    carrot    1
# 6       Meats   chicken    1
# 7       Meats      beef    1
# 8       Meats      lamb    1

print(df['key'].groupby([df['Category'], df['Item']]).count())    
# Category    Item    
# Fruits      apple       1
#             orange      1
#             pear        1
# Meats       beef        1
#             chicken     1
#             lamb        1
# Vegetables  broccoli    1
#             carrot      1
#             squash      1
# Name: key, dtype: int64
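
Since the question is about a text file, the same loop can be wrapped in a function that accepts any iterable of lines, including an open file handle. This is only a sketch: the helper name `parse_categorized_lines`, the blank-line handling and the choice to ignore items that appear before the first heading are assumptions added here, not part of the answer.

```python
from io import StringIO

import pandas as pd


def parse_categorized_lines(lines, categories):
    """Hypothetical helper: pair each item line with the latest category heading."""
    categories = set(categories)
    rows = {'Category': [], 'Item': []}
    current = None
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # skip blank lines
        if text in categories:
            current = text  # heading line: switch category
        elif current is not None:
            rows['Category'].append(current)
            rows['Item'].append(text)
    return pd.DataFrame(rows)


# StringIO stands in for a real file so the example is self-contained.
sample = StringIO('Fruits\napple\npear\n\nMeats\nbeef\n')
df = parse_categorized_lines(sample, ['Fruits', 'Vegetables', 'Meats'])
print(df)
```

With a real file, the handle returned by `open(...)` inside a `with` block could be passed in place of `sample`.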

Answer 1: score 2, 2017-07-15 04:03:38
Answer 2: score 1, 2017-07-15 03:44:02
Answer 3: score 0, accepted, 2017-07-15 02:31:50
