I am trying to use pandas to create a DataFrame from a raw text file. The file contains three categories, and each category name is followed by the items that belong to it. I can build a Series of the category names, but I don't know how to associate each item with its category and build a DataFrame from the result. Below is my initial code along with the desired output. Can you point me in the right direction?
import pandas as pd
category = ['Fruits', 'Vegetables', 'Meats']
items='''Fruits
apple
orange
pear
Vegetables
broccoli
squash
carrot
Meats
chicken
beef
lamb'''
Category = pd.Series(dtype=object)
i = 0
for item in items.splitlines():
    if item in category:
        # Series.set_value was removed in pandas 1.0; label-based assignment does the same job
        Category.loc[i] = item
        i += 1
df = pd.DataFrame(Category)
print(df)
Desired DataFrame Output:
Category    Item
Fruits      apple
            orange
            pear
Vegetables  broccoli
            squash
            carrot
Meats       chicken
            beef
            lamb
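Use isin to flag the lines that are category names, insert a new column built with where and ffill (the same as the older fillna with method='ffill'), filter out the category rows with boolean indexing, and finally call reset_index(drop=True) to get a unique, monotonic default index.
import pandas as pd
category = ['Fruits', 'Vegetables', 'Meats']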
items='''Fruits
apple
orange
pear
Vegetables
broccoli
squash
carrot
Meats
chicken
beef
lamb'''
df = pd.DataFrame({'Fruit':items.splitlines()})
mask = df['Fruit'].isin(category)
df.insert(0,'Category', df['Fruit'].where(mask).ffill())
df = df[df['Category'] != df['Fruit']].reset_index(drop=True)
print(df)
     Category     Fruit
0      Fruits     apple
1      Fruits    orange
2      Fruits      pear
3  Vegetables  broccoli
4  Vegetables    squash
5  Vegetables    carrot
6       Meats   chicken
7       Meats      beef
8       Meats      lamb
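To see why the where + ffill step works, here is a minimal sketch of the intermediate values on a shortened input. The short lists and the variable names (lines, headers, filled, steps) are made up for illustration and are not part of the answer above.

```python
import pandas as pd

# Shortened, made-up input: two category names with their items
category = ['Fruits', 'Vegetables']
lines = ['Fruits', 'apple', 'pear', 'Vegetables', 'carrot']

s = pd.Series(lines)
mask = s.isin(category)    # True only on the category-name rows
headers = s.where(mask)    # category names kept, every other row becomes NaN
filled = headers.ffill()   # each NaN takes the last category name above it

# Side-by-side view of each step
steps = pd.DataFrame({'line': s, 'is_header': mask,
                      'where': headers, 'ffill': filled})
print(steps)
```

After ffill, the category rows are the only ones where the filled value equals the original line, which is what the boolean filter in the answer removes.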
Finally, if you need to count the Category/Fruit combinations, use groupby and size (see also: What is the difference between size and count in pandas?):
df1 = df.groupby(['Category','Fruit']).size()
print(df1)
Category    Fruit
Fruits      apple       1
            orange      1
            pear        1
Meats       beef        1
            chicken     1
            lamb        1
Vegetables  broccoli    1
            carrot      1
            squash      1
dtype: int64
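On the size versus count question, a small sketch may help: size counts every row in a group, while count only counts non-missing values in the selected column. The tiny frame below, with one deliberately missing item, is invented for illustration.

```python
import pandas as pd

# Made-up data: one 'Fruits' row has a missing item
df_demo = pd.DataFrame({'Category': ['Fruits', 'Fruits', 'Meats'],
                        'Item': ['apple', None, 'beef']})

sizes = df_demo.groupby('Category').size()            # rows per group, missing values included
counts = df_demo.groupby('Category')['Item'].count()  # non-missing 'Item' values per group

print(sizes)
print(counts)
```

With no missing values, as in the data from the question, both give the same numbers.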
Here's a solution without for loops using pandas.
import pandas as pd
category = ['Fruits', 'Vegetables', 'Meats']
items='''Fruits
apple
orange
pear
Vegetables
broccoli
squash
carrot
Meats
chicken
beef
lamb'''
in_df = pd.DataFrame(items.splitlines())
Create groups based on whether each row is a category name; the cumulative sum increases by one at every category line.
in_df = in_df.assign(group=in_df[0].isin(category).cumsum())
Create a DataFrame from the first row of each group (the category name).
cat_df = in_df.groupby('group').first()
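Join the remaining rows of each group (everything after the first row) back to that first row, creating the category/fruit relationship. cumcount is used here instead of groupby.apply so that the 'group' column is still present for the merge in newer pandas versions.
df_out = in_df[in_df.groupby('group').cumcount() > 0].reset_index(drop=True).merge(cat_df, left_on='group', right_index=True)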
Drop grouping key and rename columns
df_out = df_out.drop('group',axis=1).rename(columns={'0_x':'Fruit','0_y':'Category'})
print(df_out)
Output:
      Fruit    Category
0     apple      Fruits
1    orange      Fruits
2      pear      Fruits
3  broccoli  Vegetables
4    squash  Vegetables
5    carrot  Vegetables
6   chicken       Meats
7      beef       Meats
8      lamb       Meats
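As a rough illustration of the cumsum grouping idea, the sketch below shows the group labels it produces on a shortened input, and a variant that uses transform('first') in place of the merge. This variant is my own assumption about an equivalent route, not the answerer's code, and the variable names are made up.

```python
import pandas as pd

# Shortened, made-up input
category = ['Fruits', 'Vegetables']
s = pd.Series(['Fruits', 'apple', 'pear', 'Vegetables', 'carrot'])

is_header = s.isin(category)
group = is_header.cumsum()   # 1, 1, 1, 2, 2: the label increases at each category line

# The first value of each group is the category name; broadcast it to every row
names = s.groupby(group).transform('first')

# Keep only the item rows
out = pd.DataFrame({'Category': names, 'Item': s})[~is_header].reset_index(drop=True)
print(out)
```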
Consider appending iteratively to a dictionary of lists instead of a Series, then casting the dict to a DataFrame. The constant key column below gives groupby a numeric column to count, which produces the desired grouped output:
from io import StringIO
import pandas as pd
txtobj = StringIO('''Fruits
apple
orange
pear
Vegetables
broccoli
squash
carrot
Meats
chicken
beef
lamb''')
items = {'Category': [], 'Item': []}
for line in txtobj:
    curr_line = line.replace('\n', '')
    if curr_line in ['Fruits', 'Vegetables', 'Meats']:
        curr_category = curr_line
    if curr_category != curr_line:
        items['Category'].append(curr_category)
        items['Item'].append(curr_line)
df = pd.DataFrame(items).assign(key=1)
print(df)
#      Category      Item  key
# 0      Fruits     apple    1
# 1      Fruits    orange    1
# 2      Fruits      pear    1
# 3  Vegetables  broccoli    1
# 4  Vegetables    squash    1
# 5  Vegetables    carrot    1
# 6       Meats   chicken    1
# 7       Meats      beef    1
# 8       Meats      lamb    1
print(df['key'].groupby([df['Category'], df['Item']]).count())
# Category    Item
# Fruits      apple       1
#             orange      1
#             pear        1
# Meats       beef        1
#             chicken     1
#             lamb        1
# Vegetables  broccoli    1
#             carrot      1
#             squash      1
# Name: key, dtype: int64
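Since the question mentions reading from a raw text file, the loop above could be wrapped in a small function that accepts any iterable of lines, such as an open file handle or a StringIO. This is only a sketch: parse_categorized_lines is a hypothetical helper name, and skipping blank lines and ignoring items that appear before the first category are my assumptions rather than requirements from the question.

```python
from io import StringIO
import pandas as pd

def parse_categorized_lines(lines, categories):
    """Hypothetical helper: pair each item line with the most recent category line."""
    categories = set(categories)
    rows = []
    current = None                  # no category seen yet
    for raw in lines:
        line = raw.strip()
        if not line:                # assumption: blank lines are skipped
            continue
        if line in categories:
            current = line          # start a new category
        elif current is not None:   # assumption: items before any category are ignored
            rows.append((current, line))
    return pd.DataFrame(rows, columns=['Category', 'Item'])

# Made-up sample standing in for the file contents
sample = StringIO("Fruits\napple\norange\n\nMeats\nbeef\n")
df_parsed = parse_categorized_lines(sample, ['Fruits', 'Vegetables', 'Meats'])
print(df_parsed)
```

For a real file, the same call should work with an open file handle in place of the StringIO object.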