Python Pandas Create Dataframe using a text file

Question

I am trying to use Pandas to create a DataFrame from a raw text file. The file contains three categories, and each category name is followed by the items that belong to it. I can build a Series of the category names, but I don't know how to associate each item with its category and build a DataFrame from the result. Below is my initial code along with the desired DataFrame output. Can you point me in the right direction?

import pandas as pd

category = ['Fruits', 'Vegetables', 'Meats']

items='''Fruits
apple
orange
pear
Vegetables
broccoli
squash
carrot
Meats
chicken
beef
lamb'''

Category = pd.Series(dtype=object)

i = 0
for item in items.splitlines():
    if item in category:
        # Series.set_value() was removed in pandas 1.0; .loc[i] = ... adds a new label instead
        Category.loc[i] = item
        i += 1
df = pd.DataFrame(Category)
print(df)

Desired DataFrame Output:

Category    Item
Fruits      apple
            orange
            pear
Vegetables  broccoli
            squash
            carrot
Meats       chicken
            beef
            lamb

Answer 1

Use:

create a boolean mask with isin to flag the category rows
insert a new Category column with where and ffill (forward fill)
remove the rows where both columns hold the same value with boolean indexing, and finally use reset_index to get a unique, monotonic default index.
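The mask/where/ffill steps can be seen in isolation on a shortened sample list (assumed for illustration): where keeps only the category rows and turns every other row into NaN, and ffill then carries each category down over the item rows below it.

```python
import pandas as pd

category = ['Fruits', 'Vegetables', 'Meats']
# Shortened sample, assumed for illustration
lines = pd.Series(['Fruits', 'apple', 'pear', 'Meats', 'beef'])

# True only on rows that hold a category name
mask = lines.isin(category)

# where() keeps values where mask is True and puts NaN everywhere else
categories_only = lines.where(mask)

# ffill() copies the last non-missing value forward over the NaN rows
filled = categories_only.ffill()
print(filled.tolist())
# ['Fruits', 'Fruits', 'Fruits', 'Meats', 'Meats']
```

The category rows themselves still carry their own name after the fill, which is why the answer then drops rows where both columns are equal.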

import pandas as pd

category = ['Fruits', 'Vegetables', 'Meats']

items='''Fruits
apple
orange
pear
Vegetables
broccoli
squash
carrot
Meats
chicken
beef
lamb'''

df = pd.DataFrame({'Fruit':items.splitlines()})

mask = df['Fruit'].isin(category)
df.insert(0,'Category', df['Fruit'].where(mask).ffill())
df = df[df['Category'] != df['Fruit']].reset_index(drop=True)
print (df)
     Category     Fruit
0      Fruits     apple
1      Fruits    orange
2      Fruits      pear
3  Vegetables  broccoli
4  Vegetables    squash
5  Vegetables    carrot
6       Meats   chicken
7       Meats      beef
8       Meats      lamb
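The question title mentions a text file, so here is a sketch of the same steps reading the lines with pd.read_csv instead of a string literal. io.StringIO stands in for the file; with a real file you would pass its path instead ('items.txt' in the comment is a hypothetical name). This assumes no line contains a comma, since read_csv splits on commas by default; header=None keeps the first line as data rather than column names.

```python
from io import StringIO

import pandas as pd

category = ['Fruits', 'Vegetables', 'Meats']

# StringIO stands in for a real file; with a file on disk you would pass
# its path instead, e.g. pd.read_csv('items.txt', ...) (hypothetical name)
text = StringIO('''Fruits
apple
orange
pear
Vegetables
broccoli
squash
carrot
Meats
chicken
beef
lamb''')

# header=None: the first line is data, not column names
df = pd.read_csv(text, header=None, names=['Fruit'])

# Same steps as above: mask, where + ffill, then drop the category rows
mask = df['Fruit'].isin(category)
df.insert(0, 'Category', df['Fruit'].where(mask).ffill())
df = df[df['Category'] != df['Fruit']].reset_index(drop=True)
print(df)
```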

Finally, if necessary, count the Category/Fruit pairs with groupby and size:

See also: What is the difference between size and count in pandas?

df1 = df.groupby(['Category','Fruit']).size()
print (df1)
Category    Fruit   
Fruits      apple       1
            orange      1
            pear        1
Meats       beef        1
            chicken     1
            lamb        1
Vegetables  broccoli    1
            carrot      1
            squash      1
dtype: int64
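The difference asked about in that question, in brief: size() counts every row in each group, while count() counts only the non-missing values in each remaining column. The small frame with a missing value below is made up for illustration.

```python
import pandas as pd

# Made-up frame with one missing Fruit value
df = pd.DataFrame({
    'Category': ['Fruits', 'Fruits', 'Meats'],
    'Fruit': ['apple', None, 'beef'],
})

# size(): number of rows per group, missing or not
sizes = df.groupby('Category').size()
print(sizes)

# count(): number of non-missing values in each other column, per group
counts = df.groupby('Category').count()
print(counts)
```

Here size() reports 2 rows for Fruits, while count() reports only 1 non-missing Fruit value for that group.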

Answer 2

Here's a pandas solution without for loops.

import pandas as pd
category = ['Fruits', 'Vegetables', 'Meats']

items='''Fruits
apple
orange
pear
Vegetables
broccoli
squash
carrot
Meats
chicken
beef
lamb'''

in_df = pd.DataFrame(items.splitlines())

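Create groups based on whether each row is a category. The cumulative sum increments at every category row, so a category and the items below it share one group number.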

in_df = in_df.assign(group=in_df.isin(category).cumsum())
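To see what the cumulative sum produces, here is a sketch on a shortened list (assumed for illustration). isin gives True on category rows, and cumsum treats True as 1, so the counter steps up at each category and stays constant across the items that follow it.

```python
import pandas as pd

category = ['Fruits', 'Vegetables', 'Meats']
# Shortened sample, assumed for illustration; the single column is named 0
in_df = pd.DataFrame(['Fruits', 'apple', 'pear', 'Meats', 'beef'])

# True/False per row; cumsum() counts the True values seen so far
in_df = in_df.assign(group=in_df[0].isin(category).cumsum())
print(in_df)
```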

Create a dataframe from the first row in each group

cat_df = in_df.groupby('group').first()

Keep every row after the first in each group (the items) and join it back to its group's first row (the category), creating the category/fruit relationship.

# cumcount() numbers the rows inside each group from 0, so > 0 skips the category row
df_out = in_df[in_df.groupby('group').cumcount() > 0].reset_index(drop=True).merge(cat_df, left_on='group', right_index=True)

Drop the grouping key and rename the columns:

df_out = df_out.drop('group',axis=1).rename(columns={'0_x':'Fruit','0_y':'Category'})
print(df_out)

Output:

      Fruit    Category
0     apple      Fruits
1    orange      Fruits
2      pear      Fruits
3  broccoli  Vegetables
4    squash  Vegetables
5    carrot  Vegetables
6   chicken       Meats
7      beef       Meats
8      lamb       Meats
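The '0_x' and '0_y' names in the rename come from merge: both frames have a column named 0, so pandas appends its default suffixes ('_x' for the left frame, '_y' for the right). A minimal sketch with small made-up frames shaped like the ones above:

```python
import pandas as pd

# Both frames have a column named 0, like in_df and cat_df above (made-up data)
items = pd.DataFrame({0: ['apple', 'beef'], 'group': [1, 2]})
cats = pd.DataFrame({0: ['Fruits', 'Meats']}, index=[1, 2])

# left_on='group' is matched against the index of cats; the clashing
# column 0 becomes '0_x' (left) and '0_y' (right)
merged = items.merge(cats, left_on='group', right_index=True)
print(merged.columns.tolist())
```

merge also accepts a suffixes argument if you prefer different names to rename afterwards.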

Answer 3

Consider appending iteratively to a dictionary of lists instead of to a Series, then build the DataFrame from the dict. The key column below is only there to produce the grouped count at the end, since count() needs a column to count:

from io import StringIO
import pandas as pd

txtobj = StringIO('''Fruits
apple
orange
pear
Vegetables
broccoli
squash
carrot
Meats
chicken
beef
lamb''')

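items = {'Category':[], 'Item':[]}
# Avoids a NameError if the text does not start with a category name
curr_category = None

for line in txtobj:
    curr_line = line.rstrip('\n')
    if curr_line in ['Fruits','Vegetables', 'Meats']:
        curr_category = curr_line

    # Any line that is not the current category name is an item
    if curr_category != curr_line:
        items['Category'].append(curr_category)
        items['Item'].append(curr_line)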

df = pd.DataFrame(items).assign(key=1)
print(df)
#      Category      Item  key
# 0      Fruits     apple    1
# 1      Fruits    orange    1
# 2      Fruits      pear    1
# 3  Vegetables  broccoli    1
# 4  Vegetables    squash    1
# 5  Vegetables    carrot    1
# 6       Meats   chicken    1
# 7       Meats      beef    1
# 8       Meats      lamb    1

print(df['key'].groupby([df['Category'], df['Item']]).count())    
# Category    Item    
# Fruits      apple       1
#             orange      1
#             pear        1
# Meats       beef        1
#             chicken     1
#             lamb        1
# Vegetables  broccoli    1
#             carrot      1
#             squash      1
# Name: key, dtype: int64
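The helper key column is not strictly required for the final count. A sketch, assuming a Category/Item frame like the one built above (rebuilt here from a shortened list so it runs on its own): groupby(...).size() counts rows per (Category, Item) pair directly, without a column to count on.

```python
import pandas as pd

# Shortened stand-in for the Category/Item frame built above
df = pd.DataFrame({
    'Category': ['Fruits', 'Fruits', 'Meats'],
    'Item': ['apple', 'pear', 'beef'],
})

# size() counts rows per group, so no extra column is needed
counts = df.groupby(['Category', 'Item']).size()
print(counts)
```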

3 answers

solution1: score 2, 2017-07-15 04:03:38

solution2: score 1, 2017-07-15 03:44:02

solution3: score 0, accepted, 2017-07-15 02:31:50
