I am trying to read a text file to a nested list in Python. That is, I would like to have the output as:
[['$5.79', 'Breyers Ice Cream', 'Homemade Vanilla', '48 oz'], ['$6.39', 'Haagen-dazs', 'Vanilla Bean Ice Cream', '1 pt'], ...]
The ultimate goal is to read the information into a pandas DataFrame for some exploratory analysis.
$5.79
Breyers Ice Cream
Homemade Vanilla
48 oz

$6.39
Haagen-dazs
Vanilla Bean Ice Cream
1 pt

$6.89
So Delicious
Dairy Free Coconutmilk No Sugar Added Dipped Vanilla Bars
4 x 2.3 oz

$5.79
Popsicle Fruit Pops Mango
12 ct
with open('sample.txt') as f:
    creams = f.read()
creams = creams.split("\n\n")
However, this returns:
['$5.79\nBreyers Ice Cream\nHomemade Vanilla\n48 oz', '$6.39\nHaagen-dazs\nVanilla Bean Ice Cream\n1 pt',
I have also tried list comprehensions, which look cleaner than the code above, but these attempts split on single newlines rather than on the blank lines between records. For example:
[x for x in open('<file_name>.txt').read().splitlines()]
#Gives
['$5.79', 'Breyers Ice Cream', 'Homemade Vanilla', '48 oz', '', '$6.39', 'Haagen-dazs', 'Vanilla Bean Ice Cream', '1 pt', '', '
I know I would need to nest a list within the list comprehension, but I'm unsure how to perform the split.
Note: This is my first posted question, sorry for the length or lack of brevity. Seeking help because there are similar questions but not with the outcome I desire.
You are nearly there once you have the four-line groups separated. All that's left is to split the groups again by a single newline.
with open('creams.txt', 'r') as f:
    creams = f.read()
creams = creams.split("\n\n")
creams = [lines.split('\n') for lines in creams]
print(creams)
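If the file has trailing spaces, a final newline, or blank lines that contain whitespace, the plain two-step split can leave stray spaces or empty strings in the result. Here is a minimal sketch of the same idea that guards against that, assuming records are separated by blank lines. The helper name parse_records and the inline sample string are made up for illustration and are not part of the original code.

```python
import re

def parse_records(text):
    """Split text into blank-line-separated records, each a list of stripped lines."""
    records = []
    # Split on a newline, optional whitespace, then another newline (a blank line).
    for block in re.split(r"\n\s*\n", text.strip()):
        # Drop empty lines and strip stray whitespace around each field.
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if lines:
            records.append(lines)
    return records

# Hypothetical sample mirroring the first two records of the file.
sample = (
    "$5.79 \nBreyers Ice Cream\nHomemade Vanilla\n48 oz\n"
    "\n"
    "$6.39\nHaagen-dazs\nVanilla Bean Ice Cream\n1 pt\n"
)
print(parse_records(sample))
```

With a real file you would pass in the result of f.read() instead of the sample string.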
You just have to split it again.
with open('sample.txt', 'r') as file:
    creams = file.read()
creams = creams.split("\n\n")
creams = [lines.split('\n') for lines in creams]
print(creams)
#[['$5.79 ', 'Breyers Ice Cream ', 'Homemade Vanilla ', '48 oz'], ['$6.39 ', 'Haagen-dazs ', 'Vanilla Bean Ice Cream ', '1 pt'], ['$6.89 ', 'So Delicious ', 'Dairy Free Coconutmilk No Sugar Added Dipped Vanilla Bars ', '4 x 2.3 oz'], ['$5.79 ', 'Popsicle Fruit Pops Mango', '-', '12 ct']]
# Convert to a DataFrame
import pandas as pd
df = pd.DataFrame(creams, columns=['Amnt', 'Brand', 'Flavor', 'Qty'])
print(df)
    Amnt                      Brand  \
0  $5.79          Breyers Ice Cream
1  $6.39                Haagen-dazs
2  $6.89               So Delicious
3  $5.79  Popsicle Fruit Pops Mango

                                              Flavor         Qty
0                                   Homemade Vanilla       48 oz
1                             Vanilla Bean Ice Cream        1 pt
2  Dairy Free Coconutmilk No Sugar Added Dipped V...  4 x 2.3 oz
3                                                  -       12 ct
Note: I added - in the last row of the Flavor column because that field was empty in the sample. In your original dataset, you must take this into account before performing any analysis.
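One way to handle such short records in code, rather than by editing the file, is to pad them before building the DataFrame. This is only a sketch: it assumes that a three-field record is always missing the flavor (as in the sample), which may not hold for your data. The helper name pad_missing_flavor and the "-" placeholder are illustrative choices.

```python
def pad_missing_flavor(record, placeholder="-"):
    """Return a four-field record; if only three fields are present,
    insert the placeholder where the flavor would be (an assumption)."""
    if len(record) == 3:
        amount, brand, qty = record
        return [amount, brand, placeholder, qty]
    return list(record)

# Two records from the sample: one complete, one without a flavor line.
creams = [
    ["$5.79", "Breyers Ice Cream", "Homemade Vanilla", "48 oz"],
    ["$5.79", "Popsicle Fruit Pops Mango", "12 ct"],
]
creams = [pad_missing_flavor(record) for record in creams]
print(creams[-1])  # ['$5.79', 'Popsicle Fruit Pops Mango', '-', '12 ct']
```

Without this step, the '12 ct' value would land in the Flavor column and Qty would be empty for that row.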