Suppose I have a CSV file like this:
Name: Jack
Place: Binghampton
Age:27
Month,Sales,Revenue
Jan,51,$1000
Feb,20,$1050
Mar,100,$10000
### Blank File Space
### Blank File Space
Name: Jill
Place: Hamptonshire
Age: 49
Month,Sales,Revenue
Apr,11,$1000
May,55,$3000
Jun,23,$4600
### Blank File Space
### Blank File Space
...
And the contents of the file are evenly spaced as shown. I want to read each Month,Sales,Revenue portion in as its own df. I know I can do this manually by doing:
df_Jack = pd.read_csv('./sales.csv', skiprows=3, nrows=3)
df_Jill = pd.read_csv('./sales.csv', skiprows=12, nrows=3)
I'm not too worried about naming the DataFrames, as I think I can handle that on my own. I just don't know how to iterate through the evenly spaced file, find the sales records, and store each one as its own DataFrame.
Thanks for any help in advance!
How about creating a list of DataFrames?
import pandas as pd
from io import StringIO
csvfile = StringIO("""Name: Jack
Place: Binghampton
Age:27
Month,Sales,Revenue
Jan,51,$1000
Feb,20,$1050
Mar,100,$10000
### Blank File Space
### Blank File Space
Name: Jill
Place: Hamptonshire
Age: 49
Month,Sales,Revenue
Apr,11,$1000
May,55,$3000
Jun,23,$4600
### Blank File Space
### Blank File Space""")
# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
df = pd.read_csv(csvfile, sep=',', on_bad_lines='skip', names=['Month','Sales','Revenue'])
df1 = df.dropna().loc[df.Month!='Month']
listofdf = [df1[i:i+3] for i in range(0,df1.shape[0],3)]
print(listofdf[0])
Output:
  Month Sales Revenue
4   Jan    51   $1000
5   Feb    20   $1050
6   Mar   100  $10000
print(listofdf[1])
Output:
   Month Sales Revenue
13   Apr    11   $1000
14   May    55   $3000
15   Jun    23   $4600
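The slicing above relies on every block having exactly three data rows. If that might not hold, one option is to label each row with a running count of the header rows seen so far and group on that label. This is only a sketch: the helper name `split_tables` and the `block` column are made up for illustration, and the sample uses real blank lines where the question shows "### Blank File Space".

```python
from io import StringIO
import pandas as pd

sample = """Name: Jack
Place: Binghampton
Age:27
Month,Sales,Revenue
Jan,51,$1000
Feb,20,$1050
Mar,100,$10000


Name: Jill
Place: Hamptonshire
Age: 49
Month,Sales,Revenue
Apr,11,$1000
May,55,$3000
Jun,23,$4600
"""

def split_tables(text):
    """Hypothetical helper: split the text into one DataFrame per sales table."""
    cols = ['Month', 'Sales', 'Revenue']
    # Lines with fewer than three fields (Name/Place/Age) get NaN in the missing columns.
    df = pd.read_csv(StringIO(text), names=cols)
    # Every repeated header row starts a new block, so its running count labels the blocks.
    df['block'] = (df['Month'] == 'Month').cumsum()
    # Drop the header rows themselves, then the metadata rows (they contain NaN).
    rows = df[df['Month'] != 'Month'].dropna()
    return [g.drop(columns='block').reset_index(drop=True)
            for _, g in rows.groupby('block')]

tables = split_tables(sample)
print(tables[1])
```

Because the header rows are read as data, the columns come back as strings; convert them afterwards if you need numbers.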
The most direct approach is this:
dfs = [pd.read_csv('./sales.csv', skiprows=i, nrows=3) for i in range(3, n, 9)]
# where n is the number of lines in the file; each block spans 9 lines, so the header rows sit at 3, 12, 21, ...
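If you would rather not hard-code n, one option is to count the lines in the file first. This is a minimal sketch, assuming the 9-line block layout from the question (3 info lines, 1 header, 3 data rows, 2 blank lines); the function name `read_blocks` and its parameters are invented for illustration.

```python
import pandas as pd

def read_blocks(path, first_header=3, block_len=9, nrows=3):
    """Hypothetical helper: read each fixed-size sales table into a list of DataFrames."""
    # Count the lines so the end of the range does not need to be known in advance.
    with open(path) as f:
        n = sum(1 for _ in f)
    # The header rows sit at first_header, first_header + block_len, and so on.
    return [
        pd.read_csv(path, skiprows=i, nrows=nrows)
        for i in range(first_header, n, block_len)
    ]
```

Calling `read_blocks('./sales.csv')` would then return one DataFrame per block. Note that this reopens the file once per block, which is fine for small files but wasteful for large ones.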
But another way is to read the CSV yourself and pass the data back to pandas:
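import pandas as pd

dfs = {}  # maps each person's name to their DataFrame
with open('./sales.csv', 'r') as file:
    streaming = True
    while streaming:
        # The first line of each block holds the name
        name = file.readline().rstrip().replace('Name: ', '')
        # Skip the Place and Age lines
        for _ in range(2):
            file.readline()
        headers = file.readline().rstrip().split(',')
        data = [file.readline().rstrip().split(',') for _ in range(3)]
        dfs[name] = pd.DataFrame.from_records(data, columns=headers)
        # Consume the two blank lines; readline() returns '' at end of file, which ends the loop
        for _ in range(2):
            streaming = file.readline()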
I'll concede it's quite brutish and inelegant compared to the other answer, but it works. And it actually gives you each DataFrame by name within a dictionary:
>>> dfs['Jack']
  Month Sales Revenue
0   Jan    51   $1000
1   Feb    20   $1050
2   Mar   100  $10000
>>> dfs['Jill']
  Month Sales Revenue
0   Apr    11   $1000
1   May    55   $3000
2   Jun    23   $4600