
Reading in multiple tables from 1 csv file in pandas

Suppose I have a CSV file like this:

Name: Jack
Place: Binghampton
Age:27
Month,Sales,Revenue
Jan,51,$1000
Feb,20,$1050
Mar,100,$10000
### Blank File Space
### Blank File Space
Name: Jill
Place: Hamptonshire
Age: 49
Month,Sales,Revenue
Apr,11,$1000
May,55,$3000
Jun,23,$4600
### Blank File Space
### Blank File Space
...

The contents of the file are evenly spaced as shown. I want to read each Month,Sales,Revenue portion in as its own df. I know I can do this manually with:

df_Jack = pd.read_csv('./sales.csv', skiprows=3, nrows=3)
df_Jill = pd.read_csv('./sales.csv', skiprows=12, nrows=3)

I'm not too worried about the names of the dfs, as I think I could handle that on my own; I just don't really know how to iterate through the evenly spaced file to find the sales records and store them as separate dfs.

Thanks in advance for any help!

How about creating a list of dfs?

import pandas as pd
from io import StringIO

csvfile = StringIO("""Name: Jack
Place: Binghampton
Age:27
Month,Sales,Revenue
Jan,51,$1000
Feb,20,$1050
Mar,100,$10000
### Blank File Space
### Blank File Space
Name: Jill
Place: Hamptonshire
Age: 49
Month,Sales,Revenue
Apr,11,$1000
May,55,$3000
Jun,23,$4600
### Blank File Space
### Blank File Space""")

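# on_bad_lines='skip' replaces error_bad_lines=False, which was deprecated in pandas 1.3 and removed in 2.0
df = pd.read_csv(csvfile, sep=',', on_bad_lines='skip', names=['Month','Sales','Revenue'])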

df1 = df.dropna().loc[df.Month!='Month']

listofdf = [df1[i:i+3] for i in range(0,df1.shape[0],3)]

print(listofdf[0])

Output:

  Month Sales Revenue
4   Jan    51   $1000
5   Feb    20   $1050
6   Mar   100  $10000

print(listofdf[1])

Output:

   Month Sales Revenue
13   Apr    11   $1000
14   May    55   $3000
15   Jun    23   $4600
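
The slicing above relies on every table having exactly three rows. If that might not hold, one possible variation (a sketch, not part of the original answer) is to number the tables by counting the repeated header rows and then group on that number. The names `is_header`, `table_id`, `rows`, and `tables` are introduced here purely for illustration.

```python
from io import StringIO

import pandas as pd

# Same sample as above, including the literal placeholder lines.
csvfile = StringIO("""Name: Jack
Place: Binghampton
Age:27
Month,Sales,Revenue
Jan,51,$1000
Feb,20,$1050
Mar,100,$10000
### Blank File Space
### Blank File Space
Name: Jill
Place: Hamptonshire
Age: 49
Month,Sales,Revenue
Apr,11,$1000
May,55,$3000
Jun,23,$4600
### Blank File Space
### Blank File Space""")

df = pd.read_csv(csvfile, names=['Month', 'Sales', 'Revenue'])

# Every repeated header row marks the start of a new table, so a running
# count of header rows gives each line a table number.
is_header = df['Month'] == 'Month'
table_id = is_header.cumsum()

# Keep only complete data rows: drop the header rows themselves, then the
# metadata/placeholder lines (which have NaN in Sales and Revenue).
rows = df[~is_header].dropna()

# One DataFrame per table, each with a fresh 0-based index.
tables = [group.reset_index(drop=True)
          for _, group in rows.groupby(table_id.loc[rows.index])]
```

Here `tables[0]` would hold Jack's rows and `tables[1]` Jill's, regardless of how many rows each table contains. Note that the values stay as strings, because each column also contained header and metadata text when it was parsed.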

Obviously you could do this:

dfs = [pd.read_csv('./sales.csv', skiprows=i, nrows=3) for i in range(3, n, 9)]
# where n is your expected end line...
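
If `n` is not known in advance, it could be derived by counting the lines in the file first. The helper below is a hypothetical sketch (the name `read_fixed_blocks` and its parameters are not from the original answer); it assumes the layout shown in the question, with the first header on line 3 and a new block every 9 lines.

```python
import pandas as pd


def read_fixed_blocks(path, first_header=3, block_len=9, nrows=3):
    """Read one DataFrame per evenly spaced block (hypothetical helper)."""
    # Count the physical lines so that range() knows where to stop.
    with open(path) as fh:
        n = sum(1 for _ in fh)
    # Each block's header sits block_len lines after the previous one.
    return [
        pd.read_csv(path, skiprows=start, nrows=nrows)
        for start in range(first_header, n, block_len)
    ]
```

Calling `read_fixed_blocks('./sales.csv')` would then return a list with one DataFrame per block. This still re-opens the file once per table, so it stays simple rather than fast.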

But another way is to read the CSV yourself and pass the data back to pandas:

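import pandas as pd

dfs = {}  # maps each person's name to their DataFrame
with open('./sales.csv', 'r') as file:
    while True:
        name_line = file.readline()
        if not name_line.strip():
            break  # end of file, or no further record
        name = name_line.rstrip().replace('Name: ', '')
        for _ in range(2):  # skip the Place and Age lines
            file.readline()
        headers = file.readline().rstrip().split(',')
        data = [file.readline().rstrip().split(',') for _ in range(3)]
        dfs[name] = pd.DataFrame.from_records(data, columns=headers)
        for _ in range(2):  # skip the two blank separator lines
            file.readline()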

I'll concede it's quite brutish and inelegant compared to the other answer, but it works. And it actually gives you each DataFrame by name within a dictionary:

>>> dfs['Jack']

  Month Sales Revenue
0   Jan    51   $1000
1   Feb    20   $1050
2   Mar   100  $10000
>>> dfs['Jill']

  Month Sales Revenue
0   Apr    11   $1000
1   May    55   $3000
2   Jun    23   $4600
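
One thing worth noting about this manual approach is that every value arrives as a string, because the lines are split by hand rather than parsed by `read_csv`. If numbers are needed, the columns could be converted afterwards. A minimal sketch, using Jack's rows as stand-in data and assuming Revenue always carries a leading `$`:

```python
import pandas as pd

# Stand-in for dfs['Jack'] as built above: all values are strings.
df = pd.DataFrame.from_records(
    [['Jan', '51', '$1000'], ['Feb', '20', '$1050'], ['Mar', '100', '$10000']],
    columns=['Month', 'Sales', 'Revenue'],
)

# Convert the numeric columns; strip the leading '$' from Revenue first.
df['Sales'] = pd.to_numeric(df['Sales'])
df['Revenue'] = pd.to_numeric(df['Revenue'].str.lstrip('$'))
```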


 