pandas: read multiple dataframes from one csv
I have a CSV file that looks like this:
col A, col B
1, 5
2,7
78,65
###########
5,8
15,23
###########
17, 15
25,62
12,15
95,56
How can I transform it into a set of dataframes, one for each area between the ########### lines (I can change the marker if needed)?
The result should be something like this:
df1 = {col A :{1,2,78}, col B: {5,7,65}}
df2 = {col A: {5,15}, col B: {8,23}}
df3 = {col A: {17,25,12,95}, col B: {15,62,15,56}}
I know there is a workaround using file.readlines(), but it is "not very elegant". I wonder if there is a pandas way to do it directly.
Highly inspired by piRSquared's answer here, you can approach your goal like this:
import pandas as pd
import numpy as np
df = pd.read_csv("/input_file.csv")
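# Flag the rows that act as horizontal delimiters
m = df["col A"].str.contains("#", na=False)
# Split the row positions after each delimiter row, then drop empty pieces
chunks = np.split(np.arange(len(df)), np.flatnonzero(m) + 1)
l_df = [df.iloc[pos] for pos in chunks if len(pos)]
# Create df1, df2, ... dynamically, leaving out the delimiter rows
for idx, sub_df in enumerate(l_df, start=1):
    globals()[f"df{idx}"] = sub_df.loc[~m]
# If you need a dictionary (instead of a dataframe), use sub_df.loc[~m].to_dict("list")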
NB: We used globals() to create the variables/sub-dataframes dynamically.
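If the pieces do not have to live in separate variables, a dictionary avoids touching globals() altogether. The following is a minimal sketch rather than part of the original answer: it labels every row with the number of delimiter rows seen before it (a cumulative sum of the mask) and groups on that label. The function name split_on_marker, its column and marker parameters, and the inline copy of the sample data are hypothetical, introduced only for illustration.

```python
import io

import pandas as pd


def split_on_marker(df, column="col A", marker="#"):
    """Return {1: first block, 2: second block, ...} split at marker rows."""
    # True for rows whose `column` contains the marker (the delimiter rows)
    is_delim = df[column].astype(str).str.contains(marker, regex=False)
    # Number of delimiter rows seen so far; constant within one block
    block_id = is_delim.cumsum()
    # Keep only the data rows and group them by their block number
    data = df.loc[~is_delim]
    groups = data.groupby(block_id[~is_delim])
    return {n: part for n, (_, part) in enumerate(groups, start=1)}


# Inline copy of the sample file, used here instead of a path on disk
sample = """col A, col B
1, 5
2,7
78,65
###########
5,8
15,23
###########
17, 15
25,62
12,15
95,56
"""

# skipinitialspace drops the blank after each comma (assumed to be unwanted)
df = pd.read_csv(io.StringIO(sample), skipinitialspace=True)
frames = split_on_marker(df)
for n, part in frames.items():
    print(n, part.to_dict("list"))
```

Here frames[1], frames[2] and frames[3] play the role of df1, df2 and df3, and the number of blocks does not need to be known in advance.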
print(df1, type(df1)), print(df2, type(df2)), print(df3, type(df3))
  col A  col B
0     1    5.0
1     2    7.0
2    78   65.0 <class 'pandas.core.frame.DataFrame'>
  col A  col B
4     5    8.0
5    15   23.0 <class 'pandas.core.frame.DataFrame'>
   col A  col B
7     17   15.0
8     25   62.0
9     12   15.0
10    95   56.0 <class 'pandas.core.frame.DataFrame'>
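As the printed output shows, each piece keeps the row labels of the original frame (4, 5 and 7 to 10). col A is read as text because the delimiter strings sit in that column, and col B comes back as float because the delimiter rows have no second field and are filled with NaN. Below is a small sketch of a clean-up step, assuming you want a fresh 0-based index and integer columns. The helper name tidy and the stand-in frame are hypothetical.

```python
import pandas as pd


def tidy(part):
    """Return a copy with a fresh 0-based index and integer columns."""
    out = part.reset_index(drop=True)
    # Parse every column as numbers; the values are whole, so cast to int
    return out.apply(pd.to_numeric).astype(int)


# Stand-in for df2 from above: text in col A, floats in col B, old row labels
df2 = pd.DataFrame({"col A": ["5", "15"], "col B": [8.0, 23.0]}, index=[4, 5])
clean = tidy(df2)
print(clean)
print(clean.dtypes)
```

The cast to int assumes every value is a whole number; if a block can contain missing or fractional values, stopping after pd.to_numeric is the safer choice.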