

Concatenate multiple .csv DataFrames with a MultiIndex

I am concatenating multiple DataFrames that look like this:

                      X                    Y
                   mean  std  size      mean  std  size
In_X
(10.424, 10.43]  10.425  NaN     1  0.003786  NaN     1
(10.43, 10.435]    10.4  NaN     0       NaN  NaN     0

When my DataFrames did not have a MultiIndex, I was using:

import glob
import pandas as pd

extension = 'csv'
all_filenames = glob.glob('*.{}'.format(extension))  # every .csv file in the current directory
all_dfs = pd.concat([pd.read_csv(f) for f in all_filenames])

But this introduces a row:

mean   std size   mean          std  size

Every time a new DataFrame is concatenated to all_dfs. How can I keep only the original MultiIndex header and avoid having the second-level header inserted as a row in the concatenated DataFrame?
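To see where the extra row comes from, here is a minimal sketch that rebuilds a small frame shaped like the one above and reads it back with read_csv's defaults. The sample values are illustrative, the assumption that the files were written by DataFrame.to_csv is mine, and io.StringIO stands in for the real .csv files.

```python
import io

import pandas as pd

# Illustrative frame shaped like the question's: two column levels and an index named In_X.
columns = pd.MultiIndex.from_product([["X", "Y"], ["mean", "std", "size"]])
df = pd.DataFrame(
    [[10.425, float("nan"), 1, 0.003786, float("nan"), 1]],
    index=pd.Index(["(10.424, 10.43]"], name="In_X"),
    columns=columns,
)

# to_csv writes one header line per column level, plus a line holding the index name.
csv_text = df.to_csv()
print(csv_text)

# With the default header=0 only the first line becomes the header, so the
# "mean,std,size,..." line (and the "In_X" line) are parsed as ordinary data rows.
flat = pd.read_csv(io.StringIO(csv_text))

# Concatenating several such reads repeats those label rows once per file.
both = pd.concat([pd.read_csv(io.StringIO(csv_text)) for _ in range(2)])
print(both)
```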

read_csv by default takes only the first row as the header (header=0), so the second header row is read as data. Specify a two-row header with the header parameter:

# rows 0 and 1 of each file form the two column levels
all_dfs = pd.concat([pd.read_csv(f, header=[0, 1]) for f in all_filenames])
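A minimal sketch of that fix, using an illustrative frame shaped like the question's and io.StringIO in place of the real files. It also passes index_col=0, which is an addition to the answer: it assumes the first column of each file holds the row index (as it does when the file was written by DataFrame.to_csv); without it, pandas treats that column as ordinary data rather than as the index. The helper name read_two_level_csv is hypothetical.

```python
import io

import pandas as pd

# Illustrative frame shaped like the question's.
columns = pd.MultiIndex.from_product([["X", "Y"], ["mean", "std", "size"]])
df = pd.DataFrame(
    [[10.425, float("nan"), 1, 0.003786, float("nan"), 1]],
    index=pd.Index(["(10.424, 10.43]"], name="In_X"),
    columns=columns,
)
csv_text = df.to_csv()


def read_two_level_csv(source):
    # Hypothetical helper.
    # header=[0, 1]: the first two lines become the two column levels.
    # index_col=0: the first column is the row index (assumed file layout).
    return pd.read_csv(source, header=[0, 1], index_col=0)


# In the real code, source would be each name in all_filenames; two StringIO
# copies stand in for two files here.
all_dfs = pd.concat([read_two_level_csv(io.StringIO(csv_text)) for _ in range(2)])
print(all_dfs)
```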

Alternatively, convert the MultiIndex columns to regular single-level columns like this:

df.columns = df.columns.map('_'.join)  # ('X', 'mean') -> 'X_mean'; requires every level label to be a string

Then concatenate the flattened DataFrames with pd.concat.
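A minimal sketch of this flatten-then-concat approach, assuming the flattening happens before the frames are written out; the two sample frames and the use of io.StringIO instead of files are illustrative. Once each file has a single header line, read_csv needs no special header argument. The last step, which rebuilds the two-level header by splitting each name at its first underscore, is an optional extra and only works if the top-level labels (X and Y here) contain no underscore.

```python
import io

import pandas as pd

# Two illustrative frames shaped like the question's.
columns = pd.MultiIndex.from_product([["X", "Y"], ["mean", "std", "size"]])
nan = float("nan")
df1 = pd.DataFrame([[10.425, nan, 1, 0.003786, nan, 1]],
                   index=pd.Index(["(10.424, 10.43]"], name="In_X"), columns=columns)
df2 = pd.DataFrame([[10.4, nan, 0, nan, nan, 0]],
                   index=pd.Index(["(10.43, 10.435]"], name="In_X"), columns=columns)

csv_texts = []
for df in (df1, df2):
    flat = df.copy()
    # ('X', 'mean') -> 'X_mean'; '_'.join requires every level label to be a string.
    flat.columns = flat.columns.map("_".join)
    csv_texts.append(flat.to_csv())

# One header line per file, so the default header=0 is correct here.
all_dfs = pd.concat([pd.read_csv(io.StringIO(t), index_col=0) for t in csv_texts])

# Optional: rebuild the two-level header by splitting at the first underscore.
restored = all_dfs.copy()
restored.columns = pd.MultiIndex.from_tuples(
    [tuple(name.split("_", 1)) for name in restored.columns]
)
print(restored)
```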
