Pandas Concatenate dataframes

This is driving me nuts! I have several DataFrames that I am trying to concatenate with pandas. The index is the filename. When I use df.to_csv on an individual DataFrame, I can see the index column (the filename) along with the column of interest. When I concatenate along the filename axis, I only get the column of interest and row numbers. No filename.

Here is the code I am using, as is. It works as I expect up until the "all_filenames" line.

import glob
import os

import pandas as pd

for filename in os.listdir(directory):
    if filename.endswith("log.csv"):
        df = pd.read_fwf(filename, skiprows=186, nrows=1, names=["Attribute"])
        df['System_Library_Name'] = [x.split('/')[6] for x in df['Attribute']]
        df2= pd.concat([df for filename in os.listdir(directory)], keys=[filename])
        df2.to_csv(filename+"log_info.csv", index=filename)
        
        all_filenames = glob.glob(os.path.join(directory,'*log_info.csv'))
        cat_log = pd.concat([pd.read_csv(f) for f in all_filenames ])
        cat_log2= cat_log[['System_Library_Name']]
        cat_log2.to_excel("log.xlsx", index=filename)
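
One place the filenames can disappear, sketched with a small hypothetical in-memory frame and assuming default pandas behaviour: to_csv writes the index as an unnamed first column, but pd.read_csv(f) without index_col reads it back as an ordinary column called "Unnamed: 0", which the [['System_Library_Name']] selection then drops. The frame and the name run1_log.csv are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical one-row frame indexed by a filename, mirroring the question's df
df = pd.DataFrame({"System_Library_Name": ["libA"]}, index=["run1_log.csv"])

buf = io.StringIO()
df.to_csv(buf)  # the index is written as the first, unnamed CSV column

# Reading back without index_col: the old index becomes a column "Unnamed: 0"
buf.seek(0)
no_index = pd.read_csv(buf)
print(no_index.columns.tolist())  # ['Unnamed: 0', 'System_Library_Name']

# Selecting only the column of interest then leaves a plain 0..n-1 index
print(no_index[["System_Library_Name"]].index.tolist())  # [0]

# Reading back with index_col=0 restores the filename as the index
buf.seek(0)
with_index = pd.read_csv(buf, index_col=0)
print(with_index.index.tolist())  # ['run1_log.csv']
```

Under that assumption, passing index_col=0 to read_csv in the list comprehension would be one way to keep the filenames through the round trip.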

I have tried adding keys=filename to the third-to-last line, and giving the index a name with df.index.name=
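
For reference, a sketch of how keys and names work together in pd.concat: keys takes one label per frame being concatenated, and names labels the levels of the resulting index. The two frames and filenames below are hypothetical stand-ins for the question's data, and dropping the inner positional level is just one choice.

```python
import pandas as pd

# Hypothetical per-file frames; in the question these come from read_fwf
frames = [
    pd.DataFrame({"System_Library_Name": ["libA"]}),
    pd.DataFrame({"System_Library_Name": ["libB"]}),
]
filenames = ["run1_log.csv", "run2_log.csv"]  # hypothetical, one key per frame

# keys builds an outer index level; names labels (outer, inner) levels
combined = pd.concat(frames, keys=filenames, names=["filename", None])

# Drop the inner 0, 0 positional level so the filename is the only index level
combined = combined.droplevel(1)
print(combined.index.tolist())  # ['run1_log.csv', 'run2_log.csv']
```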

I have used similar code before and it worked fine; the difference this time is that I am only using one column from a larger original input file, if that makes a difference.
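
Using only one column should not, by itself, lose the index. A small sketch with a hypothetical wider frame: selecting with double brackets returns a one-column DataFrame that carries the original index (and its name) along unchanged.

```python
import pandas as pd

# Hypothetical wider frame standing in for the larger original input file
wide = pd.DataFrame(
    {"Other": [1], "System_Library_Name": ["libA"]},
    index=pd.Index(["run1_log.csv"], name="filename"),
)

# Double brackets return a one-column DataFrame; the index comes along as-is
one_col = wide[["System_Library_Name"]]
print(one_col.index.tolist())  # ['run1_log.csv']
```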

Any advice is greatly appreciated!

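import glob
import os

import pandas as pd

df = pd.concat(
    # this is just reading one value from each file, yes?
    [pd.read_fwf(filename, skiprows=186, nrows=1, names=["Attribute"])
       .set_index(pd.Index([filename]))        # index each row by its file path
       .applymap(lambda x: x.split('/')[6])    # 7th '/'-separated part; DataFrame.map in pandas >= 2.1
       .rename(columns={'Attribute': 'System_Library_Name'})
     for filename in glob.glob(os.path.join(directory, '*log.csv'))
    ]
)
df.to_excel("log_info.xlsx")  # DataFrame has to_excel, not to_xlsx; needs an engine such as openpyxl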
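
A runnable variant of the same one-row-per-file idea, wrapped in a function and exercised on temporary files. Assumptions: the function name collect_library_names, the index name "filename", and the sample files and their contents are all hypothetical. It swaps applymap (deprecated since pandas 2.1) for the .str accessor and indexes by the base filename rather than the full path.

```python
import glob
import os
import tempfile

import pandas as pd


def collect_library_names(directory):
    """Hypothetical helper: one row per '*log.csv' file, indexed by filename."""
    frames = []
    for path in sorted(glob.glob(os.path.join(directory, "*log.csv"))):
        # Read only line 187 (skip the first 186 lines) as a single text field
        df = pd.read_fwf(path, skiprows=186, nrows=1, names=["Attribute"])
        # Keep the 7th '/'-separated component, as in the question
        df["System_Library_Name"] = df["Attribute"].str.split("/").str[6]
        # Index the row by the base name of the file it came from
        df.index = pd.Index([os.path.basename(path)], name="filename")
        frames.append(df[["System_Library_Name"]])
    return pd.concat(frames)


# Usage with hypothetical files written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    for name, lib in [("run1_log.csv", "libA"), ("run2_log.csv", "libB")]:
        lines = [f"filler{i}" for i in range(186)]
        lines.append(f"/a/b/c/d/e/{lib}/lib.so")
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("\n".join(lines) + "\n")
    result = collect_library_names(tmp)

print(result)
```

Because the filename lives in the index, result.to_excel("log_info.xlsx") would write it as the first column (with the default index=True), assuming an Excel engine such as openpyxl is installed.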

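Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM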