
Setting values of a list to a list of dataframes

Scenario: I have 2 lists, one is a list of strings with names, the other is a list of dataframes with varied content. I am trying to put the values from the first list into the second.

Data Example:

list1 = ['jan18', 'feb18', 'mar18', 'apr18', 'may18']

List two is a list of dataframes with the following structure:

DF1_LIST2:
row1      row2      row3    row4
           5         55      12
           3         51      11
           3         52      11
           9         59      11

DF2_LIST2:
row1      row2      row3    row4
           9         91      7
           5         1       23
           3         24      56
           9         68      21

My objective is to write the first element of list1 into every cell of the first column of the first dataframe in list2, then the second element of list1 into every cell of the first column of the second dataframe in list2, and so on. The output would be something like:

DF1_LIST2:
row1      row2      row3    row4
jan18      5         55      12
jan18      3         51      11
jan18      3         52      11
jan18      9         59      11

DF2_LIST2:
row1      row2      row3    row4
feb18      9         91      7
feb18      5         1       23
feb18      3         24      56
feb18      9         68      21

What I have so far is a triple for loop: the first iterates over the items of list1, the second over the dataframes of list2, and the third over the rows of each dataframe:

import pandas as pd
import os
from os import listdir
from os.path import isfile, join
import glob

# Get File Names
mypath = "//DGMS/Desktop/uploaded"
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

# Get dates
onlyfiles = [name.split("_")[0] for name in onlyfiles]    
df_of_names = pd.DataFrame(onlyfiles) 

# Get File Contents
all_files = glob.glob(os.path.join(mypath, "*.xls*"))
contentdataframes = [pd.read_excel(f) for f in all_files]

for dfs in contentdataframes:
    dfs.insert(0,"date*","")
    dfs.insert(1,"apply*","")

for date in onlyfiles:  
     for dfs in contentdataframes:  
        for row in dfs.itertuples(index=True):
            dfs.set_value(row,0,date)

This gives me an error. I believe the cause is the header row, which still counts as a normal row rather than as an index.

Question: Is there a proper way to do this?

Use assign to add a new column to each DataFrame:

d = [pd.read_excel(f).assign(row1=os.path.basename(f).split('.')[0].split('_')[0])
     for f in all_files]
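
If the names and the DataFrames are already in memory as two lists, as in the scenario described in the question, the same idea can be sketched with zip. This is only a minimal sketch: the sample frames are a cut-down copy of the question's data, and result is an illustrative name that does not appear in the original code.

```python
import pandas as pd

# Sample data modelled on the question; row1 starts out empty
list1 = ['jan18', 'feb18']
list2 = [
    pd.DataFrame({'row1': [''] * 4, 'row2': [5, 3, 3, 9],
                  'row3': [55, 51, 52, 59], 'row4': [12, 11, 11, 11]}),
    pd.DataFrame({'row1': [''] * 4, 'row2': [9, 5, 3, 9],
                  'row3': [91, 1, 24, 68], 'row4': [7, 23, 56, 21]}),
]

# zip pairs the i-th name with the i-th DataFrame, so no nested loops are needed.
# assign returns a new DataFrame and broadcasts the scalar name to every row.
result = [df.assign(row1=name) for name, df in zip(list1, list2)]

print(result[0])
```

Because row1 already exists here, assign overwrites it in place and the column order is unchanged. The original frames in list2 are left untouched.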

EDIT:

If you need to add several columns and .assign with multiple columns becomes hard to read, you can instead loop over the files, process each DataFrame, and append it to a list at the end:

contentdataframes = []
for f in all_files:
    df = pd.read_excel(f)
    df['col1'] = 10
    df['col2'] = 'string1'
    df['row1'] = os.path.basename(f).split('.')[0].split('_')[0]
    contentdataframes.append(df)
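
When the DataFrames are already loaded, a similar loop can modify them in place. The sketch below assumes two small sample frames and a new column called date (both illustrative, not taken from the original code). It uses DataFrame.insert to put the new column in first position.

```python
import pandas as pd

# Hypothetical sample data standing in for the loaded Excel files
names = ['jan18', 'feb18']
frames = [
    pd.DataFrame({'row2': [5, 3], 'row3': [55, 51]}),
    pd.DataFrame({'row2': [9, 5], 'row3': [91, 1]}),
]

# One pass over both lists: each name is matched with exactly one DataFrame
for name, df in zip(names, frames):
    # insert modifies df in place; the scalar is broadcast to every row
    df.insert(0, 'date', name)

print(frames[1])
```

Pairing the lists with zip avoids the nested loops from the question, which would write every date into every DataFrame and leave only the last one.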

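You can extract the file name from the full path with os.path.basename and strip the extension with os.path.splitext. Then wrap the call in a list comprehension with pd.DataFrame.assign:

import os
import pandas as pd

def extract_name(fp):
    # Drop the directory and the extension, then keep the part before the first underscore
    return os.path.splitext(os.path.basename(fp))[0].split('_')[0]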

dfs = [pd.read_excel(fp).assign(row1=extract_name(fp)) for fp in all_files]
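
To check the name extraction on its own, without reading any Excel files, a small sketch like the following may help. The paths are made up for illustration and assume file names of the form date_something.xls.

```python
import os

def extract_name(path):
    """Return the part of the file name before the first underscore."""
    base = os.path.basename(path)        # drop the directory part
    stem = os.path.splitext(base)[0]     # drop the extension
    return stem.split('_')[0]

# Hypothetical paths; the directory deliberately contains an underscore
paths = ['/my_data/uploaded/jan18_report.xlsx', '/my_data/uploaded/feb18_report.xls']
names = [extract_name(p) for p in paths]
print(names)
```

Taking the basename first matters when a directory name contains an underscore. Otherwise the split would cut the path inside the directory part.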
