
Data frame column formatting

I have multiple data sets that are concatenated into one master file using the code below:

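import pandas as pd
# filenames: list of paths to the Excel files (defined elsewhere)
# From each file, read column 0 (ID, used as the index) and column 1 (VALUE)
dfs = (pd.read_excel(f, index_col=0, skiprows=8, skipfooter=1, usecols=[0, 1])
       for f in filenames)
# Align rows on the ID index and place each file's VALUE column side by side
all_data = pd.concat(dfs, axis=1)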

The data in each file looks like this (attached), with two columns: ID VALUE

The concatenated master file now looks like this:

ID value value value value value value value value
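The layout above can be reproduced without the Excel files. This is a minimal sketch in which two small in-memory frames stand in for two files; the IDs and values are hypothetical. It shows that, because of index_col=0, ID ends up as the index rather than a column, and that every file contributes a column with the same name.

```python
import pandas as pd

# Two small frames standing in for two Excel files (hypothetical IDs and values).
# After read_excel(..., index_col=0), ID is the index and VALUE is the only column.
file_a = pd.DataFrame({"VALUE": [10, 20]}, index=pd.Index([1, 2], name="ID"))
file_b = pd.DataFrame({"VALUE": [30, 40]}, index=pd.Index([1, 2], name="ID"))

# concat along axis=1 aligns rows on the ID index and keeps each column name as-is
all_data = pd.concat([file_a, file_b], axis=1)
print(all_data.columns.tolist())  # ['VALUE', 'VALUE']
print(all_data.index.name)        # ID
```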

However, we would like to rename each value column in the master file after the file it came from, such as:


Basically, we want to replace the default column name (value) with the file names, one column per file. Please advise.

You can change column names by setting the df.columns attribute to another list of the same length. In this case, because index_col=0 makes ID the index rather than a column, the list would be just [file1, file2, ...], one name per file.
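Here is a small sketch of that assignment on a hypothetical concatenated frame. It also shows that a list including 'ID' would be one element too long and is rejected with a ValueError, because ID lives in the index.

```python
import pandas as pd

# Hypothetical concatenated frame: ID is the index, so only the VALUE columns count
all_data = pd.DataFrame(
    [[10, 30], [20, 40]],
    index=pd.Index([1, 2], name="ID"),
    columns=["VALUE", "VALUE"],
)

# Assigning a list of the same length replaces the column labels in order
all_data.columns = ["file1", "file2"]
print(all_data.columns.tolist())  # ['file1', 'file2']

# A list whose length does not match the number of columns is rejected
try:
    all_data.columns = ["ID", "file1", "file2"]
except ValueError as err:
    print("ValueError:", err)
```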

Since those file names are embedded in the paths in your filenames list, we can extract just the name from each path and build a new list from them.
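The extraction step can also be sketched with the standard-library pathlib module. PureWindowsPath is used here on the assumption that the paths are Windows-style, as the '\\' split in the original answer suggests. It accepts both backslashes and forward slashes, and .stem drops the extension whatever its length (.xls or .xlsx). The helper name column_names and the paths are hypothetical.

```python
from pathlib import PureWindowsPath


def column_names(filenames):
    """Return each path's file name without its extension.

    PureWindowsPath treats both '\\' and '/' as separators, so this works for
    Windows-style paths even when the script runs on Linux or macOS.
    """
    return [PureWindowsPath(path).stem for path in filenames]


# Hypothetical paths, for illustration only
paths = [r"C:\data\january.xls", r"C:\data\february.xlsx", "C:/data/march.xls"]
print(column_names(paths))  # ['january', 'february', 'march']
```

Only the last suffix is removed, so a name such as report.v2.xlsx becomes report.v2.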

import os

columns = list()
for path in filenames:  # loop through your filenames list
    file = os.path.basename(path)  # keep only the last path component, i.e. the file name
    file = os.path.splitext(file)[0]  # strip the extension, whether it is .xls or .xlsx
    columns.append(file)
# all_data.columns holds the column names; assigning a list of the same length renames them in order
all_data.columns = columns
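An alternative sketch names the columns while concatenating instead of afterwards. It squeezes each one-column frame into a Series and passes the file names as keys; when pd.concat joins Series along axis=1, the keys become the column labels. The helper name combine and the sample frames and paths are hypothetical, and the Excel reading step is left out so the example runs without files.

```python
from pathlib import PureWindowsPath

import pandas as pd


def combine(frames, filenames):
    """Concatenate one-column frames side by side, naming each column after its file."""
    names = [PureWindowsPath(path).stem for path in filenames]
    # squeeze("columns") turns each single-column frame into a Series
    series = [frame.squeeze("columns") for frame in frames]
    # With Series along axis=1, the keys become the column labels
    return pd.concat(series, axis=1, keys=names)


# Hypothetical stand-ins for two files read with index_col=0, usecols=[0, 1]
idx = pd.Index([1, 2], name="ID")
frames = [
    pd.DataFrame({"VALUE": [10, 20]}, index=idx),
    pd.DataFrame({"VALUE": [30, 40]}, index=idx),
]
result = combine(frames, [r"C:\data\january.xls", r"C:\data\february.xlsx"])
print(result.columns.tolist())  # ['january', 'february']
```

In the question's code, this would amount to calling .squeeze("columns") on each read_excel result and passing the extracted names as keys to pd.concat.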

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM