All files follow the naming convention NPS_Platform_FirstLabel_Session_Language_Version.xlsx. I want to add the columns Platform, FirstLabel, Session, Language and Version, with the values taken from each file's name. I wrote the code below; it runs, but the values in the added columns all come from the last file. For example, if the last file is NPS_MEM_GAIT_Science_EN_10.xlsx, every row gets MEM, GAIT, Science and so on, not the values from its own file name.
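As a sketch of the name-to-column mapping described above, the helper below strips the folder and extension and pairs each underscore-separated part after `NPS` with a column name. The names `parse_file_name` and `FIELDS` are illustrative, not from the original code. It assumes no field value itself contains an underscore; note that `zip` silently drops missing or extra parts rather than raising an error.

```python
import os

# Hypothetical field list taken from the naming convention in the question;
# the leading "NPS" prefix is skipped.
FIELDS = ["Platform", "First label", "Session", "Language", "Version"]


def parse_file_name(path):
    """Split e.g. .../NPS_MEM_GAIT_Science_EN_10.xlsx into a dict of fields."""
    stem = os.path.splitext(os.path.basename(path))[0]  # drop folder and extension
    parts = stem.split("_")
    # parts[0] is "NPS"; zip stops at the shorter of the two sequences
    return dict(zip(FIELDS, parts[1:]))


print(parse_file_name("C:/Users/User/blabla/NPS_MEM_GAIT_Science_EN_10.xlsx"))
# {'Platform': 'MEM', 'First label': 'GAIT', 'Session': 'Science', 'Language': 'EN', 'Version': '10'}
```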
import glob
import os
import pandas as pd
path = "C:/Users/User/blabla"
all_files = glob.glob(os.path.join(path, "*.xlsx"))  # make list of paths
df = pd.DataFrame()
for f in all_files:
    data = pd.read_excel(f)
    # DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
    df = pd.concat([df, data])
    file_name = os.path.splitext(os.path.basename(f))[0]
    nameList = file_name.rsplit('_')
    # each assignment below writes one value to every row of df,
    # including the rows that came from earlier files
    df['Platform'] = nameList[1]
    df['First label'] = nameList[2]
    df['Session'] = nameList[3]
    df['Language'] = nameList[4]
    df['Version'] = nameList[5]
df
I started with nameList[1] because I don't need the NPS prefix. Any suggestions or feedback?
I have found a solution. I'm leaving it here since this question got more views than I expected.
import glob
import os
import pandas as pd
path = "C:/Users/User/....."
all_files = glob.glob(os.path.join(path, "*.xlsx"))  # make list of paths
df_files = [pd.read_excel(filename) for filename in all_files]
for dataframe, filename in zip(df_files, all_files):
    # each frame is tagged with values from its own file name only
    filename = os.path.splitext(os.path.basename(filename))[0]
    filename = filename.rsplit('_')
    dataframe['Platform'] = filename[1]
    dataframe['First label'] = filename[2]
    dataframe['Session'] = filename[3]
    dataframe['Language'] = filename[4]
    dataframe['Version'] = filename[5]
# concatenate once, after every frame has been tagged
df = pd.concat(df_files, ignore_index=True)
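A possible variant of the working loop, sketched under the same naming assumption: instead of mutating each frame in place, `DataFrame.assign` returns a tagged copy, so reading, tagging and concatenating fit in one expression. The helper name `tag_with_file_name`, the `score` column and the second file name are made up for illustration, and the demo uses in-memory frames in place of `pd.read_excel` so it runs without any files.

```python
import os

import pandas as pd

# Hypothetical column names following the question's naming convention.
FIELDS = ["Platform", "First label", "Session", "Language", "Version"]


def tag_with_file_name(frame, path):
    """Return a copy of frame with one column per file-name field."""
    stem = os.path.splitext(os.path.basename(path))[0]
    parts = stem.split("_")  # parts[0] is the "NPS" prefix, skipped below
    return frame.assign(**dict(zip(FIELDS, parts[1:])))


# Made-up stand-ins for two Excel files.
frames = {
    "NPS_MEM_GAIT_Science_EN_10.xlsx": pd.DataFrame({"score": [1, 2]}),
    "NPS_WEB_GAIT_Math_DE_3.xlsx": pd.DataFrame({"score": [3]}),
}
df = pd.concat(
    [tag_with_file_name(frame, name) for name, frame in frames.items()],
    ignore_index=True,
)
print(df["Platform"].tolist())  # ['MEM', 'MEM', 'WEB']
```

With the real files, the list would be `[tag_with_file_name(pd.read_excel(f), f) for f in all_files]`.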
I think the reason is that I was only iterating over the files, not over the DataFrames I was building. With zip, I can iterate over the DataFrames and the file names at the same time. I found this solution at https://jonathansoma.com/lede/foundations-2017/classes/working-with-many-files/class/. Still, if someone can explain explicitly why the first version does not work as intended, that would be great.
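As for why the first version misbehaves: `df['Platform'] = nameList[1]` assigns a single value to a column of `df`, the frame accumulated so far, and pandas broadcasts that value to every row. Each pass of the loop therefore overwrites the rows from all earlier files, and after the last pass every row carries the last file's values. The second version works because each assignment targets only the frame read from that one file. The sketch below reproduces both behaviors with two small made-up frames standing in for Excel files; the fixed loop simply tags `data` before concatenating.

```python
import pandas as pd

# Made-up stand-ins for two Excel files and their Platform values.
first = pd.DataFrame({"score": [1, 2]})
second = pd.DataFrame({"score": [3]})
files = [(first, "MEM"), (second, "WEB")]

# Pattern from the first loop: tag the accumulated frame.
df = pd.DataFrame()
for data, platform in files:
    df = pd.concat([df, data])
    df["Platform"] = platform  # broadcast to ALL rows collected so far
buggy = df["Platform"].tolist()
print(buggy)  # ['WEB', 'WEB', 'WEB']

# Fixed pattern: tag each file's own frame before concatenating.
df = pd.DataFrame()
for data, platform in files:
    data = data.copy()  # leave the original frame untouched
    data["Platform"] = platform  # only this file's rows
    df = pd.concat([df, data], ignore_index=True)
fixed = df["Platform"].tolist()
print(fixed)  # ['MEM', 'MEM', 'WEB']
```

Collecting the frames in a list and calling `pd.concat` once, as the second version does, also avoids copying the growing frame on every iteration.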