I have 10 files with the following identical format and column names (values are different across different files):
   event_code  timestamp  counter
0        9071    1165783      NaN
1        9070    1165883      NaN
2        8071    1166167      NaN
3        7529        NaN      0.0
4        8529        NaN      1.0
5        9529        NaN      1.0
Due to the nature of the files, I am trying to store these data in a multilevel dataframe like the following (eventually, I would want the box_num level to go all the way to 10):
box_num                 1                 |            2                  ...
col_names  event_code  timestamp  counter | event_code  timestamp  counter
0                9071    1270451        1 |       8529        NaN        1 ...
1                9070    1270484        0 |       9529        NaN        0 ...
2                9071    1270736        1 |       5520    3599167        2 ...
3                9070    1272337        3 |       7171    3599169        1 ...
I initially thought I could make a multilevel dataframe from a dictionary, using the keys as the hierarchical index and the dataframes as the nested sub-frames:
import pandas as pd

col_names = ['event_code', 'timestamp', 'counter']
df_dict = {}
for i in range(len(files)):
    f = files[i]  # actual file name
    df = pd.read_csv(f, sep=":", header=None, names=col_names)
    df_dict[i + 1] = df  # 'i+1' so that the dict key corresponds to the actual box number
But I soon realized that I can't create a multilevel index or dataframe from a dictionary. So to create a Multilevel Index, this is what I did, but now I am stuck on what to do next...
(box_num, col_list) = df_dict.keys(), list(df_dict.values())[0].columns
If there are other, more efficient or concise ways to approach this problem, please let me know as well. Ideally, I would like to create the multilevel dataframe right after the for loop.
So I eventually figured out a way to create a multilevel dataframe from the for loop using pd.concat(). I'll post my answer below. Hopefully it's helpful to someone.
import pandas as pd

col_names = ['event_code', 'timestamp', 'counter']
result = []
box_num = []
for i in range(len(files)):
    f = files[i]
    box_num.append(i + 1)  # box number
    df = pd.read_csv(f, sep=":", header=None, names=col_names)
    result.append(df)
# pd.concat() combines all the DataFrames in the 'result' list side by side (axis=1).
# The 'keys' option adds a hierarchical index at the outermost level of the columns.
final_df = pd.concat(result, axis=1, keys=box_num, names=['Box Number', 'Columns'])
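For what it's worth, pd.concat() also accepts a dictionary of DataFrames and uses the dictionary keys as the outer level, so a df_dict keyed by box number can be passed in directly. Below is a minimal sketch of that idea. The two small in-memory frames stand in for the parsed files, and their values are made up for illustration only.

```python
import numpy as np
import pandas as pd

col_names = ['event_code', 'timestamp', 'counter']

# Stand-ins for the parsed files, keyed by box number (illustrative values only).
df_dict = {
    1: pd.DataFrame([[9071, 1270451, 1], [9070, 1270484, 0]], columns=col_names),
    2: pd.DataFrame([[8529, np.nan, 1], [9529, np.nan, 0]], columns=col_names),
}

# With a dict as input, the keys become the outermost column level.
final_df = pd.concat(df_dict, axis=1, names=['box_num', 'col_names'])

print(final_df[2])                  # all columns for box 2
print(final_df[(1, 'timestamp')])   # a single column of box 1
```

Selecting with final_df[2] returns an ordinary three-column DataFrame for that box, and a (box, column) tuple picks out one Series.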
I think you should use a pivot table or the pandas groupby function for this task. Neither will give you exactly the layout you requested above, but both are simpler to work with.
Using your code as a starting point:
import pandas as pd

col_names = ['event_code', 'timestamp', 'counter']
frames = []
for i in range(len(files)):
    f = files[i]
    df = pd.read_csv(f, sep=":", header=None, names=col_names)
    # instead of a dictionary, tag each frame so they can be stacked into one master DataFrame
    df['box_num'] = i + 1  # box numbers start at 1
    df['idx'] = df.index   # row position within its own file
    frames.append(df)
data = pd.concat(frames, ignore_index=True)
# option 1: create a pivot table
pivot = data.pivot(index='idx', columns='box_num', values=col_names)
# option 2: use the pandas groupby function
group = data.groupby(['idx', 'box_num']).mean()
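A pivot on box_num puts the column names on the outer level and box_num on the inner one. If box_num should be outermost, as in the layout the question asks for, the two levels can be swapped. This is a rough sketch under the assumption that each row carries its position within its own file in an idx column. The small long-format frame is a stand-in with illustrative values only.

```python
import pandas as pd

col_names = ['event_code', 'timestamp', 'counter']

# Long-format stand-in for the combined data (illustrative values only).
data = pd.DataFrame({
    'box_num':    [1, 1, 2, 2],
    'idx':        [0, 1, 0, 1],   # row position within each box's file
    'event_code': [9071, 9070, 8529, 9529],
    'timestamp':  [1270451, 1270484, 3599167, 3599169],
    'counter':    [1, 0, 1, 0],
})

pivot = data.pivot(index='idx', columns='box_num', values=col_names)

# Columns are (column name, box_num); swap the levels so box_num is outermost,
# then sort so that each box's columns sit next to each other.
by_box = pivot.swaplevel(axis=1).sort_index(axis=1)
by_box.columns.names = ['box_num', 'col_names']

print(by_box[1])  # the three columns for box 1
```

Note that sort_index orders the inner level alphabetically, so within each box the columns come out as counter, event_code, timestamp rather than in file order.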
Hopefully one of these can help you get going in the right direction or work for what you are trying to accomplish. Good luck!