pandas dataframe reset index

Question

I have a dataframe like this:

Attended	Email	JoinDate	JoinTime	JoinTime
			JoinTimeFirst	JoinTimeLast
Yes	009indrajeet	12/3/2022	12/3/2022 19:50	12/3/2022 21:47
Yes	09871143420.ms	12/18/2022	12/18/2022 20:41	12/18/2022 20:41
Yes	09s.bisht	12/17/2022	12/17/2022 19:51	12/17/2022 19:51

and I need to change the column headers like this:

Attended	Email	JoinDate	JoinTimeFirst	JoinTimeLast
Yes	009indrajeet	12/3/2022	12/3/2022 19:50	12/3/2022 21:47
Yes	09871143420.ms	12/18/2022	12/18/2022 20:41	12/18/2022 20:41
Yes	09s.bisht	12/17/2022	12/17/2022 19:51	12/17/2022 19:51

I tried multiple ways but nothing worked out; any help will be appreciated. To get to the first dataframe, this is what I did:

import pandas as pd
df = pd.DataFrame({"Attended":["Yes","Yes","Yes"]
                    ,"Email":["009indrajeet","09871143420.ms","09s.bisht"]
                    ,"JoinTime":["Dec 3, 2022 19:50:52","Dec 3, 2022 20:10:52","Dec 3, 2022 21:47:32"]})
#convert JoinTime to timestamp column
df['JoinTime'] = pd.to_datetime(df['JoinTime'],format='%b %d, %Y %H:%M:%S', errors='raise')
#extract date from timestamp column
df['JoinDate'] = df['JoinTime'].dt.date
#created grouper dataset
df_grp = df.groupby(["Attended","Email","JoinDate"])
#define aggregations
dict_agg = {'JoinTime':[('JoinTimeFirst','min'),('JoinTimeLast','max'),('JoinTimes',set)]}
#do grouping with aggregations
df = df_grp.agg(dict_agg).reset_index()

print(df)

print(df.columns)

MultiIndex([('Attended',              ''),
            (   'Email',              ''),
            ('JoinDate',              ''),
            ('JoinTime', 'JoinTimeFirst'),
            ('JoinTime',  'JoinTimeLast'),
            ('JoinTime',     'JoinTimes')],
           )

Answer 1

Use named aggregation: pass a dictionary in a different format, where the keys are the new column names and the values are tuples. The first element of each tuple is the column to process and the second is the aggregation function:

dict_agg = {'JoinTimeFirst':('JoinTime','min'),
            'JoinTimeLast':('JoinTime','max'),
            'JoinTimes':('JoinTime',set)}
#do grouping with aggregations
df = df_grp.agg(**dict_agg).reset_index() 
print (df)
  Attended           Email    JoinDate       JoinTimeFirst  \
0      Yes    009indrajeet  2022-12-03 2022-12-03 19:50:52   
1      Yes  09871143420.ms  2022-12-03 2022-12-03 20:10:52   
2      Yes       09s.bisht  2022-12-03 2022-12-03 21:47:32   

         JoinTimeLast              JoinTimes  
0 2022-12-03 19:50:52  {2022-12-03 19:50:52}  
1 2022-12-03 20:10:52  {2022-12-03 20:10:52}  
2 2022-12-03 21:47:32  {2022-12-03 21:47:32}

You can also pass the named aggregations directly as keyword arguments:

#do grouping with aggregations
df = df_grp.agg(JoinTimeFirst=('JoinTime','min'),
                JoinTimeLast=('JoinTime','max'),
                JoinTimes=('JoinTime',set)).reset_index() 
print (df)
  Attended           Email    JoinDate       JoinTimeFirst  \
0      Yes    009indrajeet  2022-12-03 2022-12-03 19:50:52   
1      Yes  09871143420.ms  2022-12-03 2022-12-03 20:10:52   
2      Yes       09s.bisht  2022-12-03 2022-12-03 21:47:32   

         JoinTimeLast              JoinTimes  
0 2022-12-03 19:50:52  {2022-12-03 19:50:52}  
1 2022-12-03 20:10:52  {2022-12-03 20:10:52}  
2 2022-12-03 21:47:32  {2022-12-03 21:47:32}
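
In the sample data above every group contains a single row, so 'min' and 'max' return the same timestamp. The following is a minimal, self-contained sketch of the same named aggregation on hypothetical rows (the repeated email and its timestamps are made up for illustration) where one email joins twice on the same day, so that JoinTimeFirst and JoinTimeLast differ and the resulting columns can be seen to be flat rather than a MultiIndex:

```python
import pandas as pd

# Hypothetical sample rows: the first email joins twice on the same day
df = pd.DataFrame({
    "Attended": ["Yes", "Yes", "Yes"],
    "Email": ["009indrajeet", "009indrajeet", "09s.bisht"],
    "JoinTime": ["Dec 3, 2022 19:50:52",
                 "Dec 3, 2022 21:47:32",
                 "Dec 17, 2022 19:51:00"],
})
df["JoinTime"] = pd.to_datetime(df["JoinTime"], format="%b %d, %Y %H:%M:%S")
df["JoinDate"] = df["JoinTime"].dt.date

# Named aggregation: new_column=(source_column, aggregation_function)
result = (
    df.groupby(["Attended", "Email", "JoinDate"])
      .agg(JoinTimeFirst=("JoinTime", "min"),
           JoinTimeLast=("JoinTime", "max"))
      .reset_index()
)

# The column labels are plain strings, not tuples
print(result.columns.tolist())
print(result)
```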

Answer 2

new_df=df.dropna(axis=1).rename(columns = {df.columns[3]:'JoinTimeFirst',df.columns[4]:'JoinTimeLast'})
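
A related option, if the dataframe already has the MultiIndex columns printed in the question, is to flatten the header after the fact. The sketch below is only an illustration under that assumption: it rebuilds the column index from the tuples shown in the question (the data rows are placeholders) and keeps the second level wherever it is non-empty, falling back to the first level otherwise:

```python
import pandas as pd

# Column index copied from the print(df.columns) output in the question
columns = pd.MultiIndex.from_tuples([
    ("Attended", ""),
    ("Email", ""),
    ("JoinDate", ""),
    ("JoinTime", "JoinTimeFirst"),
    ("JoinTime", "JoinTimeLast"),
    ("JoinTime", "JoinTimes"),
])

# Placeholder rows, just so there is something to look at
df = pd.DataFrame(
    [["Yes", "009indrajeet", "2022-12-03", "19:50:52", "19:50:52", "{19:50:52}"],
     ["Yes", "09s.bisht", "2022-12-03", "21:47:32", "21:47:32", "{21:47:32}"]],
    columns=columns,
)

# Use the second level when it has a name, otherwise the first level
df.columns = [lower if lower else upper for upper, lower in df.columns]

print(df.columns.tolist())
```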

Answer 3

You can use rename like this:

df = df.rename(columns={'JoinTime': 'JoinTimeFirst', 'JoinTime.1': 'JoinTimeLast'})

Here is the complete code that you can use to rename the 'JoinTime' columns, rearrange the order of the columns, and save the modified DataFrame to a new CSV file:

import pandas as pd

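# Read in the data, skipping the second line (the sub-header row) so that the
# first line supplies the column names. This assumes data.csv is laid out like
# the table in the question; pandas renames the duplicated 'JoinTime' header
# to 'JoinTime.1'.
df = pd.read_csv("data.csv", skiprows=[1])

# Rename the 'JoinTime' columns, then select the columns in the desired order
df = df.rename(columns={'JoinTime': 'JoinTimeFirst', 'JoinTime.1': 'JoinTimeLast'})
df = df[['Attended', 'Email', 'JoinDate', 'JoinTimeFirst', 'JoinTimeLast']]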

# Save the modified DataFrame to a new CSV file
df.to_csv("modified_data.csv", index=False)

# Print the modified DataFrame
print(df)
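
To try the rename without a file on disk, here is a small sketch that feeds an in-memory string to read_csv. The CSV text is a hypothetical stand-in for data.csv, assumed to have the two header lines shown in the question. Note that rename returns a new dataframe; with inplace=True it returns None, so its result should not be assigned back to df.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv: line 1 holds the top-level headers,
# line 2 holds the sub-headers for the two JoinTime columns.
csv_text = (
    "Attended,Email,JoinDate,JoinTime,JoinTime\n"
    ",,,JoinTimeFirst,JoinTimeLast\n"
    "Yes,009indrajeet,12/3/2022,12/3/2022 19:50,12/3/2022 21:47\n"
    "Yes,09s.bisht,12/17/2022,12/17/2022 19:51,12/17/2022 19:51\n"
)

# skiprows=[1] drops the second line; the first line becomes the header
df = pd.read_csv(io.StringIO(csv_text), skiprows=[1])

# pandas has made the duplicated header unique by appending '.1'
print(df.columns.tolist())

renamed = df.rename(columns={"JoinTime": "JoinTimeFirst",
                             "JoinTime.1": "JoinTimeLast"})
print(renamed)
```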

Answer 4

The approach below is more general; it is designed for the case where the first row of the dataframe holds the column names.

# Set the column names from the first row of the dataframe
import pandas as pd
# df = ... load your dataframe into df; empty cells are assumed to be read as null (NaN)
final_col = []
for key, val in df.iloc[0].items():
    if pd.isna(val):
        # No name in the first row: keep the existing column label
        final_col.append(key)
    else:
        final_col.append(val)

df.columns = final_col
df = df.iloc[1:]  # remove the first row, which held the column names
df.reset_index(drop=True, inplace=True)  # reset the index to start at 0
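
As a quick way to check this idea, the sketch below applies the same steps to a small in-memory CSV. The CSV text is a hypothetical example shaped like the table in the question, where the sub-header line is read as the first data row:

```python
import io
import pandas as pd

# Hypothetical input: the second line is read as the first data row
csv_text = (
    "Attended,Email,JoinDate,JoinTime,JoinTime\n"
    ",,,JoinTimeFirst,JoinTimeLast\n"
    "Yes,009indrajeet,12/3/2022,12/3/2022 19:50,12/3/2022 21:47\n"
    "Yes,09s.bisht,12/17/2022,12/17/2022 19:51,12/17/2022 19:51\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Take the name from the first row where it has one, else keep the old label
first_row = df.iloc[0]
df.columns = [col if pd.isna(val) else val for col, val in first_row.items()]

# Drop the row that was promoted to the header and renumber from 0
df = df.iloc[1:].reset_index(drop=True)

print(df)
```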

4 answers

Answer 1
2 accepted 2023-01-03 09:58:48

Answer 2
1 2023-01-03 09:28:22

Answer 3
1 2023-01-03 09:36:40

Answer 4
1 2023-01-03 09:46:12
