pandas dataframe reset index
I have a dataframe like this:
Attended | Email | JoinDate | JoinTime | JoinTime |
---|---|---|---|---|
 | | | JoinTimeFirst | JoinTimeLast |
Yes | 009indrajeet | 12/3/2022 | 12/3/2022 19:50 | 12/3/2022 21:47 |
Yes | 09871143420.ms | 12/18/2022 | 12/18/2022 20:41 | 12/18/2022 20:41 |
Yes | 09s.bisht | 12/17/2022 | 12/17/2022 19:51 | 12/17/2022 19:51 |
I need to change the column headers to a single row, like this:
Attended | Email | JoinDate | JoinTimeFirst | JoinTimeLast |
---|---|---|---|---|
Yes | 009indrajeet | 12/3/2022 | 12/3/2022 19:50 | 12/3/2022 21:47 |
Yes | 09871143420.ms | 12/18/2022 | 12/18/2022 20:41 | 12/18/2022 20:41 |
Yes | 09s.bisht | 12/17/2022 | 12/17/2022 19:51 | 12/17/2022 19:51 |
I have tried several approaches without success, so any help would be appreciated. To get to the first dataframe, this is what I did:
import pandas as pd
df = pd.DataFrame({"Attended":["Yes","Yes","Yes"]
,"Email":["009indrajeet","09871143420.ms","09s.bisht"]
,"JoinTime":["Dec 3, 2022 19:50:52","Dec 3, 2022 20:10:52","Dec 3, 2022 21:47:32"]})
#convert JoinTime to timestamp column
df['JoinTime'] = pd.to_datetime(df['JoinTime'],format='%b %d, %Y %H:%M:%S', errors='raise')
#extract date from timestamp column
df['JoinDate'] = df['JoinTime'].dt.date
#created grouper dataset
df_grp = df.groupby(["Attended","Email","JoinDate"])
#define aggregations
dict_agg = {'JoinTime':[('JoinTimeFirst','min'),('JoinTimeLast','max'),('JoinTimes',set)]}
#do grouping with aggregations
df = df_grp.agg(dict_agg).reset_index()
print(df)
print(df.columns)
MultiIndex([('Attended', ''),
( 'Email', ''),
('JoinDate', ''),
('JoinTime', 'JoinTimeFirst'),
('JoinTime', 'JoinTimeLast'),
('JoinTime', 'JoinTimes')],
)
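Use named aggregation. Pass a dictionary in a different format: each key is the new column name, and each value is a tuple whose first element is the column to process and whose second element is the aggregation function: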
dict_agg = {'JoinTimeFirst':('JoinTime','min'),
            'JoinTimeLast':('JoinTime','max'),
'JoinTimes':('JoinTime',set)}
#do grouping with aggregations
df = df_grp.agg(**dict_agg).reset_index()
print (df)
Attended Email JoinDate JoinTimeFirst \
0 Yes 009indrajeet 2022-12-03 2022-12-03 19:50:52
1 Yes 09871143420.ms 2022-12-03 2022-12-03 20:10:52
2 Yes 09s.bisht 2022-12-03 2022-12-03 21:47:32
JoinTimeLast JoinTimes
0 2022-12-03 19:50:52 {2022-12-03 19:50:52}
1 2022-12-03 20:10:52 {2022-12-03 20:10:52}
2 2022-12-03 21:47:32 {2022-12-03 21:47:32}
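In the sample data every group holds a single row, so the first and last join times look identical. A minimal sketch with hypothetical data, in which one address joins twice on the same day, shows the flat column names and the difference between the two aggregations. It assumes `max` is what is wanted for `JoinTimeLast`, as in the question's own aggregation:

```python
import pandas as pd

# Hypothetical sample: the first address joins twice on the same day
df = pd.DataFrame({
    "Attended": ["Yes", "Yes", "Yes"],
    "Email": ["009indrajeet", "009indrajeet", "09s.bisht"],
    "JoinTime": ["Dec 3, 2022 19:50:52", "Dec 3, 2022 21:47:32", "Dec 17, 2022 19:51:00"],
})
df["JoinTime"] = pd.to_datetime(df["JoinTime"], format="%b %d, %Y %H:%M:%S")
df["JoinDate"] = df["JoinTime"].dt.date

# Keys are the new column names, values are (source column, aggregation) tuples
dict_agg = {
    "JoinTimeFirst": ("JoinTime", "min"),
    "JoinTimeLast": ("JoinTime", "max"),
}
out = df.groupby(["Attended", "Email", "JoinDate"]).agg(**dict_agg).reset_index()

print(out.columns.tolist())
print(out)
```

Because the new names are supplied as keyword arguments, the result has ordinary single-level columns and no renaming step is needed afterwards.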
You can also pass the named aggregations directly as keyword arguments:
#do grouping with aggregations
df = df_grp.agg(JoinTimeFirst=('JoinTime','min'),
                JoinTimeLast=('JoinTime','max'),
JoinTimes=('JoinTime',set)).reset_index()
print (df)
Attended Email JoinDate JoinTimeFirst \
0 Yes 009indrajeet 2022-12-03 2022-12-03 19:50:52
1 Yes 09871143420.ms 2022-12-03 2022-12-03 20:10:52
2 Yes 09s.bisht 2022-12-03 2022-12-03 21:47:32
JoinTimeLast JoinTimes
0 2022-12-03 19:50:52 {2022-12-03 19:50:52}
1 2022-12-03 20:10:52 {2022-12-03 20:10:52}
2 2022-12-03 21:47:32 {2022-12-03 21:47:32}
new_df=df.dropna(axis=1).rename(columns = {df.columns[3]:'JoinTimeFirst',df.columns[4]:'JoinTimeLast'})
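If the DataFrame with two-level columns already exists, another option is to flatten the column labels directly. This is a minimal sketch rather than part of the answers here: it rebuilds the `MultiIndex` printed in the question with one illustrative row, and assumes the missing second level is the empty string, as in that output:

```python
import pandas as pd

# Rebuild the two-level columns from the question (row values are illustrative)
cols = pd.MultiIndex.from_tuples([
    ("Attended", ""), ("Email", ""), ("JoinDate", ""),
    ("JoinTime", "JoinTimeFirst"), ("JoinTime", "JoinTimeLast"),
])
df = pd.DataFrame(
    [["Yes", "009indrajeet", "2022-12-03", "2022-12-03 19:50:52", "2022-12-03 21:47:32"]],
    columns=cols,
)

# Keep the second-level label when it is non-empty, otherwise use the first-level label
df.columns = [second or first for first, second in df.columns]

print(df.columns.tolist())
```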
You can use rename like this:
df = df.rename(columns={'JoinTime': 'JoinTimeFirst', 'JoinTime.1': 'JoinTimeLast'})
Complete code that renames the "JoinTime" columns, rearranges the column order, and saves the modified DataFrame to a new CSV file:
import pandas as pd
# Read in the data, skipping the first row
df = pd.read_csv("data.csv", skiprows=1)
# Rename the 'JoinTime' columns first, then select the columns in the desired order
df = df.rename(columns={'JoinTime': 'JoinTimeFirst', 'JoinTime.1': 'JoinTimeLast'})
df = df[['Attended', 'Email', 'JoinDate', 'JoinTimeFirst', 'JoinTimeLast']]
# Save the modified DataFrame to a new CSV file
df.to_csv("modified_data.csv", index=False)
# Print the modified DataFrame
print(df)
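The `JoinTime.1` key depends on how `pd.read_csv` handles duplicate header names: the second `JoinTime` column is read as `JoinTime.1`. A minimal sketch, using hypothetical in-memory CSV text through `io.StringIO` in place of `data.csv`, and assuming the file has a single header row with the repeated name:

```python
import io
import pandas as pd

# Hypothetical CSV content with a repeated 'JoinTime' header
csv_text = (
    "Attended,Email,JoinDate,JoinTime,JoinTime\n"
    "Yes,009indrajeet,12/3/2022,12/3/2022 19:50,12/3/2022 21:47\n"
)
df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())  # the second 'JoinTime' gets a '.1' suffix

# rename returns a new DataFrame by default
renamed = df.rename(columns={"JoinTime": "JoinTimeFirst", "JoinTime.1": "JoinTimeLast"})
print(renamed.columns.tolist())

# With inplace=True, rename modifies df and returns None,
# so the return value should not be assigned back to df
result = df.rename(columns={"JoinTime": "JoinTimeFirst"}, inplace=True)
print(result)
```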
The approach below is more general. It is designed for the case where the first data row holds column names:
# Set the column names from the first row
import pandas as pd
import numpy as np
# df = load your DataFrame here; this assumes empty values are read as null (NaN)
final_col = []
for key, val in dict(df.iloc[0].fillna(0)).items():
    if val == 0:
        final_col.append(key)  # nothing in the first row: keep the existing name
    else:
        final_col.append(val)  # otherwise use the first-row value as the name
df.columns = final_col
df = df.loc[1:]  # remove the first row
df.reset_index(drop=True, inplace=True)  # reset the index to start at 0
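The loop above can be wrapped in a small helper so it is easy to try on a sample frame. This is only a sketch: `promote_first_row` is a hypothetical name, the sample data imitates a two-row header read from a file, and the helper checks `pd.isna` directly instead of the `fillna(0)` comparison, so a genuine `0` in the first row is not mistaken for a missing value:

```python
import numpy as np
import pandas as pd

def promote_first_row(df):
    """Use row 0 as the column name where it has a value, else keep the existing name."""
    first = df.iloc[0]
    new_cols = [old if pd.isna(val) else val for old, val in first.items()]
    out = df.iloc[1:].reset_index(drop=True)  # drop the header row, renumber from 0
    out.columns = new_cols
    return out

# Hypothetical frame whose first row carries the second header level
raw = pd.DataFrame({
    "Attended": [np.nan, "Yes"],
    "Email": [np.nan, "09s.bisht"],
    "JoinDate": [np.nan, "12/17/2022"],
    "JoinTime": ["JoinTimeFirst", "12/17/2022 19:51"],
    "JoinTime.1": ["JoinTimeLast", "12/17/2022 19:51"],
})

clean = promote_first_row(raw)
print(clean.columns.tolist())
print(clean)
```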