[英]Pandas drop and update rows and columns based on column value
这是板球得分的示例 csv 文件:
>>> df
venue ball run extra wide noball
0 a 0.1 0 1 NaN NaN
1 a 0.2 4 0 NaN NaN
2 a 0.3 1 5 5.0 NaN
3 a 0.4 1 0 NaN NaN
4 a 0.5 1 1 NaN 1.0
5 a 0.6 2 1 NaN NaN
6 a 0.7 6 2 1.0 1.0
7 a 0.8 0 0 NaN NaN
8 a 0.9 1 1 NaN NaN
9 a 1.1 2 2 NaN NaN
10 a 1.2 1 0 NaN NaN
11 a 1.3 6 1 NaN NaN
12 a 1.4 0 2 NaN 2.0
13 a 1.5 1 0 NaN NaN
14 a 1.6 2 0 NaN NaN
15 a 1.7 0 1 NaN NaN
16 a 0.1 0 5 NaN NaN
17 a 0.2 4 0 NaN NaN
18 a 0.3 1 1 NaN NaN
19 a 0.4 3 0 NaN NaN
20 a 0.5 0 0 NaN NaN
21 a 0.6 0 2 2.0 NaN
22 a 0.7 6 1 NaN NaN
23 a 1.1 4 0 NaN NaN
从这个数据框中,我想更新球值,生成 2 个新列并删除 4 个整列。 条件是
我想要的输出:
venue ball crun total
0 a 0.1 1 45
1 a 0.2 5 45
2 a 0.3 12 45
3 a 0.4 17 45
4 a 0.5 25 45
5 a 0.6 27 45
6 a 1.1 31 45
7 a 1.2 32 45
8 a 1.3 39 45
9 a 1.4 42 45
10 a 1.5 44 45
11 a 1.6 45 45
12 a 0.1 5 27
13 a 0.2 9 27
14 a 0.3 11 27
15 a 0.4 14 27
16 a 0.5 14 27
17 a 0.6 23 27
18 a 1.1 27 27
我觉得太复杂了,请帮忙。 我试过的代码:
df = pd.read_csv("data.csv")
gr = df.groupby(df.ball.eq(0.1).cumsum())
df["crun"] = gr.runs.cumsum()
df["total"] = gr.current_run.transform("max")
df = df.drop(['run', 'extra', 'wide', 'noball'], axis=1)
好吧。 这是一个有趣的。
(为了清楚起见,我尝试添加注释。)
注意:“ball”、“run”、“extra”、“wide”和“noball”都是数字字段。
注意注意:这一切都假设您的初始 DataFrame 位于名为df
的变量下。
# Create target groupings by ball value.
df["target_groups"] = df.loc[df["ball"] == 0.1].groupby(level=-1).ngroup()
df["target_groups"].fillna(method="ffill", inplace=True)
# --- Create subgroups --- #
df["target_subgroups"] = df["ball"].astype(int)
# Add field fro sum of run and extra
df["run_extra"] = df[["run", "extra"]].sum(axis=1)
# Apply groupby() and cumsum() as follows to get the cumulative sum
# of each ball group for run and extra.
df["crun"] = df.groupby(["target_groups"])["run_extra"].cumsum()
# Create dataframe for max crun value of each group
group_max_df = df.groupby(["target_groups"])["crun"].max().to_frame().reset_index()
# Merge both of the DataFrames with the given suffixes. The first one
# Just prevents crun from having a suffix added, which is an additional
# step to remove.
# You could probably use .join() in a similar manner.
df = pd.merge(df, group_max_df,
on=["target_groups"],
suffixes=("", "_total"),
sort=False
)
# Rename your new total field.
df.rename(columns={"crun_total": "total"}, inplace = True)
# Apply your wide and noball condition here.
df = df[(df["wide"].isna()) & (df["noball"].isna())].copy()
# -- Reset `ball` column -- #
# Add temp column with static value
df["tmp_ball"] = 0.1
# Generate cumulative sum by subgroup.
# Set `ball` to modulo 0.6
df.loc[:, "ball"] = df.groupby(["target_subgroups"])["tmp_ball"].cumsum() % 0.6
# Find rows where ball == 0.0 and set those to 0.6
df.loc[df["ball"] == 0.0, "ball"] = 0.6
# Add ball and target_subgroups columns to get final ball value.
df["ball"] = df["ball"] + df["target_subgroups"]
# Reset your main index, if desired
df.reset_index(drop=True, inplace=True)
# Select only desired field for output.
df = df.loc[:, ["venue","ball","crun","total"]].copy()
df
输出:
venue ball crun total
0 a 0.1 1 45
1 a 0.2 5 45
2 a 0.4 12 45
3 a 0.6 17 45
4 a 0.8 25 45
5 a 0.9 27 45
6 a 1.1 31 45
7 a 1.2 32 45
8 a 1.3 39 45
9 a 1.5 42 45
10 a 1.6 44 45
11 a 1.7 45 45
12 a 0.1 5 27
13 a 0.2 9 27
14 a 0.3 11 27
15 a 0.4 14 27
16 a 0.5 14 27
17 a 0.7 23 27
18 a 1.1 27 27
编辑:
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.