
How to create a pandas DataFrame column by looping over two other columns?

I have a pandas DataFrame and am at a bit of a loss as to how to do what I'm hoping to. Below is a snippet of the DataFrame (I am uploading a screenshot as well). Effectively, I would like to create a new column that counts the pitches thrown at a '3--2' count.

To do this, I would like to loop through all rows. For a given row (which I'll refer to as the original row), if prev_count == '3--2', I then want to:

  1. step down the DataFrame rows to the first row where prev_count != '3--2'
  2. confirm that this row has the same batter-pitcher identifier as the original row
  3. once in a row that satisfies both conditions, prev_count != '3--2' AND batter-pitcher (original row) == batter-pitcher (new row), extract the pitch_number of that new row
  4. then calculate the value of the new column in the original row using the formula:

pitch_number (original row) + 1 - pitch_number (new row)
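A literal translation of steps 1 to 4 into a row-by-row loop could look like the sketch below. It assumes the rows are already in the displayed order (each batter-pitcher pair contiguous, latest pitch first) and uses a subset of the question's rows. The helper name `full_count_pitches` is hypothetical, and leaving the value missing (`<NA>`) when the step-down walks out of the pair is only one possible reading of step 2.

```python
import pandas as pd

# A subset of the question's rows, in the order they are displayed
# (each batter-pitcher pair contiguous, latest pitch first).
df = pd.DataFrame(
    {
        "batter-pitcher": ["501303---608665"] + ["624585---664353"] * 3 + ["594807---554340"] * 5,
        "pitch_number": [6, 8, 7, 6, 12, 11, 10, 9, 8],
        "prev_count": ["3--1", "3--2", "3--2", "3--1", "3--2", "3--2", "3--2", "3--2", "2--2"],
    },
    index=[62, 171, 177, 192, 191, 198, 209, 219, 229],
)


def full_count_pitches(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper that follows steps 1-4 row by row."""
    pair = df["batter-pitcher"].to_numpy()
    pitch = df["pitch_number"].to_numpy()
    prev = df["prev_count"].to_numpy()
    n = len(df)
    values = [pd.NA] * n

    for i in range(n):  # loop over the rows by position
        # Step 1: step down while prev_count == '3--2', staying inside the same pair
        # (a row whose own prev_count is not '3--2' is its own stopping row).
        j = i
        while j < n and pair[j] == pair[i] and prev[j] == "3--2":
            j += 1
        # Steps 2-3: the stopping row must exist and have the same batter-pitcher;
        # if the walk ran off the pair, the value is left missing.
        if j < n and pair[j] == pair[i]:
            # Step 4: pitch_number (original row) + 1 - pitch_number (new row)
            values[i] = pitch[i] + 1 - pitch[j]

    return pd.Series(values, index=df.index, dtype="Int64")


df["full_count_pitches"] = full_count_pitches(df)
print(df)
```

This is O(n) per row in the worst case, which is fine for a few thousand rows but slow on a full season of pitch data.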

To use the existing DataFrame as an example, indices 62, 4, 186, 87, and 252 would have a value of 1 for the new column (a row whose own prev_count is not '3--2' is its own "new row", so the formula gives 1). Index 171 would have a value of 3, 177 a value of 2, and 192 a value of 1. Likewise, 191 would have a value of 5, and 229, 10, and 57 would also have a value of 1.

```
           player_name   batter-pitcher  pitch_number count prev_count
62   Graveman, Kendall  501303---608665             6  3--2       3--1
4          Smyly, Drew  608665---592767             6  3--2       2--2
186  Graveman, Kendall  592696---608665             8  3--2       2--2
87         Maton, Phil  621020---664208             6  3--2       3--1
252      Martin, Chris  514888---455119             6  3--2       2--2
171      Urquidy, José  624585---664353             8  3--2       3--2
177      Urquidy, José  624585---664353             7  3--2       3--2
192      Urquidy, José  624585---664353             6  3--2       3--1
191       García, Yimi  594807---554340            12  3--2       3--2
198       García, Yimi  594807---554340            11  3--2       3--2
209       García, Yimi  594807---554340            10  3--2       3--2
219       García, Yimi  594807---554340             9  3--2       3--2
229       García, Yimi  594807---554340             8  3--2       2--2
10     Valdez, Framber  592696---664285             6  3--2       2--2
57     Valdez, Framber  518692---664285             6  3--2       2--2
```
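To make the snippet reproducible, the rows above can be rebuilt as a DataFrame. The values and index labels are copied from the table; naming it `df` is only an assumption so that code written against the question can run on it.

```python
import pandas as pd

# Rows copied from the snippet above, keeping the original index labels.
df = pd.DataFrame(
    {
        "player_name": [
            "Graveman, Kendall", "Smyly, Drew", "Graveman, Kendall", "Maton, Phil",
            "Martin, Chris", "Urquidy, José", "Urquidy, José", "Urquidy, José",
            "García, Yimi", "García, Yimi", "García, Yimi", "García, Yimi",
            "García, Yimi", "Valdez, Framber", "Valdez, Framber",
        ],
        "batter-pitcher": [
            "501303---608665", "608665---592767", "592696---608665", "621020---664208",
            "514888---455119", "624585---664353", "624585---664353", "624585---664353",
            "594807---554340", "594807---554340", "594807---554340", "594807---554340",
            "594807---554340", "592696---664285", "518692---664285",
        ],
        "pitch_number": [6, 6, 8, 6, 6, 8, 7, 6, 12, 11, 10, 9, 8, 6, 6],
        "count": ["3--2"] * 15,
        "prev_count": [
            "3--1", "2--2", "2--2", "3--1", "2--2", "3--2", "3--2", "3--1",
            "3--2", "3--2", "3--2", "3--2", "2--2", "2--2", "2--2",
        ],
    },
    index=[62, 4, 186, 87, 252, 171, 177, 192, 191, 198, 209, 219, 229, 10, 57],
)
print(df)
```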

I am a bit at a loss as to how to 1) loop through the rows of a DataFrame, 2) within each iteration of the loop, step down rows, and 3) reference other columns of the DataFrame in another row, so I would really appreciate some guidance here. Thanks so much!
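For the three mechanics asked about here, a small sketch using three of the Urquidy rows: loop by position with `range(len(df))`, step down by moving to position `i + 1`, and read another row's column either positionally (`.iloc`) or by index label (`.at`). The variable names (`steps`, `next_pitch`, `next_pair`) are illustrative only.

```python
import pandas as pd

# Three of the question's rows (Urquidy, José), in the displayed order.
df = pd.DataFrame(
    {
        "batter-pitcher": ["624585---664353"] * 3,
        "pitch_number": [8, 7, 6],
        "prev_count": ["3--2", "3--2", "3--1"],
    },
    index=[171, 177, 192],
)

steps = []
# 1) Loop over rows by position; df.index[i] is the label shown when printing.
for i in range(len(df)):
    label = df.index[i]
    # 2) "Step down" by moving to the next position, if there is one.
    j = i + 1
    if j >= len(df):
        break
    # 3) Read columns of that other row: iloc is position-based, at/loc are label-based.
    next_pitch = df["pitch_number"].iloc[j]
    next_pair = df.at[df.index[j], "batter-pitcher"]
    steps.append((label, df.index[j], next_pitch, next_pair == df.at[label, "batter-pitcher"]))

for step in steps:
    print(step)
```

Mixing the two styles is the usual source of confusion: `.iloc[j]` means "the j-th row", while `.at[171, ...]` means "the row labelled 171", regardless of where it sits.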

For your given dataset, I think this works, but it assumes that pitch_number always increases by one from row to row and that no data is missing; otherwise it won't work. I'd suggest looking into cumcount(), cummax() and cummin() with a groupby on batter-pitcher.

Column 'new1' is the final answer; column 'new' is just an intermediate step.

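```python
import numpy as np

# get dataframe into right order: each batter-pitcher pair together, latest pitch first
df.sort_values(by=['batter-pitcher', 'pitch_number'], ascending=[True, False], inplace=True)

# number the rows within each (batter-pitcher, prev_count) group, starting at 1 for the earliest pitch
df['new'] = df.groupby(['batter-pitcher', 'prev_count'])['count'].cumcount(ascending=False) + 1

# the pitch that started the full count (new == 1, prev_count != '3--2') gets 1;
# every other row gets new + 1 to include that starting pitch
df['new1'] = np.where((df['new'] == 1) & (df['prev_count'] != '3--2'), 1, df['new'] + 1)
```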

```
           player_name   batter-pitcher  pitch_number count prev_count  new  new1
62   Graveman, Kendall  501303---608665             6  3--2       3--1    1     1
4          Smyly, Drew  608665---592767             6  3--2       2--2    1     1
186  Graveman, Kendall  592696---608665             8  3--2       2--2    1     1
87         Maton, Phil  621020---664208             6  3--2       3--1    1     1
252      Martin, Chris  514888---455119             6  3--2       2--2    1     1
171      Urquidy, José  624585---664353             8  3--2       3--2    2     3
177      Urquidy, José  624585---664353             7  3--2       3--2    1     2
192      Urquidy, José  624585---664353             6  3--2       3--1    1     1
191       García, Yimi  594807---554340            12  3--2       3--2    4     5
198       García, Yimi  594807---554340            11  3--2       3--2    3     4
209       García, Yimi  594807---554340            10  3--2       3--2    2     3
219       García, Yimi  594807---554340             9  3--2       3--2    1     2
229       García, Yimi  594807---554340             8  3--2       2--2    1     1
10     Valdez, Framber  592696---664285             6  3--2       2--2    1     1
57     Valdez, Framber  518692---664285             6  3--2       2--2    1     1
```
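Along the lines of grouping on batter-pitcher, one possible vectorized variant (a sketch, not the code above) applies the question's formula directly: it keeps pitch_number only on rows whose prev_count is not '3--2', forward-fills those values within each pair with groupby(...).ffill(), and subtracts. Because it uses pitch_number differences rather than row counts, a missing row does not shift the other values. It does assume each batter-pitcher pair's rows come from a single plate appearance (the grouping above assumes the same); `full_count_pitches_vectorized` is a hypothetical name and the data is a subset of the question's table.

```python
import pandas as pd

# A subset of the question's rows (same values as the snippet).
df = pd.DataFrame(
    {
        "batter-pitcher": ["501303---608665"] + ["624585---664353"] * 3 + ["594807---554340"] * 5,
        "pitch_number": [6, 8, 7, 6, 12, 11, 10, 9, 8],
        "prev_count": ["3--1", "3--2", "3--2", "3--1", "3--2", "3--2", "3--2", "3--2", "2--2"],
    },
    index=[62, 171, 177, 192, 191, 198, 209, 219, 229],
)


def full_count_pitches_vectorized(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: pitches at a 3-2 count via groupby + forward fill."""
    # Put each batter-pitcher pair in chronological order (earliest pitch first).
    ordered = df.sort_values(["batter-pitcher", "pitch_number"])
    # Keep pitch_number only on rows where prev_count != '3--2' (the "new row" of step 3).
    anchor = ordered["pitch_number"].where(ordered["prev_count"] != "3--2")
    # Carry the latest such pitch_number forward within each pair, replacing the row-by-row walk.
    anchor = anchor.groupby(ordered["batter-pitcher"]).ffill()
    # pitch_number (original row) + 1 - pitch_number (new row), back in the original row order.
    result = ordered["pitch_number"] + 1 - anchor
    return result.reindex(df.index).astype("Int64")


print(full_count_pitches_vectorized(df))
# Dropping a row (e.g. index 209) does not shift the remaining values.
print(full_count_pitches_vectorized(df.drop(index=209)))
```

If the same batter-pitcher pair can meet in several plate appearances, an extra identifier for the plate appearance would need to be added to both the sort keys and the groupby keys.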
