Pandas: Generate a DataFrame column which has values depending on another column of a DataFrame

I am trying to generate a pandas DataFrame in which one column holds numerical values derived from a column of another DataFrame. Below is an example: I want to generate another DataFrame based on a column of the DataFrame df_.

ipdb> df_ = pd.DataFrame({'c1':[False, True, False, True]})
ipdb> df_
      c1
0  False
1   True
2  False
3   True

Using df_, another DataFrame df1 is generated with the columns below.

ipdb> df1
   col1  col2
0     0   NaN
1     1   0
2     2   NaN
3     3   1

Here, 'col1' holds the normal index values, while 'col2' has NaN in the rows where 'c1' is False in df_ and sequentially incrementing values (starting from 0) in the rows where 'c1' is True.

To generate this DataFrame, below is what I have tried.

ipdb> df_[df_['c1']==True].reset_index().reset_index()
   level_0  index    c1
0        0      1  True
1        1      3  True

However, I feel there should be a better way to generate a DataFrame with the two columns as in df1.

I think you need cumsum, then subtract 1 so that counting starts from 0:

import pandas as pd

df_ = pd.DataFrame({'c1':[False, True, False, True]})

df_['col2'] = df_.loc[df_['c1'], 'c1'].cumsum().sub(1)
print (df_)
      c1  col2
0  False   NaN
1   True   0.0
2  False   NaN
3   True   1.0
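If the goal is the separate two-column df1 from the question rather than a new column on df_, the same cumsum idea can be sketched as below. This is only one possible arrangement: it assumes 'col1' should simply copy the index of df_, and it uses Series.where to blank out the False rows instead of filtering with loc.

```python
import pandas as pd

df_ = pd.DataFrame({'c1': [False, True, False, True]})

# Running count of True values, shifted so the first True maps to 0.
counter = df_['c1'].cumsum().sub(1)

df1 = pd.DataFrame(
    {
        'col1': df_.index.to_numpy(),      # assumed: col1 mirrors the index of df_
        'col2': counter.where(df_['c1']),  # NaN wherever c1 is False
    },
    index=df_.index,
)
print(df1)
```

Because 'col2' contains NaN, pandas stores it as float, so the counter values are shown as 0.0 and 1.0.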

Another solution is to count the occurrences of True values with sum, build a sequence of that length with numpy.arange, and assign it back to the filtered rows of the DataFrame:

import numpy as np

df_.loc[df_['c1'], 'col2'] = np.arange(df_['c1'].sum())
print (df_)
      c1  col2
0  False   NaN
1   True   0.0
2  False   NaN
3   True   1.0

Details:

print (df_['c1'].sum())
2

print (np.arange(df_['c1'].sum()))
[0 1]
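The values are integers when np.arange produces them but show up as 0.0 and 1.0 in the DataFrame, because a column holding NaN is stored as float. If integer output matters, one option is to cast the column to the nullable 'Int64' extension dtype afterwards. This is a sketch that assumes a pandas version where that dtype is available; it is not part of the solutions above.

```python
import numpy as np
import pandas as pd

df_ = pd.DataFrame({'c1': [False, True, False, True]})

# Same assignment as in the numpy.arange solution; the column is float here.
df_.loc[df_['c1'], 'col2'] = np.arange(df_['c1'].sum())

# Cast to the nullable integer dtype so the counter stays integral.
df_['col2'] = df_['col2'].astype('Int64')
print(df_)
```

After the cast, the missing entries are displayed as <NA> instead of NaN.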

Another way to solve this:

df_.loc[df_['c1'], 'col2'] = range(len(df_[df_['c1']]))

Output:

      c1  col2
0  False   NaN
1   True   0.0
2  False   NaN
3   True   1.0
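Since the answers all print the same table, a quick cross-check can confirm that the cumsum route and the arange route really agree. The helper name running_true_index is hypothetical and only used for this illustration; the check relies on pandas.testing.assert_series_equal, which raises if the two Series differ.

```python
import numpy as np
import pandas as pd

def running_true_index(mask: pd.Series) -> pd.Series:
    """Hypothetical helper: 0-based counter over True rows, NaN elsewhere."""
    return mask.cumsum().sub(1).where(mask)

df_ = pd.DataFrame({'c1': [False, True, False, True]})

# Route 1: cumulative sum of the boolean column.
via_cumsum = running_true_index(df_['c1'])

# Route 2: numpy.arange written into the rows where c1 is True.
via_arange = pd.Series(np.nan, index=df_.index)
via_arange[df_['c1']] = np.arange(df_['c1'].sum())

# Raises AssertionError if the two results differ (names are ignored).
pd.testing.assert_series_equal(via_cumsum, via_arange, check_names=False)
print(via_cumsum)
```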


