
How to assign a value to a particular column in a pandas dataframe based on different conditions?

I have a dataset with around 40,000 rows, each representing one record. One of the features, 'region_code', is categorical in nature but is represented as an integer, similar to a pincode/zipcode. There are around 5316 unique 'region_code' values, ranging from 1 up to 5690, so the range is [1, 5690]. I want to reassign those values so that all rows whose region code lies in [1, 20] get region code '1', the next batch, [21, 40], gets '2', the next batch, [41, 60], gets '3', and so on. The last batch of 20 region codes, 5681 to 5700, will have the value '285' (5700 // 20).

I can do this using if-else, but then I would have to write 285 if-else conditions, one for each batch of 20 region codes. That is not the right approach, as it is too much manual work. I need short and succinct code for this.

To simulate the problem so that you can write code for it, I have created a small dataframe with region codes from 1 to 50. Here, let us group them into batches of 5, so the first 5 region codes get the value '1', the next 5 get '2', and so on until the last batch, which gets '10'.

import numpy as np
import pandas as pd
Region_Code = np.arange(1, 51)
df = pd.DataFrame(Region_Code, columns=['Region_Code'])

The expected output will look like the one created by the code below:

transformed = [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5,6,6,6,6,6,7,7,7,7,7,8,8,8,8,8,9,9,9,9,9,10,10,10,10,10]
pd.DataFrame(transformed, columns=['Region_Code_new'])

I created that list manually to give you a glimpse of what the output should look like.

In the original problem the batches contain 20 region codes each, so there will be 285 such batches. My question is how to do this using a for loop or some similar logic?

You can just floor divide the column by 5 (20 in your original dataset), subtracting 1 first so that each batch ends on a multiple of 5 and adding 1 back so that batches are numbered from 1:

>>> import numpy as np
>>> import pandas as pd
>>> Region_Code = np.arange(1, 51)
>>> df = pd.DataFrame(Region_Code, columns=['Region_Code'])
>>> df.assign(Region_code_new=(df.Region_Code.sub(1) // 5) + 1)
    Region_Code  Region_code_new
0             1                1
1             2                1
2             3                1
3             4                1
4             5                1
5             6                2
6             7                2
7             8                2
8             9                2
9            10                2
10           11                3
11           12                3
12           13                3
13           14                3
14           15                3
15           16                4
16           17                4
17           18                4
18           19                4
19           20                4
20           21                5
21           22                5
22           23                5
23           24                5
24           25                5
25           26                6
26           27                6
27           28                6
28           29                6
29           30                6
30           31                7
31           32                7
32           33                7
33           34                7
34           35                7
35           36                8
36           37                8
37           38                8
38           39                8
39           40                8
40           41                9
41           42                9
42           43                9
43           44                9
44           45                9
45           46               10
46           47               10
47           48               10
48           49               10
49           50               10
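
The same expression should carry over to the original dataset by swapping 5 for 20. Below is a minimal sketch of that, not a definitive implementation: the helper name `batch_codes` and the handful of sample region codes are assumptions made up for illustration and are not part of the question or the answer.

```python
import numpy as np
import pandas as pd


def batch_codes(codes: pd.Series, batch_size: int) -> pd.Series:
    """Map 1-based integer codes to 1-based batch numbers.

    Subtracting 1 first keeps the last code of each batch
    (5, 10, ... or 20, 40, ...) together with the codes before it.
    """
    return (codes - 1) // batch_size + 1


# Small example from the question: codes 1..50 in batches of 5.
small = pd.DataFrame({"Region_Code": np.arange(1, 51)})
small["Region_Code_new"] = batch_codes(small["Region_Code"], 5)

# Illustrative (made-up) sample of codes from the range [1, 5690], batches of 20.
sample = pd.DataFrame({"Region_Code": [1, 20, 21, 40, 41, 5681, 5690]})
sample["Region_Code_new"] = batch_codes(sample["Region_Code"], 20)
print(sample)
```

With these sample codes, 1 and 20 map to 1, 21 and 40 map to 2, 41 maps to 3, and both 5681 and 5690 map to 285. Without the `- 1`, code 20 would land in the batch after codes 1 to 19 rather than in the same batch.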
