How to assign values to a specific column in a pandas dataframe based on different conditions?

Question

I have a dataset with around 40,000 rows, each representing one record. One of the features, 'region_code', is categorical in nature but is represented as an integer, similar to a pincode/zipcode. There are around 5316 unique 'region_code' values, and they start at 1 and go up to 5690, so the range is [1,5690]. I want to reassign those values so that all rows whose region code lies in the range [1,20] are assigned region code '1', the next batch [21,40] is assigned '2', the next batch [41,60] is assigned '3', and so on. The last batch of 20 region codes, 5681 to 5700, will have the value '285' (5700//20).

I can do this using if-else, but then I would have to write 285 if-else conditions, one for each batch of 20 region codes. That is not the right approach, as it is too much manual work. I need short and succinct code for this.

To simulate the problem so that you can write code for it, I have created a small dataframe with region codes from 1 to 50. Here, let us group them into batches of 5, so the first 5 region codes get the value '1', the next 5 get '2', and so on until the last batch, which gets the value '10'.

import numpy as np
import pandas as pd

Region_Code = np.arange(1,51)
df = pd.DataFrame(Region_Code, columns =['Region_Code'])

The expected output looks like the one created by the code below:

transformed = [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5,6,6,6,6,6,7,7,7,7,7,8,8,8,8,8,9,9,9,9,9,10,10,10,10,10]
pd.DataFrame(transformed, columns=['Region_Code_new'])

I created that list manually to give you a glimpse of what the output would look like.

In the original problem the batches contain 20 region codes each, so there will be 285 such batches. My question is how to do this using a for loop or some similar logic?

Answer 1

You can just floor divide the column by 5 (20 in your original dataset). Subtract 1 before dividing and add 1 afterwards, so that the 1-based codes fall into 1-based batches:

>>> import numpy as np
>>> import pandas as pd
>>> Region_Code = np.arange(1,51)
>>> df = pd.DataFrame(Region_Code, columns =['Region_Code'])
>>> df.assign(Region_code_new=(df.Region_Code.sub(1) // 5) + 1)
    Region_Code  Region_code_new
0             1                1
1             2                1
2             3                1
3             4                1
4             5                1
5             6                2
6             7                2
7             8                2
8             9                2
9            10                2
10           11                3
11           12                3
12           13                3
13           14                3
14           15                3
15           16                4
16           17                4
17           18                4
18           19                4
19           20                4
20           21                5
21           22                5
22           23                5
23           24                5
24           25                5
25           26                6
26           27                6
27           28                6
28           29                6
29           30                6
30           31                7
31           32                7
32           33                7
33           34                7
34           35                7
35           36                8
36           37                8
37           38                8
38           39                8
39           40                8
40           41                9
41           42                9
42           43                9
43           44                9
44           45                9
45           46               10
46           47               10
47           48               10
48           49               10
49           50               10
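
For the original dataset, the same arithmetic should carry over with a batch size of 20. The following is only a minimal sketch under stated assumptions: the helper name `batch_codes`, its `batch_size` parameter, and the column name `region_code_new` are made up for illustration, and the sample codes are a handful of boundary values rather than the real data.

```python
import pandas as pd


def batch_codes(codes: pd.Series, batch_size: int) -> pd.Series:
    """Map 1-based integer codes to 1-based batch numbers.

    Hypothetical helper: shifts the codes to 0-based, floor divides by
    the batch size, then shifts the result back to 1-based.
    """
    return (codes - 1) // batch_size + 1


# A few boundary values from the question (illustrative, not the real dataset).
df = pd.DataFrame({"region_code": [1, 20, 21, 40, 41, 5680, 5681, 5690]})

# Batches of 20: [1,20] -> 1, [21,40] -> 2, ..., [5681,5700] -> 285.
df["region_code_new"] = batch_codes(df["region_code"], batch_size=20)
print(df)
```

Because the operation is vectorized over the whole column, no for loop or chain of if-else conditions is needed, and codes missing from the data (only about 5316 of the possible values occur) do not affect the result.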
