In a pandas dataframe, how can I set the value of other columns based on the data from one column, without using a loop?

I'm trying to build a dataframe that will be used for linear regression. I would like to include 11 independent "dummy" variables that are set to either 1 or 0 based on the month of the year. Without getting too far off topic, I'm using 11 variables instead of 12, as the 12th month is captured by the intercept.

I know many things can be done with pandas without looping through the entire dataframe, and doing things that way is typically faster than using a loop.

So, is it possible to grab the month from my date column, and dynamically set a separate column to either a 1 or a 0 based on that month? Or am I asking a stupid question?

Edit: I should have included more information. A dataframe is structured like this:

Date        sku     units ordered  sessions  conversion rate
2020/01/30  abc123  20             200       0.1
2020/01/31  abc123  10             100       0.1
2020/02/01  abc123  15             60        0.25

I would like to make it look like this:

Date        sku     units ordered  sessions  conversion rate  january  february
2020/01/30  abc123  20             200       0.1              1        0
2020/01/31  abc123  10             100       0.1              1        0
2020/02/01  abc123  15             60        0.25             0        1

The code I'm currently using to accomplish this is:

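import calendar

# Create a zero-filled column for each month from January to November.
for x in range(1, 12):
    df[calendar.month_name[x]] = 0

# Set the matching month column to 1 for every row; December rows stay all zeros.
for index, row in df.iterrows():
    month = row['Date'].strftime("%B")
    if month != "December":
        df.at[index, month] = 1

# Fill any remaining gaps once, after the loop has finished.
df.fillna(0, inplace=True)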

Just not sure if this is the best way to accomplish this.

My approach would be to first get the month number from every date using dt.month:

df['Date'].dt.month
0    1
1    1
2    2
Name: Date, dtype: int64
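
As a small self-contained sketch of this step, assuming the three sample dates from the question: the month can be extracted either as a number or as a name. Note that `dt.month_name()` is an extra accessor shown for comparison only; the answer itself uses `dt.month`.

```python
import pandas as pd

# Sample dates taken from the question's table.
dates = pd.Series(pd.to_datetime(['2020/01/30', '2020/01/31', '2020/02/01']),
                  name='Date')

# Vectorized month extraction: no explicit Python loop over the rows.
month_numbers = dates.dt.month        # month as an integer, 1 to 12
month_names = dates.dt.month_name()   # month as an English name by default

print(month_numbers.tolist())
print(month_names.tolist())
```

Either form could feed the tabulation step; using the numbers keeps the later column ordering numeric rather than alphabetical.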

Then use crosstab with the index to get a tabulation of the counts:

pd.crosstab(
        df.index,
        df['Date'].dt.month
    )
Date   1  2
row_0      
0      1  0
1      1  0
2      0  1
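
One caveat worth sketching: crosstab counts occurrences, so the cells are 0/1 indicators only when every index label is unique. The duplicated labels below are a hypothetical illustration (for example, a frame built by concatenation without resetting the index), not something present in the question's data.

```python
import numpy as np
import pandas as pd

dates = pd.Series(pd.to_datetime(['2020/01/30', '2020/01/31', '2020/02/01']),
                  name='Date')

# Unique index labels: one output row per input row, so each cell is 0 or 1.
unique_counts = pd.crosstab(dates.index, dates.dt.month)

# Hypothetical duplicated labels: rows sharing a label are counted together,
# so a cell can exceed 1 and no longer works as a dummy variable.
labels = np.array([0, 0, 1])
months = dates.dt.month.to_numpy()
dup_counts = pd.crosstab(labels, months)

print(unique_counts)
print(dup_counts)
```

If the index might contain duplicates, calling `reset_index(drop=True)` on the frame first is one way to keep the counts binary.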

Then merge back to the DataFrame on the index:

df = (
    df.merge(pd.crosstab(
        df.index,
        df['Date'].dt.month
    ),
        left_index=True,
        right_index=True)
)

Output:

        Date     sku  units ordered  sessions  conversion rate  1  2
0 2020-01-30  abc123             20       200             0.10  1  0
1 2020-01-31  abc123             10       100             0.10  1  0
2 2020-02-01  abc123             15        60             0.25  0  1

Finally, rename the columns using a mapper generated with the calendar API:

df = df.rename(columns={month_num: calendar.month_name[month_num]
                        for month_num in range(1, 13)})
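
To make the mapper concrete, here is a minimal sketch of what the dictionary comprehension produces on its own. The variable name `month_mapper` is introduced here for illustration. The names come from the standard-library `calendar` module and follow the current locale, which is assumed to be English here.

```python
import calendar

# calendar.month_name is indexable from 1 to 12; index 0 is an empty string.
month_mapper = {month_num: calendar.month_name[month_num]
                for month_num in range(1, 13)}

print(month_mapper[1], month_mapper[12])
```

Because `DataFrame.rename` ignores mapper keys that do not match any column by default, it is safe to pass all 12 entries even when only some months appear in the data.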

All together:

import pandas as pd
import calendar

df = pd.DataFrame(
    {'Date': {0: '2020/01/30', 1: '2020/01/31', 2: '2020/02/01'},
     'sku': {0: 'abc123', 1: 'abc123', 2: 'abc123'},
     'units ordered': {0: 20, 1: 10, 2: 15},
     'sessions': {0: 200, 1: 100, 2: 60},
     'conversion rate': {0: 0.1, 1: 0.1, 2: 0.25}})
df['Date'] = df['Date'].astype('datetime64[ns]')

df = (
    df.merge(pd.crosstab(
        df.index,
        df['Date'].dt.month
    ),
        left_index=True,
        right_index=True)
)

df = df.rename(columns={month_num: calendar.month_name[month_num]
                        for month_num in range(1, 13)})

print(df.to_string())

Output:

        Date     sku  units ordered  sessions  conversion rate  January  February
0 2020-01-30  abc123             20       200             0.10        1         0
1 2020-01-31  abc123             10       100             0.10        1         0
2 2020-02-01  abc123             15        60             0.25        0         1
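
Note that the crosstab approach only creates columns for months that actually occur in the data, and it does not drop December. As an alternative sketch (not the method used in the answer above), `pd.get_dummies` on a categorical with all 12 month names yields a column for every month, after which December can be dropped to leave the 11 dummy variables the question asks for. The reduced sample frame and the variable names are assumptions for illustration.

```python
import calendar
import pandas as pd

# Reduced version of the question's sample data.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2020/01/30', '2020/01/31', '2020/02/01']),
    'units ordered': [20, 10, 15],
})

month_names = list(calendar.month_name)[1:]   # 'January' .. 'December'

# Declaring all 12 months as categories makes get_dummies emit a column for
# every month, including months that never appear in the data.
months = pd.Series(
    pd.Categorical(df['Date'].dt.month_name(), categories=month_names),
    index=df.index,
)
dummies = pd.get_dummies(months, dtype=int)

# Use plain string column labels so the frame joins cleanly.
dummies.columns = dummies.columns.astype(str)

# Drop December so that it is captured by the regression intercept.
dummies = dummies.drop(columns='December')

result = df.join(dummies)
print(result.to_string())
```

With this layout the set of regression columns stays the same regardless of which months a particular dataset happens to contain.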
