Generating a new dataframe from existing columns

I am trying to create a new column D from the existing columns A, B, and C.

  • the first value in Col D is A1
  • the second value in Col D is B2
  • the third value in Col D is C3
  • the fourth value in Col D is A4
  • the fifth value in Col D is B5
  • the sixth value in Col D is C6

The rest of the values in Col D repeat this same A, B, C cycle. Please refer to the picture.

Any code ideas?

Click the link to see the picture: [enter image description here]

Here's a solution (in a couple of steps for better clarity):

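import numpy as np
import pandas as pd

df = pd.DataFrame({"a": range(0, 10), "b": range(10, 20), "c": range(20, 30)})

# Helper column holding each row's position (0, 1, 2, ...)
df["inx"] = range(len(df))
# Position modulo 3 picks the source column: 0 -> a, 1 -> b, 2 -> c
df["d"] = np.where(df.inx % 3 == 0, df["a"],
          np.where(df.inx % 3 == 1, df["b"], df["c"]))
df = df.drop("inx", axis="columns")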

Result:

   a   b   c   d
0  0  10  20   0
1  1  11  21  11
2  2  12  22  22
3  3  13  23   3
4  4  14  24  14
5  5  15  25  25
6  6  16  26   6
7  7  17  27  17
8  8  18  28  28
9  9  19  29   9
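The nested np.where calls grow by one level for every extra column. As a rough sketch of the same idea without the helper column, np.select can take one condition and one choice per column. The names cols, pos, conditions, and choices are illustrative assumptions, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": range(0, 10), "b": range(10, 20), "c": range(20, 30)})

cols = ["a", "b", "c"]                    # columns to cycle through, in order
pos = np.arange(len(df)) % len(cols)      # 0, 1, 2, 0, 1, 2, ...

# One boolean mask and one source column per position in the cycle
conditions = [pos == i for i in range(len(cols))]
choices = [df[c] for c in cols]

df["d"] = np.select(conditions, choices)
```

Because pos is built from row positions rather than index labels, this sketch should behave the same for a non-default index.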

We can use pd.concat to build a dataframe with the columns A, B, C in which each column keeps only every third value, starting at its own offset (A from the first row, B from the second, C from the third), so every other cell is NaN. Then use DataFrame.agg along axis=1 to pick the single non-NaN value in each row:

d = pd.concat([df[col].iloc[i::df.columns.size] for i, col in enumerate(df.columns)], axis=1)
df['D'] = d.agg(lambda s: s.dropna().iloc[0], axis=1)

Result:

# print(df)
    A   B   C   D
0  A1  B1  C1  A1
1  A2  B2  C2  B2
2  A3  B3  C3  C3
3  A4  B4  C4  A4
4  A5  B5  C5  B5
5  A6  B6  C6  C6
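To see why dropping NaN values works here, it can help to inspect the intermediate frame. The sketch below rebuilds the example input (the construction of df and the name non_null_per_row are assumptions for illustration) and counts the non-NaN cells per row. The sort_index call is added only to make the row order of the intermediate frame explicit.

```python
import pandas as pd

# Example input: A1..A6, B1..B6, C1..C6 with a default 0..5 index
df = pd.DataFrame({c: [f"{c}{i}" for i in range(1, 7)] for c in "ABC"})

# Column i keeps rows i, i + 3, i + 6, ...; all other cells become NaN
d = pd.concat(
    [df[col].iloc[i::df.columns.size] for i, col in enumerate(df.columns)],
    axis=1,
).sort_index()

# Each row should hold exactly one surviving value
non_null_per_row = d.notna().sum(axis=1)

df["D"] = d.agg(lambda s: s.dropna().iloc[0], axis=1)
```

Note that df.columns.size is read before D is added, so rerunning the concat step on a frame that already contains D would change the step size.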

Here's another way:

import pandas as pd
import numpy as np

# Create input dataframe:
df = pd.DataFrame(index=[*'123456'], columns=[*'ABC'])
df = df.apply(lambda x: x.name+x.index)
df

Input Dataframe:

    A   B   C
1  A1  B1  C1
2  A2  B2  C2
3  A3  B3  C3
4  A4  B4  C4
5  A5  B5  C5
6  A6  B6  C6

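Use cumsum with lookup:

# Row position modulo the number of columns: 0, 1, 2, 0, 1, 2
s = ((df['A'].notna().cumsum()-1) % df.shape[1])
# Note: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0
df['d'] = df.lookup(df.index, df.columns[s])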
df

Output:

    A   B   C   d
1  A1  B1  C1  A1
2  A2  B2  C2  B2
3  A3  B3  C3  C3
4  A4  B4  C4  A4
5  A5  B5  C5  B5
6  A6  B6  C6  C6
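DataFrame.lookup is no longer available in pandas 2.0 and later. A minimal sketch of the same row-by-row selection on current versions, assuming plain NumPy integer indexing is an acceptable substitute (the names n_cols, row_pos, and col_pos are illustrative):

```python
import numpy as np
import pandas as pd

# Example input with the string index "1".."6" used above
df = pd.DataFrame(
    {c: [f"{c}{i}" for i in range(1, 7)] for c in "ABC"},
    index=list("123456"),
)

n_cols = df.shape[1]              # 3, read before the new column is added
row_pos = np.arange(len(df))      # 0, 1, 2, 3, 4, 5
col_pos = row_pos % n_cols        # 0, 1, 2, 0, 1, 2

# Pick one cell per row: (row 0, col 0), (row 1, col 1), (row 2, col 2), ...
df["d"] = df.to_numpy()[row_pos, col_pos]
```

This uses positions instead of labels, so it does not depend on the index values the way a label-based lookup does.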
