Generating a new dataframe from existing columns
I am trying to create a new column D from the existing columns A, B, and C.
Following this pattern, the rest of the values in column D are repetitions of the same pattern. Please refer to the picture.
Any code ideas?
Click the link to see the picture
Here's a solution (in a couple of steps for clarity):
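import numpy as np
import pandas as pd

df = pd.DataFrame({"a": range(0, 10), "b": range(10, 20), "c": range(20, 30)})
# Helper column holding each row's position (0, 1, 2, ...)
df["inx"] = range(len(df))
# Position modulo 3 picks the source column: 0 -> a, 1 -> b, 2 -> c
df["d"] = np.where(df.inx % 3 == 0, df["a"],
                   np.where(df.inx % 3 == 1, df["b"], df["c"]))
df = df.drop("inx", axis="columns")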
Result:
a b c d
0 0 10 20 0
1 1 11 21 11
2 2 12 22 22
3 3 13 23 3
4 4 14 24 14
5 5 15 25 25
6 6 16 26 6
7 7 17 27 17
8 8 18 28 28
9 9 19 29 9
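The nested np.where calls above get harder to read as the number of source columns grows. As a rough sketch of one alternative (not part of the original answer), np.select takes a list of conditions and a matching list of choices. The names pos, source_cols, conditions, and choices are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": range(0, 10), "b": range(10, 20), "c": range(20, 30)})

# Row positions 0..n-1, used instead of a temporary helper column
pos = np.arange(len(df))
source_cols = ["a", "b", "c"]

# One boolean mask per source column: rows where position % 3 == k
conditions = [pos % len(source_cols) == k for k in range(len(source_cols))]
choices = [df[col] for col in source_cols]

# For each row, take the value from the first choice whose condition is True
df["d"] = np.select(conditions, choices)
```

Because exactly one condition is true for every row, the default value of np.select is never used here.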
We can use pd.concat to create a dataframe with the columns A, B, C in which each column keeps only every third value, starting at that column's own offset (A from row 0, B from row 1, C from row 2), so every row has exactly one non-NaN value. Then use DataFrame.agg to aggregate the dataframe row-wise by dropping the NaN values and taking the value that remains:
d = pd.concat([df[col].iloc[i::df.columns.size] for i, col in enumerate(df.columns)], axis=1)
df['D'] = d.agg(lambda s: s.dropna().iloc[0], axis=1)
Result:
# print(df)
A B C D
0 A1 B1 C1 A1
1 A2 B2 C2 B2
2 A3 B3 C3 C3
3 A4 B4 C4 A4
4 A5 B5 C5 B5
5 A6 B6 C6 C6
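To see why dropping the NaN values and taking the first remaining entry works, it may help to inspect the intermediate frame. The sketch below rebuilds the example input (its construction is an assumption, since the answer does not show it) and checks that every row of d holds exactly one non-null value. The names n_cols, pieces, non_null_per_row, and result are illustrative.

```python
import pandas as pd

# Assumed input matching the printed result: A1..A6, B1..B6, C1..C6
df = pd.DataFrame({col: [f"{col}{i}" for i in range(1, 7)] for col in "ABC"})

n_cols = df.columns.size
# Column i keeps rows i, i + n_cols, i + 2 * n_cols, ...
pieces = [df[col].iloc[i::n_cols] for i, col in enumerate(df.columns)]

# Aligning the pieces on the index leaves NaN wherever a column was skipped
d = pd.concat(pieces, axis=1)

# Each row should contain exactly one surviving value
non_null_per_row = d.notna().sum(axis=1)

# Same aggregation as in the answer, sorted by index for a stable order
result = d.agg(lambda s: s.dropna().iloc[0], axis=1).sort_index()
```

Assigning the aggregated Series back with df['D'] = ... aligns on the index, so the row order of d does not affect the final column.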
Here's another way:
import pandas as pd
import numpy as np
#Create input dataframe:
df = pd.DataFrame(index=[*'123456'], columns=[*'ABC'])
df = df.apply(lambda x: x.name+x.index)
df
Input Dataframe:
A B C
1 A1 B1 C1
2 A2 B2 C2
3 A3 B3 C3
4 A4 B4 C4
5 A5 B5 C5
6 A6 B6 C6
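Use cumsum with lookup (note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so this approach needs an older pandas version):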
s = ((df['A'].notna().cumsum()-1) % df.shape[1])
df['d'] = df.lookup(df.index, df.columns[s])
df
Output:
A B C d
1 A1 B1 C1 A1
2 A2 B2 C2 B2
3 A3 B3 C3 C3
4 A4 B4 C4 A4
5 A5 B5 C5 B5
6 A6 B6 C6 C6
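Since DataFrame.lookup is no longer available in pandas 2.0 and later, the same row-and-column pairing can be done with NumPy integer indexing. This is a minimal sketch of that substitute, not the original answerer's code. The input construction and the names n_cols, rows, and cols are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Assumed input matching the frame shown above, with string index '1'..'6'
df = pd.DataFrame(
    {col: [f"{col}{i}" for i in range(1, 7)] for col in "ABC"},
    index=list("123456"),
)

n_cols = df.shape[1]        # number of source columns, taken before adding 'd'
rows = np.arange(len(df))   # positional row numbers 0..5
cols = rows % n_cols        # column position to read in each row: 0, 1, 2, 0, 1, 2

# Pairwise (row, column) indexing picks one value per row
df["d"] = df.to_numpy()[rows, cols]
```

Here the column position comes directly from the row position, which gives the same cycle as the cumsum expression when column A has no missing values.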