Add a column to a data frame using pandas concatenation
I have a "train_df" data frame for which:
print(train_df.shape)
returns (997, 600).
Now I want to concatenate a column to this data frame, where:
print(len(local_df["target"]))
returns 997.
So it seems that everything is OK with the dimensions.
But the problem is that:
final_df = pd.concat([train_df, local_df["target"]], axis=1)
print(final_df.shape)
returns (1000, 601), while it should be (997, 601).
Do you know what the problem is?
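The behaviour can be reproduced on toy data. The sketch below uses hypothetical three-row stand-ins for train_df and local_df whose index labels only partly overlap (an assumption about what the real frames look like). pd.concat with axis=1 aligns rows by index label and, by default, keeps the union of the labels, so two 997-row frames that share only 994 labels would give 1000 rows. Index.symmetric_difference lists the labels that appear in only one of the two frames.

```python
import pandas as pd

# Hypothetical stand-ins: both have 3 rows, but the index labels differ.
train_df = pd.DataFrame({"f0": [1, 2, 3]}, index=[0, 1, 2])
local_df = pd.DataFrame({"target": [10, 20, 30]}, index=[1, 2, 3])

# Labels present in only one of the two indexes.
print(train_df.index.symmetric_difference(local_df.index))

# concat(axis=1) aligns on index labels (outer join by default),
# so the result has one row per label in the union {0, 1, 2, 3}.
final_df = pd.concat([train_df, local_df["target"]], axis=1)
print(final_df.shape)  # (4, 2)
print(final_df)
```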
I think the problem is the different index values, so the solution is to make them the same with reset_index and the parameter drop=True:
final_df = pd.concat([train_df.reset_index(drop=True),
local_df["target"].reset_index(drop=True)], axis=1)
print(final_df.shape)
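On the same kind of hypothetical toy data, resetting both indexes makes the rows line up by position, so the row count stays at 3. Note that train_df's original index labels are discarded and replaced by 0..n-1, which may or may not matter for later steps.

```python
import pandas as pd

train_df = pd.DataFrame({"f0": [1, 2, 3]}, index=[0, 1, 2])
local_df = pd.DataFrame({"target": [10, 20, 30]}, index=[1, 2, 3])

# reset_index(drop=True) replaces each index with 0..n-1 and throws the
# old labels away, so rows are paired purely by position.
final_df = pd.concat([train_df.reset_index(drop=True),
                      local_df["target"].reset_index(drop=True)], axis=1)
print(final_df.shape)  # (3, 2)
print(final_df)
```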
Or set the index of local_df to train_df.index:
# a Series has no set_index method, so relabel the whole local_df first
final_df = pd.concat([train_df,
                      local_df.set_index(train_df.index)["target"]], axis=1)
print(final_df.shape)
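A sketch of this variant on hypothetical toy frames. A pandas Series has no set_index method, so this version calls DataFrame.set_index on local_df (which accepts an Index object of matching length) and then selects the column. Unlike the reset_index approach, the result keeps train_df's own labels.

```python
import pandas as pd

train_df = pd.DataFrame({"f0": [1, 2, 3]}, index=[5, 6, 7])
local_df = pd.DataFrame({"target": [10, 20, 30]}, index=[1, 2, 3])

# Relabel local_df's rows with train_df's index (lengths must match),
# then take the "target" column; concat now aligns row for row.
target = local_df.set_index(train_df.index)["target"]
final_df = pd.concat([train_df, target], axis=1)
print(final_df.shape)  # (3, 2)
print(final_df)
```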
You can assign a numpy array as a new column:
final_df = train_df.assign(target=local_df["target"].values)
For pandas >= 0.24:
final_df = train_df.assign(target=local_df["target"].to_numpy())
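Assigning a plain array sidesteps index alignment entirely: values are placed by position. A sketch on hypothetical toy frames; the only requirement is that the array length equals the number of rows, otherwise pandas raises a ValueError instead of padding with NaN.

```python
import pandas as pd

train_df = pd.DataFrame({"f0": [1, 2, 3]}, index=[5, 6, 7])
local_df = pd.DataFrame({"target": [10, 20, 30]}, index=[1, 2, 3])

# to_numpy() drops local_df's index, so values are matched by position
# and train_df keeps its own labels (5, 6, 7).
final_df = train_df.assign(target=local_df["target"].to_numpy())
print(final_df)

# A length mismatch is rejected instead of silently producing NaN rows.
try:
    train_df.assign(target=local_df["target"].to_numpy()[:2])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```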
How about join?
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df2 = pd.DataFrame({'c': [232, 543, 562]})
print(df.reset_index(drop=True).join(df2.reset_index(drop=True), how='left'))
Output:
   a  b    c
0  1  4  232
1  2  5  543
2  3  6  562
Not sure if this is the most efficient way, though.
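For comparison, a sketch (hypothetical data) of what join does when the indexes are not reset first. As long as df2's index has no duplicate labels, a left join keeps exactly the rows of the calling frame, so the row count cannot grow the way it did with concat; labels missing from df2 still produce NaN, though. Resetting both indexes, as in the answer, pairs the rows by position instead.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[0, 1, 2])
df2 = pd.DataFrame({"c": [232, 543, 562]}, index=[1, 2, 3])

# Left join on the index: rows 0, 1, 2 of df are kept; label 0 has no
# match in df2, so its "c" is NaN, and 562 (label 3) is dropped.
no_reset = df.join(df2, how="left")
print(no_reset)

# After resetting both indexes the rows are paired by position.
with_reset = df.reset_index(drop=True).join(df2.reset_index(drop=True), how="left")
print(with_reset)
```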
Adding a new column y to a dataframe df from another dataframe df2 that has this column y:
df = df.assign(y=df2["y"].reset_index(drop=True))
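One caveat worth checking: assign aligns a Series on df's index, so resetting only df2's index works when df already has the default 0..n-1 index. The sketch below uses hypothetical frames where df has other labels; there the new column comes out all NaN, while resetting df's index as well restores positional pairing.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]}, index=[10, 11, 12])
df2 = pd.DataFrame({"y": [7, 8, 9]}, index=[4, 5, 6])

# df2["y"] gets labels 0, 1, 2 after reset_index, none of which exist in
# df's index (10, 11, 12), so every value of the new column is NaN.
misaligned = df.assign(y=df2["y"].reset_index(drop=True))
print(misaligned)

# Giving df a 0..n-1 index too makes the labels match position by position.
aligned = df.reset_index(drop=True).assign(y=df2["y"].reset_index(drop=True))
print(aligned)
```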