
pandas stacking dataframe reshapes data

I'm trying to stack two 3-column DataFrames using concat, append, or merge. The result is a 5-column DataFrame, and in places the original columns come out in a different order. Here are some of the things I've tried:

import pandas as pd

dfTrain = pd.read_csv("agr_hi_train.csv")
dfTrain2 = pd.read_csv("english/agr_en_train.csv")
dfTrain2.reset_index()  # no effect: reset_index returns a new frame unless reassigned
frames = [dfTrain, dfTrain2]
# DataFrame.append exists only in pandas < 2.0
test = dfTrain2.append(dfTrain, ignore_index=True)
test2 = dfTrain2.append(dfTrain)
test3 = pd.concat(frames, axis=0, ignore_index=True)
test4 = pd.merge(dfTrain, dfTrain2, right_index=True, left_index=True)

With the following results:

print(dfTrain.shape)
print(dfTrain2.shape)
print(test.shape)
print(test2.shape)
print(test3.shape)
print(test4.shape)

Output:

(20198, 5) (20198, 5) (11998, 6) (8200, 6) (8200, 3) (11998, 3)

I want the result to be:

(20198, 3), i.e. the last two frames (8200 and 11998 rows) stacked on top of each other. Any ideas why I'm getting the extra columns?

If the two DataFrames have different column names, append (like concat) matches columns by name, so the non-matching columns are kept separate and padded with NaN. For example:

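import numpy as np
import pandas as pd
dfTrain = pd.DataFrame(np.random.rand(8200, 3), columns=['A', 'B', 'C'])
dfTrain2 = pd.DataFrame(np.random.rand(11998, 3), columns=['D', 'E', 'F'])
# DataFrame.append was removed in pandas 2.0; pd.concat aligns columns by name,
# so A-C and D-F end up as six separate columns padded with NaN
test = pd.concat([dfTrain, dfTrain2], ignore_index=True)
print(test)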

has the output:

          A         B         C         D         E         F
0      0.617294  0.507264  0.330792       NaN       NaN       NaN
1      0.439806  0.355340  0.757864       NaN       NaN       NaN
2      0.740674  0.332794  0.530613       NaN       NaN       NaN
...
20195       NaN       NaN       NaN  0.295392  0.621741  0.255251
20196       NaN       NaN       NaN  0.096586  0.841174  0.392839
20197       NaN       NaN       NaN  0.071756  0.998280  0.451681
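The frames in the question ended up with 5 columns rather than 6, which is what you would expect if exactly one of the three column names matched between the two CSV files (3 + 3 - 1 = 5). A minimal sketch of checking for this before stacking, using small hypothetical frames (the names id, text, label, ID and comment are invented for illustration, since the real CSV headers are not shown), lists the names that appear in only one frame with Index.symmetric_difference:

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files: three columns each,
# but only "label" is spelled identically in both.
hi = pd.DataFrame({"id": [1, 2], "text": ["a", "b"], "label": ["x", "y"]})
en = pd.DataFrame({"ID": [3, 4, 5], "comment": ["c", "d", "e"], "label": ["z", "x", "y"]})

# Column names present in only one frame; concat turns each of these
# into a separate column filled with NaN for the other frame's rows.
mismatched = hi.columns.symmetric_difference(en.columns)
print(sorted(mismatched))

stacked = pd.concat([hi, en], ignore_index=True)
print(stacked.shape)  # (5, 5): 2 + 3 rows, 3 + 3 - 1 columns
```

Every name printed by the first call becomes its own NaN-padded column after concat; only the shared name (label here) lines up across both frames.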

If you rename the columns so that both DataFrames use the same names, the data lines up:

# Give dfTrain2 the same column names as dfTrain so concat aligns them
dfTrain2.columns = ['A', 'B', 'C']
test2 = pd.concat([dfTrain, dfTrain2], ignore_index=True)
print(test2)

          A         B         C
0      0.545936  0.103332  0.939721
1      0.258807  0.274423  0.262293
2      0.374780  0.458810  0.955040
...
[20198 rows x 3 columns]
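Hardcoding ['A', 'B', 'C'] works for this example. A slightly more general sketch, assuming the two frames hold the same fields in the same column order (only the labels differ), copies the first frame's labels onto the second positionally with DataFrame.set_axis before concatenating:

```python
import numpy as np
import pandas as pd

dfTrain = pd.DataFrame(np.random.rand(8200, 3), columns=["A", "B", "C"])
dfTrain2 = pd.DataFrame(np.random.rand(11998, 3), columns=["D", "E", "F"])

# Relabel dfTrain2's columns with dfTrain's labels, by position.
# Only valid if column i of each frame holds the same kind of data.
aligned = dfTrain2.set_axis(dfTrain.columns, axis=1)

stacked = pd.concat([dfTrain, aligned], ignore_index=True)
print(stacked.shape)  # (20198, 3)
```

Unlike assigning to dfTrain2.columns, set_axis returns a new frame and leaves dfTrain2 unchanged, which can be handy if the original labels are still needed elsewhere.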

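Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM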