pandas stacking dataframe reshapes data

Question

I'm trying to stack two three-column DataFrames using concat, append, or merge. The result is a five-column DataFrame in which the original columns appear in a different order in places. Here are some of the things I've tried:

dfTrain = pd.read_csv("agr_hi_train.csv")
dfTrain2 = pd.read_csv("english/agr_en_train.csv")
dfTrain2.reset_index()
frames = [dfTrain, dfTrain2]
test = dfTrain2.append(dfTrain, ignore_index=True)
test2 = dfTrain2.append(dfTrain)
test3 = pd.concat(frames, axis=0, ignore_index=True)
test4 = pd.merge(dfTrain,dfTrain2, right_index=True, left_index=True)

With the following results:

print(dfTrain.shape)
print(dfTrain2.shape)
print(test.shape)
print(test2.shape)
print(test3.shape)
print(test4.shape)

Output is:

(20198, 5) (20198, 5) (11998, 6) (8200, 6) (8200, 3) (11998, 3)

I want the result to be:

(20198, 3)

That is, the two frames stacked on top of each other. Any ideas why I'm getting the extra columns?

Answer 1

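append (like pd.concat) lines rows up by column name, not by column position. If the two frames have different column names, each name gets its own column and the missing values are filled with NaN. For example: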

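import numpy as np
import pandas as pd

dfTrain = pd.DataFrame(np.random.rand(8200, 3), columns=['A', 'B', 'C'])
dfTrain2 = pd.DataFrame(np.random.rand(11998, 3), columns=['D', 'E', 'F'])
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
test = pd.concat([dfTrain, dfTrain2], ignore_index=True)
print(test)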

has the output:

          A         B         C         D         E         F
0      0.617294  0.507264  0.330792       NaN       NaN       NaN
1      0.439806  0.355340  0.757864       NaN       NaN       NaN
2      0.740674  0.332794  0.530613       NaN       NaN       NaN
...
20195       NaN       NaN       NaN  0.295392  0.621741  0.255251
20196       NaN       NaN       NaN  0.096586  0.841174  0.392839
20197       NaN       NaN       NaN  0.071756  0.998280  0.451681
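To check whether this is what is happening with your own data, compare the column labels before stacking. Here is a minimal sketch. The frame names `left` and `right` and their contents are made up for illustration. Two three-column frames that share a single label give five columns when stacked, which would fit the five-column result described in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: three columns each, sharing only the label 'C'
left = pd.DataFrame(np.random.rand(3, 3), columns=['A', 'B', 'C'])
right = pd.DataFrame(np.random.rand(2, 3), columns=['C', 'D', 'E'])

# Labels found in exactly one of the frames; an empty list means the columns match
mismatched = left.columns.symmetric_difference(right.columns).tolist()
print(mismatched)

stacked = pd.concat([left, right], ignore_index=True)
print(stacked.columns.tolist())  # union of both sets of labels
print(stacked.isna().sum())      # NaN count per column shows where rows had no value
```

If `mismatched` is not empty, the extra columns come from the labels rather than from the stacking call itself.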

If you rename the columns in both dataframes to match, then it'll line up.

dfTrain2.columns = ['A', 'B', 'C']
# stack with pd.concat (the current replacement for DataFrame.append)
test2 = pd.concat([dfTrain, dfTrain2], ignore_index=True)
print(test2)

          A         B         C
0      0.545936  0.103332  0.939721
1      0.258807  0.274423  0.262293
2      0.374780  0.458810  0.955040
...
[20198 rows x 3 columns]
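If you would rather not hard-code the names, the renaming step can be wrapped in a small function. This is only a sketch: `stack_by_position` is a hypothetical helper, not part of pandas, and it assumes the columns of every frame mean the same thing in the same order.

```python
import numpy as np
import pandas as pd

def stack_by_position(frames):
    """Stack frames vertically, matching columns by position instead of by name.

    Hypothetical helper: every frame takes the column names of the first one.
    """
    first = frames[0]
    renamed = []
    for df in frames:
        if df.shape[1] != first.shape[1]:
            raise ValueError("all frames need the same number of columns")
        copy = df.copy()               # leave the caller's frame untouched
        copy.columns = first.columns   # relabel by position
        renamed.append(copy)
    return pd.concat(renamed, ignore_index=True)

# Small stand-ins for dfTrain and dfTrain2
a = pd.DataFrame(np.random.rand(4, 3), columns=['A', 'B', 'C'])
b = pd.DataFrame(np.random.rand(6, 3), columns=['D', 'E', 'F'])

stacked = stack_by_position([a, b])
print(stacked.shape)
```

Relabelling by position silently mixes data if the column order differs between files, so check the order first.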

1 answers

solution1
0 2018-04-27 16:54:39
