I'm trying to stack two 3-column data frames using concat, append, or merge. Instead of 3 columns, the result is a 5-column dataframe in which the original columns appear in a different order in places. Here are some of the things I've tried:
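import pandas as pd
dfTrain = pd.read_csv("agr_hi_train.csv")
dfTrain2 = pd.read_csv("english/agr_en_train.csv")
dfTrain2 = dfTrain2.reset_index(drop=True)  # reset_index returns a new frame, so the result has to be assigned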
frames = [dfTrain, dfTrain2]
test = dfTrain2.append(dfTrain, ignore_index=True)
test2 = dfTrain2.append(dfTrain)
test3 = pd.concat(frames, axis=0, ignore_index=True)
test4 = pd.merge(dfTrain,dfTrain2, right_index=True, left_index=True)
With the following results:
print(dfTrain.shape)
print(dfTrain2.shape)
print(test.shape)
print(test2.shape)
print(test3.shape)
print(test4.shape)
Output is:
(20198, 5)
(20198, 5)
(11998, 6)
(8200, 6)
(8200, 3)
(11998, 3)
I want the result to be:
(20198, 3), i.e. the two frames stacked on top of each other. Any ideas why I'm getting the extra columns?
append aligns rows by column name, so if the two dataframes have different column names, each name gets its own column and the missing values are filled with NaN. For example:
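import numpy as np
import pandas as pd
dfTrain = pd.DataFrame(np.random.rand(8200, 3), columns=['A', 'B', 'C'])
dfTrain2 = pd.DataFrame(np.random.rand(11998, 3), columns=['D', 'E', 'F'])
# DataFrame.append was removed in pandas 2.0; there, use pd.concat([dfTrain, dfTrain2], ignore_index=True)
test = dfTrain.append(dfTrain2, ignore_index=True)
print(test)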
has the output:
              A         B         C         D         E         F
0      0.617294  0.507264  0.330792       NaN       NaN       NaN
1      0.439806  0.355340  0.757864       NaN       NaN       NaN
2      0.740674  0.332794  0.530613       NaN       NaN       NaN
...
20195       NaN       NaN       NaN  0.295392  0.621741  0.255251
20196       NaN       NaN       NaN  0.096586  0.841174  0.392839
20197       NaN       NaN       NaN  0.071756  0.998280  0.451681
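To find out which names differ before renaming anything, one option is to compare the two column indexes. This is only a sketch: the frames `left` and `right` and their column names are made up for illustration (the real CSV headers are not shown in the question), with a trailing space and a case difference standing in for whatever actually differs.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSVs; the real headers are unknown.
left = pd.DataFrame({"id": [1, 2], "text": ["a", "b"], "label": ["x", "y"]})
right = pd.DataFrame({"id": [3], "text ": ["c"], "Label": ["z"]})

# Names found in only one frame are the ones that become extra, NaN-filled columns.
only_left = left.columns.difference(right.columns)
only_right = right.columns.difference(left.columns)
print(list(only_left))   # ['label', 'text']
print(list(only_right))  # ['Label', 'text ']

# Stacking without fixing the names gives the union of all column names.
stacked = pd.concat([left, right], ignore_index=True)
print(stacked.shape)     # (3, 5)
```

In this made-up case only `id` matches, so three intended columns turn into five, which is the same symptom as in the question.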
If you rename the columns so that both dataframes use the same names, the rows line up in three columns:
dfTrain2.columns = ['A', 'B', 'C']
# On pandas 2.0+, where DataFrame.append no longer exists: pd.concat([dfTrain, dfTrain2])
test2 = dfTrain.append(dfTrain2)
print(test2)
          A         B         C
0  0.545936  0.103332  0.939721
1  0.258807  0.274423  0.262293
2  0.374780  0.458810  0.955040
...
[20198 rows x 3 columns]
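The same renaming step can be written with pd.concat, which also works on pandas versions where DataFrame.append is gone. A minimal sketch, assuming the columns of the two frames correspond by position; the names `top`, `bottom` and `bottom_aligned` and the small row counts are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-ins for the two training frames, with mismatched column names.
top = pd.DataFrame(np.random.rand(4, 3), columns=["A", "B", "C"])
bottom = pd.DataFrame(np.random.rand(6, 3), columns=["D", "E", "F"])

# set_axis returns a relabelled copy, so `bottom` itself keeps its old names.
bottom_aligned = bottom.set_axis(top.columns, axis=1)

# With matching names the rows stack; ignore_index=True renumbers them 0..n-1.
stacked = pd.concat([top, bottom_aligned], ignore_index=True)
print(stacked.shape)  # (10, 3)
```

Relabelling by position is only safe when the columns really do hold the same things in the same order; if they do not, an explicit `rename(columns={...})` mapping is the more careful choice.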