pandas stacking dataframe reshapes data

I'm trying to stack two 3-column DataFrames on top of each other using concat, append, or merge. Instead of 3 columns, the result has 5 columns, and in places the original columns appear in a different order. Here are some of the things I've tried:

import pandas as pd

dfTrain = pd.read_csv("agr_hi_train.csv")
dfTrain2 = pd.read_csv("english/agr_en_train.csv")
dfTrain2.reset_index()
frames = [dfTrain, dfTrain2]
test = dfTrain2.append(dfTrain, ignore_index=True)
test2 = dfTrain2.append(dfTrain)
test3 = pd.concat(frames, axis=0, ignore_index=True)
test4 = pd.merge(dfTrain,dfTrain2, right_index=True, left_index=True)

With the following results:

print(dfTrain.shape)
print(dfTrain2.shape)
print(test.shape)
print(test2.shape)
print(test3.shape)
print(test4.shape)

Output is:

(20198, 5) (20198, 5) (11998, 6) (8200, 6) (8200, 3) (11998, 3)

I want the result to be:

(20198, 3)  # i.e. the two frames stacked on top of each other

Any ideas why I'm getting the extra columns?

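append (like concat) aligns rows by column name, so if the two DataFrames have different column names, each frame's values land in their own columns and the gaps are filled with NaN. For example: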

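import numpy as np
import pandas as pd

dfTrain = pd.DataFrame(np.random.rand(8200, 3), columns=['A', 'B', 'C'])
dfTrain2 = pd.DataFrame(np.random.rand(11998, 3), columns=['D', 'E', 'F'])
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the same way.
test = pd.concat([dfTrain, dfTrain2], ignore_index=True)
print(test)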

This produces:

          A         B         C         D         E         F
0      0.617294  0.507264  0.330792       NaN       NaN       NaN
1      0.439806  0.355340  0.757864       NaN       NaN       NaN
2      0.740674  0.332794  0.530613       NaN       NaN       NaN
...
20195       NaN       NaN       NaN  0.295392  0.621741  0.255251
20196       NaN       NaN       NaN  0.096586  0.841174  0.392839
20197       NaN       NaN       NaN  0.071756  0.998280  0.451681
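
To confirm that mismatched names are the cause, it can help to compare the column labels of the two frames before stacking. The following is only a sketch: the frames `left` and `right` and their labels are made up for illustration (they are not the columns of the original CSV files), and it assumes the mismatch is something like different capitalisation.

```python
import pandas as pd

# Hypothetical small frames standing in for the two CSV files.
left = pd.DataFrame({"id": [1, 2], "text": ["a", "b"], "label": ["x", "y"]})
right = pd.DataFrame({"id": [3, 4], "Text": ["c", "d"], "Label": ["z", "w"]})

# Labels that appear in one frame but not in the other.
only_left = left.columns.difference(right.columns)
only_right = right.columns.difference(left.columns)
print(list(only_left))
print(list(only_right))

# Only "id" is shared, so stacking yields 1 shared + 2 + 2 = 5 columns.
stacked = pd.concat([left, right], ignore_index=True)
print(stacked.shape)
```

If both lists come back empty, the names already match and the extra columns have some other cause.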

If you rename the columns so that both DataFrames use the same names, the rows line up:

dfTrain2.columns = ['A', 'B', 'C']
# pd.concat is used here because pandas 2.0 removed DataFrame.append.
test2 = pd.concat([dfTrain, dfTrain2], ignore_index=True)
print(test2)

          A         B         C
0      0.545936  0.103332  0.939721
1      0.258807  0.274423  0.262293
2      0.374780  0.458810  0.955040
...
[20198 rows x 3 columns]
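
If you would rather not overwrite the columns of the second frame in place, one possible variant is to rename on the fly while concatenating. This is a minimal sketch under the assumption that the columns of both frames correspond by position; the names `first`, `second`, and `mapping` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: same kind of data, different column labels.
first = pd.DataFrame(np.random.rand(4, 3), columns=["A", "B", "C"])
second = pd.DataFrame(np.random.rand(6, 3), columns=["D", "E", "F"])

# Map second's labels onto first's by position; second itself is not modified.
mapping = dict(zip(second.columns, first.columns))
stacked = pd.concat([first, second.rename(columns=mapping)], ignore_index=True)

print(stacked.shape)
print(list(stacked.columns))
```

Here `ignore_index=True` gives the stacked frame a fresh 0..n-1 index instead of repeating each frame's own row labels.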
