Pandas importing column data in incorrect order into data frame
Three sheets are imported correctly, then the format problem occurs. Three sheets are then imported using this improper format. Then the problem occurs again, shifting the data in a similar way. So basically every fourth sheet that is imported appears to pull the data from columns out of order.
Output returns as expected:
The problem occurs when it moves to the next sheet, even though its column formatting is the same as the last.
It appears to pull M:P correctly, then it jumbles the data by appearing to pull in this order: AC:AD, S:Z while adding five extra blank columns, Q:R, AB:AC.
The only difference between the two worksheets is that the first has data in more columns than the second; however, both have the same number of columns being queried.
df1 = [pd.read_excel(xls, sheet_name=s, skiprows=4, nrows=32, usecols='M:AD') for s in main]
dfconcat = pd.concat(df1, ignore_index=True, sort=False)
dfconcat.dropna(axis=0, how='all', inplace=True)
writer = pd.ExcelWriter(f'{loc}/test.xlsx')
dfconcat.to_excel(writer, 'bananas', index=False, header=False, na_rep='', merge_cells=False)
writer.save()
Since it occurs every fourth sheet, I assume there is something incorrect in my code, or something I need to add to reset something in pandas after every pass. Any guidance would be appreciated.
Add header=None at the end inside pd.read_excel. By default, read_excel will use the first row (header=0) as the header. I.e., in your case, in view of skiprows=4, row 5 (5:5) in each sheet will be interpreted as the header.
This causes problems when you use pd.concat, because concat aligns columns by label, not by position. E.g., if you have pd.concat([d1,d2]) and d1 has cols A, B, but d2 has cols B, A, then the result will actually have order A, B, following the first df. Hence, the "shift" of the columns.
So, basically, you end up doing something like this:
dfs = [pd.DataFrame({'a':[1],'b':[2]}),
       pd.DataFrame({'b':[1],'a':[2]})]
print(pd.concat(dfs, ignore_index=True, sort=False))
   a  b
0  1  2
1  2  1
While you actually want to do:
dfs = [pd.DataFrame([{0: 'a', 1: 'b'}, {0: 1, 1: 2}]),
       pd.DataFrame([{0: 'b', 1: 'a'}, {0: 1, 1: 2}])]
print(pd.concat(dfs, ignore_index=True, sort=False))
   0  1
0  a  b
1  1  2
2  b  a
3  1  2
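If the frames have already been read with an inferred header, one possible workaround is to push the labels back into the data and relabel the columns by position before concatenating. The sketch below is only an illustration: demote_header is a hypothetical helper, not a pandas function, and d1 and d2 are made-up stand-ins for two sheets. Passing header=None to read_excel is still the simpler fix. Labels that pandas has already altered (for example, placeholder names for blank header cells) would come back altered, so this is lossy.

```python
import pandas as pd

def demote_header(df):
    """Hypothetical helper: move the column labels back into the data as the
    first row and relabel the columns 0..n-1, so concat aligns by position."""
    header_row = pd.DataFrame([list(df.columns)])
    body = df.copy()
    body.columns = range(body.shape[1])
    return pd.concat([header_row, body], ignore_index=True)

# Made-up stand-ins for two sheets whose inferred headers are in a different order.
d1 = pd.DataFrame({'a': [1], 'b': [2]})
d2 = pd.DataFrame({'b': [1], 'a': [2]})

# Label-based alignment: the second frame's values are reordered to match a, b.
misaligned = pd.concat([d1, d2], ignore_index=True, sort=False)

# Positional alignment: every row stays in the column it was read from.
stacked = pd.concat([demote_header(d) for d in (d1, d2)], ignore_index=True)
print(stacked)
```

Here misaligned reproduces the "shifted" behaviour, while stacked keeps each sheet's rows in their original column positions.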