Using Pandas to join and append columns in a loop
I want to append columns from tables generated in a loop to a dataframe. I was hoping to accomplish this using pandas.merge, but it doesn't seem to be working out for me.
My code:
from datetime import date
from datetime import timedelta
import pandas
import numpy
import pyodbc

date1 = date(2017, 1, 1)  # Starting date
date2 = date(2017, 1, 10)  # Ending date
DateDelta = date2 - date1
DateAdd = DateDelta.days  # Number of days between the two dates
StartDate = date1
count = 1

conn = pyodbc.connect('Server Information')

# Create the holding table
basetable = pandas.read_sql("SELECT....", conn)

while count <= DateAdd:
    print(StartDate)
    # Query one day's values; the value column is named after that date
    datatable = pandas.read_sql("SELECT...WHERE Date = " + str(StartDate) + "...", conn)
    finaltable = basetable.merge(datatable, how='left', left_on='OrganizationName', right_on='OrganizationName')
    StartDate = StartDate + timedelta(days=1)
    count = count + 1

print(finaltable)
I shortened the SELECT statements for brevity's sake, but the tables produced look like this:
Basetable
School_District
---------------
District_Alpha
District_Beta
...
District_Zed
Datatable
School_District|2016-01-01|
---------------|----------|
District_Alpha | 400 |
District_Beta | 300 |
... | 200 |
District_Zed | 100 |
I have written the datatable query so that the column takes the name of the date selected for that particular loop iteration, so the column names will be unique once I get this up and running. My problem, however, is that the above code only produces one column of data. I have a good guess as to why: only the result of the last merge is being kept. I thought using pandas.append would be the way to get around that, but pandas.append doesn't "join" the way merge does. Is there some other way to accomplish a sort of Join & Append using Pandas? My goal is to keep this flexible so that other dates can easily be input depending on our data needs.
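The guess above can be reproduced with a small sketch that uses hypothetical in-memory DataFrames in place of the SQL queries (the `districts` list, the `daily` dict and its numbers are made up for illustration). Because `basetable` is never updated, every iteration rebuilds `finaltable` from scratch, so only the last day's column survives. The second half shows why row-wise appending does not help: stacking frames, which is what `DataFrame.append` did before it was deprecated in pandas 1.4 and removed in 2.0, and what `pd.concat` does by default, adds rows instead of matching them on `OrganizationName`.

```python
import pandas as pd

# Hypothetical stand-ins for the per-day SQL results (numbers are made up)
districts = ["District_Alpha", "District_Beta"]
daily = {
    "2017-01-01": [400, 300],
    "2017-01-02": [1, 2],
}

basetable = pd.DataFrame({"OrganizationName": districts})

for day, values in daily.items():
    datatable = pd.DataFrame({"OrganizationName": districts, day: values})
    # Same pattern as the question: basetable never changes, so finaltable
    # is overwritten each time with basetable plus only the current day
    finaltable = basetable.merge(datatable, how="left", on="OrganizationName")

print(list(finaltable.columns))  # ['OrganizationName', '2017-01-02']

# Row-wise stacking does not match rows on OrganizationName: each district
# appears once per day, with NaN in the other day's column
stacked = pd.concat(
    [pd.DataFrame({"OrganizationName": districts, d: v}) for d, v in daily.items()]
)
print(stacked.shape)  # (4, 3)
```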
In the end, what I want to see is:
School_District|2016-01-01|2016-01-02|... |2016-01-10|
---------------|----------|----------|-----|----------|
District_Alpha | 400 | 1 | | 45 |
District_Beta | 300 | 2 | | 33 |
... | 200 | 3 | | 5435 |
District_Zed | 100 | 4 | | 333 |
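Separately from the merge issue, it may be worth checking which dates the loop actually queries, since the desired output runs through the tenth day of the month. The sketch below replays only the question's date arithmetic, without the database calls: `DateAdd` is `(date2 - date1).days`, which is 9, and the loop runs while `count <= DateAdd` starting from 1, so it visits the 1st through the 9th and never queries `date2` itself. The `visited` and `inclusive` names are illustrative only.

```python
from datetime import date, timedelta

date1 = date(2017, 1, 1)   # starting date, as in the question
date2 = date(2017, 1, 10)  # ending date, as in the question
DateAdd = (date2 - date1).days  # 9

# Replay the question's loop, recording each StartDate it would query
visited = []
StartDate = date1
count = 1
while count <= DateAdd:
    visited.append(StartDate)
    StartDate = StartDate + timedelta(days=1)
    count = count + 1

print(len(visited))             # 9
print(visited[0], visited[-1])  # 2017-01-01 2017-01-09

# One inclusive alternative: every day from date1 through date2
inclusive = [date1 + timedelta(days=i) for i in range(DateAdd + 1)]
print(len(inclusive))           # 10
```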
Your error is in the statement finaltable = basetable.merge(datatable,...). At each loop iteration, you merge the original basetable with the new datatable, store the result in finaltable... and then discard it, because the next iteration overwrites finaltable with a fresh merge of the unchanged basetable. What you need is basetable = basetable.merge(datatable,...), so that each merge builds on the previous one. No finaltable needed.
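A minimal sketch of that fix, assuming in-memory DataFrames in place of the `pandas.read_sql` calls (the `daily` dict and its numbers are hypothetical) and a shared `OrganizationName` key column as in the question's merge. The only change from the question's loop is that the merge result is assigned back to `basetable`, so each iteration adds one date column to the accumulated table. The last few lines show a different technique, not the one in this answer: indexing each day's table on `OrganizationName` and joining them all in a single `pd.concat(..., axis=1)` call.

```python
import pandas as pd

# Hypothetical per-day results standing in for pandas.read_sql(...);
# district names follow the question, the numbers are made up
districts = ["District_Alpha", "District_Beta", "District_Zed"]
daily = {
    "2017-01-01": [400, 300, 100],
    "2017-01-02": [1, 2, 4],
    "2017-01-03": [45, 33, 333],
}

basetable = pd.DataFrame({"OrganizationName": districts})

for day, values in daily.items():
    datatable = pd.DataFrame({"OrganizationName": districts, day: values})
    # The fix: assign the merge result back to basetable so the next
    # iteration merges onto the table that already holds earlier days
    basetable = basetable.merge(datatable, how="left", on="OrganizationName")

print(list(basetable.columns))
# ['OrganizationName', '2017-01-01', '2017-01-02', '2017-01-03']

# Alternative technique: index each day's table by OrganizationName and
# align all of them on that index in one concat along the columns
per_day = [
    pd.DataFrame({"OrganizationName": districts, day: values}).set_index("OrganizationName")
    for day, values in daily.items()
]
wide = pd.concat(per_day, axis=1).reset_index()
print(wide)
```

The concat variant avoids one merge per day, but its default outer join keeps every organization that appears in any daily table, whereas the left merge keeps exactly the rows of basetable.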
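Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.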