Align dates on alternate columns in a Pandas DataFrame

I have a Pandas DataFrame in which columns 1, 3, 5, 7, ... contain dates and columns 2, 4, 6, 8, ... contain data values. The dates in the different date columns do not correspond. I want a single column containing all the dates, with the remaining columns containing just the values. Example:

input

      date val1       date   val2        date val3 
2007-12-01 35.6  2007-12-05 101.1  2007-12-05 89.1
2007-12-02 36.7  2007-12-06 102.3  2007-12-07 89.3
2007-12-05 36.7  2007-12-07 108.3  2007-12-08 89.5
2007-12-06 36.9  2007-12-08 110.0  2007-12-09 89.3
2007-12-07 36.9  2007-12-09 102.3  2007-12-12 89.9

output

      date   val1   val2   val3 
2007-12-01   35.6     na     na 
2007-12-02   36.7     na     na 
2007-12-05   36.7  101.1   89.1 
2007-12-06   36.9  102.3     na 
2007-12-07   36.9  108.3   89.3 
2007-12-08     na  110.0   89.5
2007-12-09     na  102.3   89.3
2007-12-12     na     na   89.9
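A minimal sketch for experimenting with the answers below, assuming the input is available as whitespace-separated text (the variable name TEXT is illustrative, and the stray trailing periods have been dropped from the values). Loading it with pd.read_csv also shows how pandas renames repeated headers:

```python
import io

import pandas as pd

# The sample input from the question as whitespace-separated text (illustrative).
TEXT = """\
date val1 date val2 date val3
2007-12-01 35.6 2007-12-05 101.1 2007-12-05 89.1
2007-12-02 36.7 2007-12-06 102.3 2007-12-07 89.3
2007-12-05 36.7 2007-12-07 108.3 2007-12-08 89.5
2007-12-06 36.9 2007-12-08 110.0 2007-12-09 89.3
2007-12-07 36.9 2007-12-09 102.3 2007-12-12 89.9
"""

# read_csv de-duplicates repeated header names by appending ".1", ".2", ...
df = pd.read_csv(io.StringIO(TEXT), sep=r"\s+")
print(list(df.columns))
# → ['date', 'val1', 'date.1', 'val2', 'date.2', 'val3']
```

Because the date columns come back as date, date.1 and date.2, code that merges on a column named date needs to rename each pair first.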

You can iteratively merge each pair of columns into a new, initially empty DataFrame.

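import pandas as pd

# empty 'date' column (same dtype as the date columns) to merge the first pair into
dft = pd.DataFrame({"date": pd.Series(dtype=df.iloc[:, 0].dtype)})
for n in range(len(df.columns) // 2):
    pair = df.iloc[:, 2*n:2*(n+1)].copy()
    pair.columns = ["date", pair.columns[1]]  # the date column may be named 'date.1', 'date.2', ...
    dft = dft.merge(pair, on="date", how="outer")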

Notice that we define an empty date column so that the first iteration has something to merge on. The 'outer' option keeps all the keys coming from both the left (accumulated) and the right (newly merged) DataFrame, and inserts NaN wherever a value is missing. Hope this helps.
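A small illustration of the two points above, using a hypothetical pair of two-row frames taken from the sample data: an outer merge keeps the union of the dates and fills the gaps with NaN, and merging into an empty date frame simply returns the other frame's rows.

```python
import pandas as pd

# Two hypothetical (date, value) pairs with only one date in common.
left = pd.DataFrame({"date": ["2007-12-01", "2007-12-05"], "val1": [35.6, 36.7]})
right = pd.DataFrame({"date": ["2007-12-05", "2007-12-06"], "val2": [101.1, 102.3]})

# how="outer" keeps the union of the keys; missing values become NaN.
merged = left.merge(right, on="date", how="outer")
print(merged["date"].tolist())
# → ['2007-12-01', '2007-12-05', '2007-12-06']

# Merging into an empty 'date' frame just returns the other frame's rows,
# which is why the loop can start from an empty DataFrame.
start = pd.DataFrame({"date": pd.Series(dtype=object)})
first = start.merge(left, on="date", how="outer")
```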

You can try this (note that columns with the same name may get renamed, e.g. to date.1 and date.2):

df:
         date   val1      date.1    val2      date.2  val3
0  2007-12-01   35.6  2007-12-05   101.1  2007-12-05  89.1
1  2007-12-02   36.7  2007-12-06   102.3  2007-12-07  89.3
2  2007-12-05   36.7  2007-12-07   108.3  2007-12-08  89.5
3  2007-12-06   36.9  2007-12-08   110.0  2007-12-09  89.3
4  2007-12-07   36.9  2007-12-09   102.3  2007-12-12  89.9

for index, i in enumerate(range(0, len(df.columns), 2)):
    # one (date, value) pair, renamed so every pair shares a 'date' key
    part = df.iloc[:, i:i + 2].copy()
    part.columns = ['date', 'value' + str(index + 1)]
    if index == 0:
        dft = part
    else:
        dft = dft.merge(part, on='date', how='outer')
print(dft)

Output:

         date value1  value2  value3
0  2007-12-01   35.6     NaN     NaN
1  2007-12-02   36.7     NaN     NaN
2  2007-12-05   36.7   101.1    89.1
3  2007-12-06   36.9   102.3     NaN
4  2007-12-07   36.9   108.3    89.3
5  2007-12-08    NaN   110.0    89.5
6  2007-12-09    NaN   102.3    89.3
7  2007-12-12    NaN     NaN    89.9
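A different technique (not the one used in either answer above) is to index each value column by its own date column and let pd.concat align them on that index. This is a sketch assuming the input text shown (TEXT is illustrative) and that dates are unique within each date column; pd.concat along axis=1 may raise an error when it has to align indexes containing duplicates.

```python
import io

import pandas as pd

# The sample input from the question (illustrative).
TEXT = """\
date val1 date val2 date val3
2007-12-01 35.6 2007-12-05 101.1 2007-12-05 89.1
2007-12-02 36.7 2007-12-06 102.3 2007-12-07 89.3
2007-12-05 36.7 2007-12-07 108.3 2007-12-08 89.5
2007-12-06 36.9 2007-12-08 110.0 2007-12-09 89.3
2007-12-07 36.9 2007-12-09 102.3 2007-12-12 89.9
"""
df = pd.read_csv(io.StringIO(TEXT), sep=r"\s+")

series = []
for i in range(0, len(df.columns), 2):
    pair = df.iloc[:, i:i + 2]
    # index the value column by its own date column (named 'date', 'date.1', ...)
    series.append(pair.set_index(pair.columns[0])[pair.columns[1]])

# concat(axis=1) outer-joins the Series on their indexes (join="outer" is the default).
wide = pd.concat(series, axis=1)
wide.index = pd.to_datetime(wide.index)
result = wide.sort_index().rename_axis("date").reset_index()
print(result)
```

Converting the index with pd.to_datetime before sorting ensures chronological rather than string order, although for ISO dates like these the two coincide.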
