Operations on columns across multiple files in Pandas

I am trying to perform some arithmetic operations in Python Pandas and merge the result into one of the files.

Path_1: File_1.csv, File_2.csv, ....

This path contains several files, which are supposed to be at increasing time intervals, with the following columns:

File_1.csv
Nos,12:00:00
123,1451
656,4544
853,5484

File_2.csv
Nos,12:30:00
485,5464
456,4865
658,4584

Path_2: Master_1.csv

Nos,00:00:00
123,2000
485,1500
656,1000
853,2500
456,4500
658,5000

I am trying to read the n .csv files from Path_1 and compare the time in the header of col[1] of each file with the time in the header of the last column of Master_1.csv.

If Master_1.csv does not have that time, it should create a new column named after the time from the Path_1 .csv file and fill in the values for each col['Nos'] by subtracting them from col[1] of Master_1.csv.

If a column with that time from the Path_1 file is already present, it should look up col['Nos'] and replace the NaN with the subtracted value for that col['Nos'].

i.e.

Expected output in Master_1.csv:

Nos,00:00:00,12:00:00,12:30:00
123,2000,549,NaN
485,1500,NaN,3964
656,1000,3544,NaN
853,2500,2984,NaN
456,4500,NaN,365
658,5000,NaN,-416
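The expected numbers can be checked by hand. The snippet below uses plain Python and only the sample values from the question. It takes the file value minus the master 00:00:00 value for each Nos, which is the order the answer's x - df_2['00:00:00'] uses. That reproduces five of the six numbers above. For Nos 123 it gives -549, whereas the table shows 549, which is 2000 - 1451, the master-minus-file order described in the text. Whichever sign convention is wanted, it has to be applied to every row.

```python
# Master base values (00:00:00) and per-file readings, copied from the question
master = {123: 2000, 485: 1500, 656: 1000, 853: 2500, 456: 4500, 658: 5000}
file_1 = {123: 1451, 656: 4544, 853: 5484}  # header 12:00:00
file_2 = {485: 5464, 456: 4865, 658: 4584}  # header 12:30:00

# File value minus master value, per Nos
diff_1200 = {nos: v - master[nos] for nos, v in file_1.items()}
diff_1230 = {nos: v - master[nos] for nos, v in file_2.items()}

print(diff_1200)  # {123: -549, 656: 3544, 853: 2984}
print(diff_1230)  # {485: 3964, 456: 365, 658: -416}
```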

I understand the arithmetic, but I can't work out how to loop over Nos and the time series. I have tried to put some code together to work around the looping and need help with that part. Thanks.

import glob
import os

import pandas as pd

path_1 = '/'
path_2 = '/'

# The time-series header is different in every file, e.g. 12:00, 12:30, 17:30 etc.
files = glob.glob(os.path.join(path_1, '*.csv'))
df_1 = pd.read_csv(files[0])                              # columns: Nos, <time>
df_2 = pd.read_csv(os.path.join(path_2, 'master_1.csv'))  # columns: Nos, 00:00:00

timeseries = df_1.columns[1]  # the dynamic time header of this file
# Line the rows up on Nos, then subtract; this still only handles one file
merged = df_2.merge(df_1, on='Nos', how='left')
merged[timeseries] = merged['00:00:00'] - merged[timeseries]

merged.to_csv(os.path.join(path_2, 'master_1.csv'), index=False)

You can do it in three steps:

  1. Read your CSVs into a list of dataframes.
  2. Merge the dataframes together (equivalent to a SQL left join or an Excel VLOOKUP).
  3. Calculate your derived columns using a vectorized subtraction.

Here's some code you could try:

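# Read dataframes into a list
import glob
import os

import pandas as pd

path_1 = '/'  # folder with File_1.csv, File_2.csv, ...

L = []
for fname in sorted(glob.glob(os.path.join(path_1, '*.csv'))):
    L.append(pd.read_csv(fname))

# Read master dataframe, and merge in the other dataframes on Nos
df_2 = pd.read_csv('master_1.csv')
for df in L:
    df_2 = pd.merge(df_2, df, on='Nos', how='left')

# For each new time column, calculate the difference with the master column
time_cols = [c for c in df_2.columns if c not in ('Nos', '00:00:00')]
df_2[time_cols] = df_2[time_cols].sub(df_2['00:00:00'], axis=0)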
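The merge-based approach adds every time as a new column. It does not cover the question's second rule, where the time column already exists in Master_1.csv and only its NaN cells should be filled. Merging a frame whose time column is already in the master makes pd.merge add _x/_y suffixes to the overlapping column instead. A per-file update is one way to handle both rules. Below is a minimal sketch under a few assumptions: update_master and fold_folder are hypothetical names, each Path_1 file has exactly two columns (Nos and one time header), Nos is unique, the value stored is file minus 00:00:00, and the master file does not live inside the Path_1 folder. File_3.csv is a hypothetical extra file that repeats 12:00:00 so that the "already present" branch is exercised.

```python
import glob
import os
import tempfile

import pandas as pd


def update_master(master: pd.DataFrame, df: pd.DataFrame) -> pd.DataFrame:
    """Fold one Path_1 frame (columns: Nos, <time>) into the master frame.

    Hypothetical helper: assumes the second header of `df` is the time, 'Nos'
    is unique in both frames, and the stored value is file minus '00:00:00'.
    """
    time_col = df.columns[1]                   # e.g. '12:00:00'
    values = df.set_index('Nos')[time_col]
    base = master.set_index('Nos')['00:00:00']
    diff = values - base                       # aligned on Nos; no match -> NaN
    new = master['Nos'].map(diff)              # one value per master row

    out = master.copy()
    if time_col in out.columns:
        out[time_col] = out[time_col].fillna(new)  # time exists: fill NaN only
    else:
        out[time_col] = new                        # new time: add a column
    return out


def fold_folder(master_path: str, folder: str) -> pd.DataFrame:
    """Hypothetical driver: apply every *.csv in `folder` (which must not
    contain the master file) and write the result back to `master_path`."""
    master = pd.read_csv(master_path)
    for fname in sorted(glob.glob(os.path.join(folder, '*.csv'))):
        master = update_master(master, pd.read_csv(fname))
    master.to_csv(master_path, index=False)    # NaN is written as an empty field
    return master


# Usage with the sample data from the question, in a temporary folder
root = tempfile.mkdtemp()
path_1 = os.path.join(root, 'path_1')
os.makedirs(path_1)
samples = {
    'File_1.csv': 'Nos,12:00:00\n123,1451\n656,4544\n853,5484\n',
    'File_2.csv': 'Nos,12:30:00\n485,5464\n456,4865\n658,4584\n',
    'File_3.csv': 'Nos,12:00:00\n485,1600\n',  # hypothetical: repeats 12:00:00
}
for name, text in samples.items():
    with open(os.path.join(path_1, name), 'w') as fh:
        fh.write(text)
master_path = os.path.join(root, 'Master_1.csv')
with open(master_path, 'w') as fh:
    fh.write('Nos,00:00:00\n123,2000\n485,1500\n656,1000\n853,2500\n456,4500\n658,5000\n')

result = fold_folder(master_path, path_1)
print(result)
```

Because an existing column is only filled where it is NaN, running fold_folder again over the same folder leaves the already-computed cells unchanged instead of subtracting twice.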
