简体   繁体   中英

Pandas/Python, Reading one row from multiple csv, and combining into 1 csv

I'm fairly new to Python and Pandas,etc (go easy), and have spent a couple full days so far researching how to do this, with no success. Here is the project.

Two different.CSV's for 2 different data points are created with a running total for the day in each (ending with the day's total for that specific data in the last row). For example, in 4_23_2022_Data1.csv , has the columns Date and Data1 , and in 4_23_2022_Data2.csv , has the columns Date and Data2 , with the last row in each.csv containing that day's Date (obviously) and that Data's total (either Data1 or Data2 column).

What I'm trying to accomplish is: Read a directory containing the 2 sets of Data.csv's (for multiple, many days), and create a combined.CSV with 'Date', 'Data1', Data2' columns, with the set of Data1 and Data2 displayed in its appropriate row for that specific date. [The primary system creating these.csv's cannot combine Data1 and Data2 in 1.csv initially, so I am trying to create a program that can quickly scan thru and do this.]

My initial coding attempts only scratch the surface (and do not account for any loop):

df_1 = pd.read_csv('04_23_2022_Data1.csv', usecols = ['Date','Data1'])

df_2 = pd.read_csv('04_23_2022_Data2.csv', usecols = ['Date','Data2'])

##the last line of the day is 538

Data1_date = df1_read.iloc[538]['Date']     

Data1_value = df1_read.iloc[538]['Data1']

Data2_date = df2_read.iloc[538]['Date'] 

Data2_value = df2_read.iloc[538]['Data2']

df_export = pd.DataFrame(columns = ['Date','Data1','Data2'])

df_export.at[1, 'Date'] = Data1_date

df_export.at[1, 'Data1'] = Data1_value

df_export.at[1, 'Data2'] = Data2_value

df_export.to_csv('Combo_of_Data1_and_Data2_per_Date.csv')

Any help or further guidance would be greatly appreciated. Again, sorry for the complete newbie attack at this problem, but I'm trying. :)

You can loop through your files in your folder with glob.glob() . And use Python built-in functions iter() and zip() to iterate through your files in pairs.

# import
import glob

# define folder path
# look for files ending in '.csv' only
path = r'\data_folder\*.csv'

# loop through folder and store all relevant files as a list
file_paths = glob.glob(path)
# iterate through list to get elements in pairs
iterable_list = iter(file_paths)
# zip pairs of elements
iterable_list = zip(iterable_list, iterable_list)

# loop through list of files in pairs and do stuff...
for file_one, file_two in pair_elements(file_paths):
    df_1 = pd.read_csv(file_one)
    df_2 = pd.read_csv(file_two)
    # do stuff

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM