
Copying one column of a CSV file and adding it to another file using python

I have two files, the first one is called book1.csv, and looks like this:

 header1,header2,header3,header4,header5
 1,2,3,4,5
 1,2,3,4,5
 1,2,3,4,5

The second file is called book2.csv, and looks like this:

 header1,header2,header3,header4,header5
 1,2,3,4
 1,2,3,4
 1,2,3,4

My goal is to copy the column that contains the 5's in book1.csv to the corresponding column in book2.csv.

The problem with my code seems to be that it is not appending correctly, nor is it selecting only the column that I want to copy. It also gives an error saying that I have selected an incorrect index position. The output is as follows:

 header1,header2,header3,header4,header5
 1,2,3,4
 1,2,3,4
 1,2,3,41,2,3,4,5

Here is my code:

import csv

with open('C:/Users/SAM/Desktop/book2.csv','a') as csvout:
    write=csv.writer(csvout, delimiter=',')
    with open('C:/Users/SAM/Desktop/book1.csv','rb') as csvfile1:
        read=csv.reader(csvfile1, delimiter=',')
        header=next(read)
        for row in read:
            row[5]=write.writerow(row)

What should I do to get this to append properly?

Thanks for any help!

What about something like this? I read in both books and, for every row in book2, append the last element of the matching book1 row, storing the result in a list. Then I write the contents of that list to a new .csv file.

import csv

with open('book1.csv', 'r', newline='') as book1:
    with open('book2.csv', 'r', newline='') as book2:
        reader1 = csv.reader(book1, delimiter=',')
        reader2 = csv.reader(book2, delimiter=',')

        both = []
        fields = next(reader1)  # read header row
        next(reader2)  # read and ignore header row
        for row1, row2 in zip(reader1, reader2):
            row2.append(row1[-1])
            both.append(row2)

        with open('output.csv', 'w', newline='') as output:
            writer = csv.writer(output, delimiter=',')
            writer.writerow(fields)  # write a header row
            writer.writerows(both)

Regarding the "error that I have selected an incorrect index position": I suspect this comes from using row[5] in your code. Indexing in Python starts from 0, so if you have A = [1, 2, 3, 4, 5], then to get the 5 you would use print(A[4]). A five-field row has no row[5], so that line raises an IndexError.
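Here is a small sketch of the indexing point that uses a plain list as a stand-in for one parsed CSV row. Negative indices count from the end, which is why several answers here use index -1 to get the last column.

```python
A = [1, 2, 3, 4, 5]  # stand-in for one five-field CSV row

print(A[4])   # last element by position: 5
print(A[-1])  # last element counted from the end: also 5

try:
    A[5] = None  # there is no sixth slot, just like row[5] in the question
except IndexError as err:
    print("IndexError:", err)  # list assignment index out of range
```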

Assuming the two files have the same number of rows and the rows are in the same order, I think you want to do something like this:

import csv

# Open the two input files, which I've renamed to be more descriptive
# (four_col.csv plays the role of book2.csv, five_col.csv of book1.csv),
# and also an output file that we'll be creating
with open("four_col.csv", mode='r', newline='') as four_col, \
     open("five_col.csv", mode='r', newline='') as five_col, \
     open("five_output.csv", mode='w', newline='') as outfile:
    four_reader = csv.reader(four_col)
    five_reader = csv.reader(five_col)
    five_writer = csv.writer(outfile)
    _ = next(four_reader)  # Ignore headers for the 4-column file
    headers = next(five_reader)
    five_writer.writerow(headers)
    for four_row, five_row in zip(four_reader, five_reader):
        last_col = five_row[-1]  # Or use five_row[4]
        four_row.append(last_col)
        five_writer.writerow(four_row)
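book2.csv already lists header5 in its header row, so another option is to look up the column by name with csv.DictReader rather than by position. The sketch below is a variant of the zip approach above and does not come from the original answer. The function name copy_column and the in-memory sample data are hypothetical. When a row is short, DictReader fills the missing field with its restval (None by default), so you can assign the value from the first file straight into it.

```python
import csv
import io


def copy_column(src_file, dst_file, out_file, column):
    """Copy `column` from src_file into the same-named column of dst_file.

    Assumes both inputs have the same number of data rows, in the same order.
    """
    src_reader = csv.DictReader(src_file)
    dst_reader = csv.DictReader(dst_file)  # short rows get None for missing fields
    writer = csv.DictWriter(out_file, fieldnames=dst_reader.fieldnames)
    writer.writeheader()
    for src_row, dst_row in zip(src_reader, dst_reader):
        dst_row[column] = src_row[column]  # fill the gap by header name
        writer.writerow(dst_row)


# Hypothetical in-memory stand-ins for book1.csv and book2.csv
book1 = io.StringIO("header1,header2,header3,header4,header5\n1,2,3,4,5\n1,2,3,4,5\n")
book2 = io.StringIO("header1,header2,header3,header4,header5\n1,2,3,4\n1,2,3,4\n")
out = io.StringIO()
copy_column(book1, book2, out, "header5")
print(out.getvalue(), end="")
```

To use it with real files, open all three with newline='' (as the answer above does for its output file) and pass in the file objects.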

Although the code above will work, it loops over the rows in pure Python and does not scale well to large files, where a vectorised approach is a better fit. Getting to work with numpy or pandas will make tasks like this easier, so it is worth learning a bit of them.

You can download pandas from the pandas website (or install it with pip install pandas).

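# Load pandas
import pandas as pd

# Load each file into a pandas DataFrame (backed by numpy arrays).
# csv1.csv and csv2.csv stand for book1.csv and book2.csv.
# DataFrame.from_csv no longer exists in current pandas; read_csv replaces it.
data1 = pd.read_csv('csv1.csv', sep=',')
data2 = pd.read_csv('csv2.csv', sep=',')

# Now add 'header5' from data1 to data2 (rows are matched on the
# default 0, 1, 2, ... index, i.e. by row position)
data2['header5'] = data1['header5']

# Save it back to csv without writing the index as an extra column
data2.to_csv('output.csv', index=False)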

Why not read the files line by line and use the -1 index to find the last item?

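endings = []

with open('book1.csv') as book1:
    next(book1)  # skip the header line
    for line in book1:
        # drop the trailing newline, then take the last comma-separated field
        endings.append(line.rstrip('\n').split(',')[-1])

linecounter = 0
with open('book2.csv') as book2:
    print(next(book2).rstrip('\n'))  # book2's header already lists header5
    for line in book2:
        print(line.rstrip('\n') + ',' + endings[linecounter])  # or write to file
        linecounter += 1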

You should also catch errors if the two files have different numbers of rows (endings[linecounter] raises an IndexError when book2 is the longer file).
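Here is one way to act on that advice. This sketch uses the csv module instead of raw string splitting. Plain zip silently stops at the shorter input. itertools.zip_longest instead pads the shorter input with a fill value, so a mismatch can be detected and reported. The function name merge_last_column and the sample data are hypothetical and chosen for illustration.

```python
import csv
import io
from itertools import zip_longest

_MISSING = object()  # sentinel: one of the inputs has run out of rows


def merge_last_column(src_file, dst_file):
    """Return dst rows with the last field of each src row appended.

    Raises ValueError if the two inputs have different numbers of data rows.
    """
    src_reader = csv.reader(src_file)
    dst_reader = csv.reader(dst_file)
    next(src_reader)             # skip the source header
    merged = [next(dst_reader)]  # keep the destination header as-is
    pairs = zip_longest(src_reader, dst_reader, fillvalue=_MISSING)
    for row_no, (src_row, dst_row) in enumerate(pairs, start=1):
        if src_row is _MISSING or dst_row is _MISSING:
            raise ValueError(f"row counts differ at data row {row_no}")
        merged.append(dst_row + [src_row[-1]])
    return merged


# Hypothetical sample: book2 has one data row fewer than book1
book1 = io.StringIO("h1,h2,h3,h4,h5\n1,2,3,4,5\n1,2,3,4,5\n")
book2 = io.StringIO("h1,h2,h3,h4,h5\n1,2,3,4\n")
try:
    merge_last_column(book1, book2)
except ValueError as err:
    print(err)  # row counts differ at data row 2
```

The merged rows come back as a list, so you can pass them to csv.writer(...).writerows() only after the whole input has been checked.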

The technical posts on this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 