
python csv copy column

I have a file containing the following:

first_name,last_name,uid,email,dep_code,dep_name
john,smith,jsmith,jsmith@gmail.com,finance,21230
john,king,jking,jjing@gmail.com,human resource,31230

I want to copy the "email" column into a new column, "email2", and then replace gmail.com with hotmail.com in email2.

I'm new to Python, so I need help from experts. I have tried a few scripts, but if there is a better way to do it, please let me know. The original file contains 60000 rows.

with open('c:\\Python27\\scripts\\colnewfile.csv', 'rb') as fp_in1, open('c:\\Python27\\scripts\\final.csv', 'wb') as fp_out1:
    writer1 = csv.writer(fp_out1, delimiter=",")
    reader1 = csv.reader(fp_in1, delimiter=",")
    domain = "@hotmail.com"
    for row in reader1:
        if row[2:3] == "uid":
            writer1.append("Email2")
        else:
            writer1.writerow(row+[row[2:3]])

Here is the final script. The only problem is that it does not write the complete output file: it shows only 61409 rows, whereas the input file has 61438 rows.

inFile = 'c:\\Python27\\scripts\\in-093013.csv'
outFile = 'c:\\Python27\\scripts\\final.csv'

with open(inFile, 'rb') as fp_in1, open(outFile, 'wb') as fp_out1:
    writer = csv.writer(fp_out1, delimiter=",")
    reader = csv.reader(fp_in1, delimiter=",")
    for col in reader:
        del col[6:]
        writer.writerow(col)
    headers = next(reader)
    writer.writerow(headers + ['email2'])
    for row in reader:
        if len(row) > 3:
            email = email.split('@', 1)[0] + '@hotmail.com'
        writer.writerow(row + [email])

If you call next() on the reader, you get one row at a time; use that to copy over the headers. Copying the email column is easy enough:

import csv

infilename = r'c:\Python27\scripts\colnewfile.csv'
outfilename = r'c:\Python27\scripts\final.csv'

with open(infilename, 'rb') as fp_in, open(outfilename, 'wb') as fp_out:
    reader = csv.reader(fp_in, delimiter=",")
    headers = next(reader)  # read first row

    writer = csv.writer(fp_out, delimiter=",")
    writer.writerow(headers + ['email2'])

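    for row in reader:
        if len(row) > 3:
            # make sure there are at least 4 columns
            email = row[3].split('@', 1)[0] + '@hotmail.com'
            row = row + [email]
        # rows with fewer than 4 columns are written unchanged, so a short
        # row never reuses the previous row's email or hits an undefined name
        writer.writerow(row)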

This code splits the email address on the first @ sign, takes the first part of the split, and adds @hotmail.com after it. Note that it replaces whatever domain follows the @, not only gmail.com:

>>> 'example@gmail.com'.split('@', 1)[0]
'example'
>>> 'example@gmail.com'.split('@', 1)[0] + '@hotmail.com'
'example@hotmail.com'

The above produces:

first_name,last_name,uid,email,dep_code,dep_name,email2
john,smith,jsmith,jsmith@gmail.com,finance,21230,jsmith@hotmail.com
john,king,jking,jjing@gmail.com,human resource,31230,jjing@hotmail.com

for your sample input.
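
If you are on Python 3, or only want to touch addresses that really are at gmail.com, the same idea can be sketched as below. This is a minimal sketch, not a drop-in replacement: the function name add_email2 and its old_domain / new_domain parameters are made up for illustration, the email column is looked up by its header name, and in-memory io.StringIO buffers stand in for the real files so the example runs on its own.

```python
import csv
import io


def add_email2(fp_in, fp_out, old_domain="gmail.com", new_domain="hotmail.com"):
    """Copy the CSV from fp_in to fp_out, appending an email2 column.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    reader = csv.reader(fp_in)
    writer = csv.writer(fp_out, lineterminator="\n")

    headers = next(reader)               # first row holds the column names
    email_idx = headers.index("email")   # find the column by name, not position
    writer.writerow(headers + ["email2"])

    for row in reader:
        if len(row) <= email_idx:
            # short or blank row: pass it through so no input row is dropped
            writer.writerow(row)
            continue
        local, _, domain = row[email_idx].partition("@")
        # rewrite only addresses at old_domain; copy anything else as-is
        if domain == old_domain:
            email2 = local + "@" + new_domain
        else:
            email2 = row[email_idx]
        writer.writerow(row + [email2])


# In-memory stand-ins for the input and output files
sample = io.StringIO(
    "first_name,last_name,uid,email,dep_code,dep_name\n"
    "john,smith,jsmith,jsmith@gmail.com,finance,21230\n"
    "john,king,jking,jjing@gmail.com,human resource,31230\n"
)
result = io.StringIO()
add_email2(sample, result)
print(result.getvalue())
```

With real files on Python 3, open both in text mode with newline='' (as the csv module documentation recommends) instead of the 'rb' / 'wb' modes used with Python 2 above.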

This can be done very cleanly using pandas:

In [1]: import pandas as pd

In [3]: df = pd.read_csv('your_csv_file.csv')

In [4]: def rename_email(row):
   ...:     return row.email.replace('gmail.com', 'hotmail.com')
   ...:

In [5]: df['email2'] = df.apply(rename_email, axis=1)

In [6]: """axis = 1 or ‘columns’: apply function to each row"""

In [7]: df
Out[7]:
  first_name last_name     uid             email        dep_code  dep_name              email2
0       john     smith  jsmith  jsmith@gmail.com         finance     21230  jsmith@hotmail.com
1       john      king   jking   jjing@gmail.com  human resource     31230   jjing@hotmail.com

In [8]: df.to_csv('new_update_email_file.csv')
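
For a file of this size, a column-wise string replacement may be simpler than a row-by-row apply. The following is a rough sketch of that variant, assuming a reasonably recent pandas; the sample text is the data from the question, read from an in-memory buffer rather than a real file. Note that to_csv writes the row index as an extra first column by default, so index=False is passed here to keep the output columns the same as the input plus email2.

```python
import io

import pandas as pd

# Sample data from the question, held in memory instead of a file
csv_text = (
    "first_name,last_name,uid,email,dep_code,dep_name\n"
    "john,smith,jsmith,jsmith@gmail.com,finance,21230\n"
    "john,king,jking,jjing@gmail.com,human resource,31230\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# Vectorized literal replacement on the whole column (regex=False means
# the dot in 'gmail.com' is treated as a plain character)
df["email2"] = df["email"].str.replace("gmail.com", "hotmail.com", regex=False)

# With no path given, to_csv returns the CSV as a string;
# index=False leaves out the 0, 1, ... row labels
out = df.to_csv(index=False)
print(out)
```

To write to disk instead, pass a file name as the first argument to to_csv, as in the session above.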
