How to extract data from multiple files whose names are listed inside another file using Python?

I have one CSV file named "sample.csv" with the following columns:

id  name      contactno. fileName
1   robin      1234455    info_v1.csv
2   elsa,roy   42342442   info_v1.csv
3   john       232323     info_v2.csv

The files named in the fileName column each contain some text messages.

info_v1.csv

id  message
1   my name is robin. I am from Berlin
2   my name is Elsa. My age is 12

info_v2.csv

id  message
3   my name is John.I play football.

Now I want to create one file that contains all the information. For example:

output.csv

id  name       contactno. message
1   robin      1234455    my name is robin. I am from Berlin
2   elsa,roy   42342442   my name is Elsa. My age is 12
3   john       232323     my name is John.I play football.

What I have done so far is given below:

csvfile = csv.reader(open('sample.csv', newline='',encoding="utf8"))
next(csvfile)
included_cols = [0, 1, 2, 3]
for row in csvfile:
    content = list(row[i] for i in included_cols)
    filename=content[3]
        
    with open(filename, newline='') as f:
        
        reader = csv.reader(f)
        next(reader)
        for row1 in reader:
            
            content1 = row1
        
    my_list1=content+content1

But with this I am not able to keep all the information together to create output.csv. Also, how can I match the id so that it does not fetch the wrong data from the info files?

Create a function that reads the file mentioned in sample.csv, compares the id, and returns the corresponding message:

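import csv

def get_message(name, id):
    # Return the message whose id matches in the given info file.
    # The csv module yields strings, so id is compared as a string.
    with open(name, newline='') as fp:
        reader = csv.DictReader(fp)
        for row in reader:
            if row['id'] == id:
                return row['message']

# newline='' lets the csv module handle line endings itself.
with open('sample.csv', newline='') as fp, open('output.csv', 'w', newline='') as fw:
    reader = csv.reader(fp)
    writer = csv.writer(fw)
    # Replace the trailing fileName header with message.
    columns = next(reader)[:-1] + ['message']
    writer.writerow(columns)
    for row in reader:
        # row[-1] is the info file name, row[0] is the id to look up.
        new_row = row[:-1] + [get_message(row[-1], row[0])]
        writer.writerow(new_row)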

Output:

 id      name  contactno.                             message
  1     robin     1234455  my name is robin. I am from Berlin
  2  elsa,roy    42342442       my name is Elsa. My age is 12
  3      john      232323    my name is John.I play football.
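
The function above reopens the info file for every row of sample.csv. For larger inputs, one possible variation is to read each info file once into a dictionary keyed by id. This is only a sketch, assuming the files are real comma-separated CSVs with id and message headers and that file names resolve relative to the current directory; the names load_messages and merge_files are hypothetical and not part of the original answer:

```python
import csv


def load_messages(path):
    """Read one info file into a {id: message} dict (ids stay strings)."""
    with open(path, newline='', encoding='utf8') as fp:
        return {row['id']: row['message'] for row in csv.DictReader(fp)}


def merge_files(sample_path, output_path):
    """Copy sample_path to output_path, replacing fileName with message."""
    cache = {}  # file name -> {id: message}, so each info file is read once
    with open(sample_path, newline='', encoding='utf8') as fp, \
            open(output_path, 'w', newline='', encoding='utf8') as fw:
        reader = csv.reader(fp)
        writer = csv.writer(fw)
        header = next(reader)
        writer.writerow(header[:-1] + ['message'])
        for row in reader:
            if not row:
                continue  # skip blank lines
            file_name, row_id = row[-1], row[0]
            if file_name not in cache:
                cache[file_name] = load_messages(file_name)
            # Assumption: an id missing from its info file gives an empty message.
            writer.writerow(row[:-1] + [cache[file_name].get(row_id, '')])
```

Calling merge_files('sample.csv', 'output.csv') would then produce the same kind of file as the code above, with the lookup restricted to the file named on each row.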

Here is a different approach, which uses pandas:

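import numpy as np
import pandas as pd

df = pd.read_csv('sample.csv')
files = df['fileName'].unique()

# Left-join every info file on id. With two files, the second merge
# renames the clashing message columns to message_x and message_y.
for f in files:
    df = df.merge(pd.read_csv(f), on='id', how='left')

# Take message_x where present, otherwise fall back to message_y.
df['message'] = np.where(df['message_x'].isnull(), df['message_y'], df['message_x'])
df.drop(columns=['message_x', 'message_y', 'fileName'], inplace=True)
df.to_csv('output.csv', index=False)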

Output:

id  name        contactno.  message
1   robin       1234455     my name is robin. I am from Berlin
2   elsa,roy    42342442    my name is Elsa. My age is 12
3   john        232323      my name is John.I play football.
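
The merge loop in this answer depends on exactly two info files being listed: message_x and message_y only exist after the second merge, and the join on id alone ignores which file each row points to. One hedged alternative is to stack all info files first, tag each row with its source file, and merge once on both id and fileName. A minimal sketch, assuming every info file has id and message columns; combine_messages is a hypothetical helper name:

```python
import pandas as pd


def combine_messages(sample_path, output_path):
    """Join sample rows to their messages using both id and fileName."""
    df = pd.read_csv(sample_path)
    # Tag every message row with the file it came from.
    frames = [pd.read_csv(name).assign(fileName=name)
              for name in df['fileName'].unique()]
    messages = pd.concat(frames, ignore_index=True)
    # A single left merge keeps every sample row, matched or not.
    merged = df.merge(messages, on=['id', 'fileName'], how='left')
    merged = merged.drop(columns='fileName')
    merged.to_csv(output_path, index=False)
    return merged
```

Because the join key includes fileName, the same id appearing in two different info files would not be mixed up, and the code no longer cares how many files are listed.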
