
How to read a CSV file in Python in which the same header repeats every 10-15 rows, using pandas or csv?

I have a CSV file with data like the following:

Column 1| Car Make  |Mazda      |Car Model  |123456|
Customer| fname     |lname      | phone     |      |        
1       |abcdef     |iiiiii     |123456454                  
2       |eddfga     |jjjjjj     |123434325                  
3       |phadfa     |kkkk       |124141414                  
4       |adfkld     |llllll     |567575575                  
5       |dafadf     |mmmm       |898979778                  
                                
Column 2| Car Make| Nissan| Car Model| 789471|
Customer| fname   |lname  |phone                    
1       |gjjgjd   |pwldd  |7123097109                   
2       |rbnhf    |ggggg  |9827394829                   
3       |plgrd    |hhhhh  |9210313813                   
4       |ghurf    |jjjjj  |9876548311                   
5       |mjydd    |kkkkk  |8887654625                   
6       |kiure    |llllll |7775559994                   
7       |wriok    |mmmmm  |9993338881                   
8       |stije    |nnnnnn |1110002223                   
9       |ahlkg    |bbbbb  |7778889995                   
10      |cdegf    |vvvvv    |9993331999     

        

I need to read the CSV file (I'm using pandas) and combine all the blocks into a single table, so that each customer's name and phone number appears alongside the corresponding car make and model. The desired output looks like this:

Column   |Customer| fname|  lname|  phone|  Car Make|   Car Model|
Column 1|   1|  abcdef| iiiiii| 123456454|  Mazda|  123456|
Column 1|   2|  eddfga| jjjjjj| 123434325|  Mazda|  123456|
Column 1|   3|  phadfa| kkkk|   124141414|  Mazda|  123456|
Column 1|   4|  adfkld| llllll| 567575575|  Mazda|  123456|
Column 1|   5|  dafadf| mmmm|   898979778|  Mazda|  123456|
Column 2|   1|  gjjgjd| pwldd|  7123097109| Nissan| 789471|
Column 2|   2|  rbnhf|  ggggg|  9827394829| Nissan| 789471|
Column 2|   3|  plgrd|  hhhhh|  9210313813| Nissan| 789471|
Column 2|   4|  ghurf|  jjjjj|  9876548311| Nissan| 789471|
Column 2|   5|  mjydd|  kkkkk|  8887654625| Nissan| 789471|
Column 2|   6|  kiure|  llllll| 7775559994| Nissan| 789471|
Column 2|   7|  wriok|  mmmmm|  9993338881| Nissan| 789471|
Column 2|   8|  stije|  nnnnnn| 1110002223| Nissan| 789471|
Column 2|   9|  ahlkg|  bbbbb|  7778889995| Nissan| 789471|
Column 2|   10| cdegf|  vvvvv|  9993331999| Nissan| 789471|

Below is the code I have tried. I can't work out how to move on to Column 2 and fetch its details once Column 1 has been parsed.

import pandas as pd

hdf = pd.read_csv("file.csv", nrows=0)
first = list(hdf)
df = pd.read_csv("file.csv", skiprows=1) 
df.columns = ['A','B','C','D']
out_df = pd.DataFrame(columns=['Column', 'Customer', 'fname', 'lname', 'phone', 'make', 'model'])
out_df.Customer = i.A
out_df.fname = i.B
out_df.lname = i.C
out_df.phone = i.D
out_df.Column = first[0]
out_df.make = first[2]
out_df.model = first[4]
out_df.to_csv('new_file.csv')

This is where I am stuck, with no idea how to progress. Is there a similar question where I can find some ideas on how to get the desired result?

Any help is much appreciated.

Thank you!

I would use csv.DictReader() to read the rows from the file, check whether the first column is 'Customer' or 'Column x', and collect the values in a list. Finally, convert the list to the desired DataFrame.
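
As a quick illustration of this idea (a sketch, not part of the original answer), the snippet below shows how csv.DictReader maps each line to a dict when fieldnames is passed explicitly. The sample text and the field names A to E are placeholders, and it assumes the file is comma-delimited.

```python
import csv
import io

# Hypothetical sample: one block line, one repeated header line, one data line.
sample = (
    "Column 1,Car Make,Mazda,Car Model,123456\n"
    "Customer,fname,lname,phone\n"
    "1,abcdef,iiiiii,123456454\n"
)

# With explicit fieldnames, the first line is treated as data, not as a header.
reader = csv.DictReader(io.StringIO(sample), fieldnames=["A", "B", "C", "D", "E"])
parsed = list(reader)

for row in parsed:
    # Short rows get None for the missing trailing fields (the default restval).
    print(row["A"], row["E"])
```

Because every line arrives as a dict with the same keys, the value in the first field is enough to tell a block line, a repeated header line, and a customer line apart.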

Input CSV file test.csv:

[image: contents of test.csv]

Code:

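import csv

import pandas as pd

rows = []
with open('test.csv', 'rt') as f:
    # Explicit fieldnames, so every line (including the first) is read as data
    for row in csv.DictReader(f, fieldnames=['A', 'B', 'C', 'D', 'E']):
        if row['A'].startswith('Column') and row['B'] == 'Car Make' and row['D'] == 'Car Model':
            # Block line: remember the column label, make and model
            column = row['A']
            make = row['C']
            model = row['E']
            # Reset the customer fields so the following 'Customer' header line is not appended
            customer = fname = lname = phone = ''
        elif row['A']!='Customer':
            # Customer line within the current block
            customer = row['A']
            fname = row['B']
            lname = row['C']
            phone = row['D']

        # Append only when a customer has been read for the current block
        if fname!='':
            rows.append([column, customer, fname, lname, phone, make, model])

df = pd.DataFrame(rows, columns=['Column', 'Customer', 'fname', 'lname', 'phone', 'Car Make', 'Car Model'])
print(df)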

Result:

      Column Customer   fname   lname       phone Car Make Car Model
0   Column 1        1  abcdef  iiiiii   123456454    Mazda    123456
1   Column 1        2  eddfga  jjjjjj   123434325    Mazda    123456
2   Column 1        3  phadfa    kkkk   124141414    Mazda    123456
3   Column 1        4  adfkld  llllll   567575575    Mazda    123456
4   Column 1        5  dafadf    mmmm   898979778    Mazda    123456
5   Column 2        1  gjjgjd   pwldd  7123097109   Nissan    789471
6   Column 2        2   rbnhf   ggggg  9827394829   Nissan    789471
7   Column 2        3   plgrd   hhhhh  9210313813   Nissan    789471
8   Column 2        4   ghurf   jjjjj  9876548311   Nissan    789471
9   Column 2        5   mjydd   kkkkk  8887654625   Nissan    789471
10  Column 2        6   kiure  llllll  7775559994   Nissan    789471
11  Column 2        7   wriok   mmmmm  9993338881   Nissan    789471
12  Column 2        8   stije  nnnnnn  1110002223   Nissan    789471
13  Column 2        9   ahlkg   bbbbb  7778889995   Nissan    789471
14  Column 2       10   cdegf   vvvvv  9993331999   Nissan    789471
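
Since the question mentions pandas, the same result could also be sketched with pandas alone. This is an alternative under stated assumptions, not the method above: the data is comma-delimited, block lines always have the form "Column N, Car Make, <make>, Car Model, <model>", and the inline sample stands in for test.csv. The temporary names A to E are placeholders.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv (assumed comma-delimited, shortened).
sample = """Column 1,Car Make,Mazda,Car Model,123456
Customer,fname,lname,phone
1,abcdef,iiiiii,123456454
2,eddfga,jjjjjj,123434325

Column 2,Car Make,Nissan,Car Model,789471
Customer,fname,lname,phone
1,gjjgjd,pwldd,7123097109
"""

# Read everything as text with placeholder names; blank lines are skipped by default.
raw = pd.read_csv(io.StringIO(sample), header=None,
                  names=["A", "B", "C", "D", "E"], dtype=str)

# Lines that start a new block.
is_block = raw["A"].str.startswith("Column", na=False) & raw["B"].eq("Car Make")

# Put block-level values on the block line only, then forward-fill them downwards.
block_cols = ["Column", "Car Make", "Car Model"]
raw["Column"] = raw["A"].where(is_block)
raw["Car Make"] = raw["C"].where(is_block)
raw["Car Model"] = raw["E"].where(is_block)
raw[block_cols] = raw[block_cols].ffill()

# Keep only customer lines: drop block lines and the repeated 'Customer' header lines.
data = raw[~is_block & raw["A"].ne("Customer")]

df = data.rename(columns={"A": "Customer", "B": "fname", "C": "lname", "D": "phone"})
df = df[["Column", "Customer", "fname", "lname", "phone", "Car Make", "Car Model"]]
df = df.reset_index(drop=True)
print(df)
```

Reading with dtype=str keeps the phone numbers and model codes as text. For the real file, pd.read_csv("test.csv", ...) with the same arguments would replace the io.StringIO call, and df.to_csv(...) would write the combined table out.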
