
Splitting a CSV into multiple CSVs depending on what is in column 1 using Python

I currently have a large CSV containing data for a number of events.

Column one contains a number of dates as well as an id for each event.


Basically, I want to write something in Python that, whenever it finds an id number (AL.....), creates a new CSV named after that id and containing all the data up to the next id number, so I end up with one CSV per event.

For info, the whole CSV contains 8 columns, but the split into individual CSVs depends only on column one.

Use Python to split a CSV file with multiple headers

I noticed this question is quite similar, but in my case I have AL followed by a different string of numbers each time, and I also want to name the new CSVs after the id numbers.

You can achieve this using pandas, so let's first generate some data:


import pandas as pd
import numpy as np

def date_string():
    return str(np.random.randint(1, 32)) + "/" + str(np.random.randint(1, 13)) + "/1997"

l = [date_string() for x in range(20)]
l[0] = "AL123"
l[10] = "AL321"
df = pd.DataFrame(l, columns=['idx'])

# -->
|    | idx        |
|---:|:-----------|
|  0 | AL123      |
|  1 | 24/3/1997  |
|  2 | 8/6/1997   |
|  3 | 6/9/1997   |
|  4 | 31/12/1997 |
|  5 | 11/6/1997  |
|  6 | 2/3/1997   |
|  7 | 31/8/1997  |
|  8 | 21/5/1997  |
|  9 | 30/1/1997  |
| 10 | AL321      |
| 11 | 8/4/1997   |
| 12 | 21/7/1997  |
| 13 | 9/10/1997  |
| 14 | 31/12/1997 |
| 15 | 15/2/1997  |
| 16 | 21/2/1997  |
| 17 | 3/3/1997   |
| 18 | 16/12/1997 |
| 19 | 16/2/1997  |

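So the interesting positions are 0 and 10, as that is where the AL* strings are. To locate the AL* rows and split on them, you can use:

idx = df.index[df['idx'].str.startswith('AL')]  # row positions of the AL* ids (default RangeIndex)
dfs = np.split(df, idx)  # split the frame just before each of those rows
for out in dfs[1:]:  # dfs[0] holds whatever precedes the first id (empty here)
    name = out.iloc[0, 0]  # the first cell of each piece is the id
    out.to_csv(name + ".csv", index=False, header=False)  # writes e.g. AL123.csv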

This gives you two csv files named AL123.csv and AL321.csv with the first line being the AL* string.
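
The same split can also be written without np.split, which may emit a deprecation warning (or fail) when given a DataFrame on newer pandas versions. The following is only a sketch under a few assumptions: the function name split_by_id, the out_dir and prefix parameters are made up for illustration, and column one is assumed to hold either an id starting with "AL" or a date, as described in the question. It numbers the rows with a running count of ids seen so far and writes one file per number:

```python
from pathlib import Path

import pandas as pd


def split_by_id(df, out_dir=".", prefix="AL"):
    """Write one CSV per event and return the paths written.

    Hypothetical helper: assumes the first column holds either an id
    starting with `prefix` or a date, as in the question.
    """
    first_col = df.iloc[:, 0].astype(str)
    is_id = first_col.str.startswith(prefix)
    # Running count of ids seen so far; all rows of one event share a number.
    group = is_id.cumsum()
    keep = group > 0  # rows before the first id (group 0) are skipped

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for _, chunk in df[keep].groupby(group[keep]):
        # The first row of each chunk is the id row, so it names the file.
        path = out_dir / f"{chunk.iloc[0, 0]}.csv"
        chunk.to_csv(path, index=False, header=False)
        written.append(path)
    return written
```

For the real 8-column file, this could be called on the result of pd.read_csv(..., header=None), assuming the file has no header row; all columns are carried into each output file because only the first column is used to decide where to split.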
