Read a text file and split it into multiple files based on the unique code in the first column

Question

I need to read a text file and split it into multiple files based on the unique code present in the first column. The column structure differs from record to record, depending on the unique code identifier in the first column.

The text file is comma-separated.

Sample input file structure
"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"

I would like to split the above text file into separate text files based on the unique code identifier in the first column.

Expected: two files with the data below

File1
"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"

File2
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"

Note: As the structure is different for each code identifier, I'm not able to read the data into a pandas DataFrame.

Answer 1

Your question contains two parts: first, read the file with unbalanced rows; then, split the DataFrame into sub-DataFrames.

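import io
import pandas as pd

data = io.StringIO('''"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"
''')
# ';' never occurs in the data, so each line is read as a single column
df = pd.read_csv(data, sep=';', header=None)
# Split on commas; shorter rows are padded with None
s = df[0].str.split(',', expand=True)
# Strip surrounding spaces and quotes from every value
s = s.apply(lambda x: x.str.strip(' "'), axis=1)
for x, y in s.groupby(0):
    # Drop the padding columns that are empty for this code
    print(y.dropna(axis=1))
    y.dropna(axis=1).to_csv(str(x) + '.csv')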
       0   1   2   3   4    5
0  05555  AB  CC  DD  EE  USA
1  05555  AB  CC  DD  EE   CA
2  05555  AB  CC  DD  EE   NY
         0   1   2   3   4    5    6    7    8
3  0666666  AB  CC  DD  EE   NY  123  567  888
4  0666666  AB  CC  DD  EE  USA  123  567  999
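
If the goal is only to split the file, the same two steps (read the unbalanced rows, then group on the first field) can also be sketched without pandas. The version below is a minimal sketch using only the standard library; the helper names `group_lines_by_code` and `write_groups`, the `.txt` extension, and the temporary output directory are assumptions made for illustration, not part of the original answer. It writes each line back unchanged, so the quoting in the output files matches the input.

```python
import csv
import io
import os
import tempfile
from collections import defaultdict


def group_lines_by_code(lines):
    """Map each first-column code to the raw lines that start with it."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        # Parse the line only to find the code; skipinitialspace copes
        # with the space that follows each comma in the sample data.
        code = next(csv.reader([line], skipinitialspace=True))[0]
        groups[code].append(line)
    return dict(groups)


def write_groups(groups, out_dir):
    """Write one '<code>.txt' file per group and return the file paths."""
    paths = []
    for code, rows in groups.items():
        path = os.path.join(out_dir, f"{code}.txt")
        with open(path, "w", newline="\n") as fh:
            fh.write("\n".join(rows) + "\n")
        paths.append(path)
    return paths


sample = io.StringIO('''"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"
''')

groups = group_lines_by_code(sample)
# Assumption: a temporary directory stands in for the real output folder.
paths = write_groups(groups, tempfile.mkdtemp())
```

For a real file, an open file handle can be passed to `group_lines_by_code` in place of `sample`. Because rows are never forced into a common set of columns, the differing record lengths are not a problem here.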

Answer 2

Try using groupby and a for loop, then write each group to a CSV file (df is the DataFrame that holds the data):

for i, (_, group) in enumerate(df.groupby(df.iloc[:, 0]), 1):
    group.to_csv('File%s' % i)
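
The loop above presumes that `df` already exists, which is the step the question reports trouble with. One possible way to get there, sketched under the assumption that the maximum number of fields per row is known in advance (9 in the sample), is to pass that many column names to `read_csv` so shorter rows are padded with NaN. The constant `MAX_FIELDS`, the dictionary `frames`, and the temporary output directory are illustrative assumptions.

```python
import io
import os
import tempfile

import pandas as pd

raw = io.StringIO('''"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"
''')

# Assumption: the widest record has 9 fields. Supplying that many column
# names lets read_csv pad shorter rows instead of raising a parse error.
MAX_FIELDS = 9

df = pd.read_csv(
    raw,
    header=None,
    names=list(range(MAX_FIELDS)),
    skipinitialspace=True,  # the sample has a space after each comma
    dtype=str,              # keep leading zeros such as "05555"
)

# One sub-DataFrame per code, without the all-NaN padding columns.
frames = {
    code: group.dropna(axis=1, how="all")
    for code, group in df.groupby(0)
}

# Assumption: a temporary directory stands in for the real output folder.
out_dir = tempfile.mkdtemp()
for i, group in enumerate(frames.values(), 1):
    group.to_csv(os.path.join(out_dir, "File%s.csv" % i),
                 index=False, header=False)
```

Passing `index=False, header=False` keeps the row index and column labels out of the output files, which is closer to the layout the question asks for.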

2 answers

solution1
1 2019-06-19 01:59:58

solution2
0 2019-06-19 01:56:15
