Read a text file and split it into multiple files based on the unique code in the first column

I want to read a comma-separated text file and split it into multiple files based on the unique code in the first column. The column structure differs from record to record, depending on the code identifier in the first column.

Sample input file structure
"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"

I would like to split the above text file into separate text files based on the unique code identifier in the first column.

The expected output is two files with the following data:

File1
"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"

File2
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"

Note: Because the structure differs for each code identifier, I'm not able to read the data into a pandas DataFrame.
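Because the rows differ in width, one option that avoids pandas entirely is to group the raw lines by their first field with Python's standard csv module and write each group back out unchanged. This is a minimal sketch, not taken from the question or the answers below; the function name split_by_first_column and the <code>.txt output names are hypothetical choices.

```python
import csv
import io
import os
import tempfile
from collections import defaultdict


def split_by_first_column(lines, out_dir):
    """Write each line to <out_dir>/<code>.txt, where <code> is its first field.

    Lines are written back unchanged, so every output file keeps the exact
    column layout and quoting of its records. Returns {code: path}.
    """
    groups = defaultdict(list)  # code -> original lines, in input order
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        # skipinitialspace=True skips the space after each comma, so the quoted
        # fields are parsed properly and "05555" keeps its leading zero.
        code = next(csv.reader([line], skipinitialspace=True))[0]
        groups[code].append(line)

    paths = {}
    for code, rows in groups.items():
        path = os.path.join(out_dir, f"{code}.txt")  # hypothetical naming scheme
        with open(path, "w", newline="") as fh:
            fh.write("\n".join(rows) + "\n")
        paths[code] = path
    return paths


sample = '''"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"
'''

out_dir = tempfile.mkdtemp()
paths = split_by_first_column(io.StringIO(sample), out_dir)
print(sorted(paths))  # ['05555', '0666666']
```

For a real file, pass the open file object instead, e.g. with open('input.txt', newline='') as fh: split_by_first_column(fh, '.'). Writing the original lines back, rather than re-serialising parsed fields, keeps each file's quoting and spacing identical to the input; the trade-off is that all lines are held in memory until the end.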

Your question has two parts: first, read a file whose rows have different numbers of fields; then, split the resulting DataFrame into one sub-DataFrame per code.

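import io

import pandas as pd

data = io.StringIO('''"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"
''')
# ';' never occurs in the data, so each line is read into a single column 0
df = pd.read_csv(data, sep=';', header=None)
# split each line on commas; shorter rows are padded with None
s = df[0].str.split(',', expand=True)
# strip the surrounding spaces and double quotes from every field
s = s.apply(lambda x: x.str.strip(' "'), axis=1)
for x, y in s.groupby(0):
    # drop the columns this code does not use, then write one file per code
    print(y.dropna(axis=1))
    y.dropna(axis=1).to_csv(str(x) + '.csv', index=False, header=False)

Output: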
       0   1   2   3   4    5
0  05555  AB  CC  DD  EE  USA
1  05555  AB  CC  DD  EE   CA
2  05555  AB  CC  DD  EE   NY
         0   1   2   3   4    5    6    7    8
3  0666666  AB  CC  DD  EE   NY  123  567  888
4  0666666  AB  CC  DD  EE  USA  123  567  999
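The answer above reads each line as a single column and splits it by hand. An alternative sketch, assuming the same sample data, parses every record with csv.reader(skipinitialspace=True), pads short rows with None so pandas can build a rectangular DataFrame, and writes each group without index or header and with every field quoted. The output location (a temporary directory) and the <code>.csv file names are illustrative only.

```python
import csv
import io
import os
import tempfile

import pandas as pd

sample = '''"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"
'''

# Parse each record into a list of fields; records may differ in length.
# skipinitialspace=True honours the quotes that follow ", " and keeps
# values such as "05555" as strings with their leading zero.
rows = [row for row in csv.reader(io.StringIO(sample), skipinitialspace=True) if row]
width = max(len(row) for row in rows)
# Pad short records with None so pandas gets a rectangular table.
df = pd.DataFrame([row + [None] * (width - len(row)) for row in rows])

out_dir = tempfile.mkdtemp()
for code, group in df.groupby(0):
    # Keep only the columns this code actually uses.
    trimmed = group.dropna(axis=1, how="all")
    trimmed.to_csv(
        os.path.join(out_dir, f"{code}.csv"),  # illustrative file name
        index=False,             # no DataFrame index column
        header=False,            # no 0, 1, 2, ... header row
        quoting=csv.QUOTE_ALL,   # quote every field, as in the input
    )
```

csv.QUOTE_ALL restores the double quotes but not the space after each comma, so the files contain lines such as "05555","AB","CC","DD","EE","USA".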

Alternatively, use groupby in a for loop and write one CSV file per group:

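# df must already hold one parsed field per column (like s in the answer above)
for i, (_, group) in enumerate(df.groupby(df.iloc[:, 0]), 1):
    group.dropna(axis=1, how='all').to_csv('File%s.csv' % i, index=False, header=False)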
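The loop above needs df to hold one parsed field per column. A self-contained sketch, using a hypothetical hand-built DataFrame in place of the parsed file, shows how the numbering works: enumerate(..., 1) yields File1, File2, ..., and sort=False numbers the groups in order of first appearance rather than in sorted key order.

```python
import os
import tempfile

import pandas as pd

# Hypothetical already-parsed data: one field per column, short rows padded with None.
df = pd.DataFrame([
    ["05555", "AB", "CC", "DD", "EE", "USA", None, None, None],
    ["05555", "AB", "CC", "DD", "EE", "CA", None, None, None],
    ["05555", "AB", "CC", "DD", "EE", "NY", None, None, None],
    ["0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"],
    ["0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"],
])

out_dir = tempfile.mkdtemp()
# Number the groups 1, 2, ... in the order each code first appears.
for i, (_, group) in enumerate(df.groupby(df.iloc[:, 0], sort=False), 1):
    # Drop the padding columns, then write the group without index or header.
    group.dropna(axis=1, how="all").to_csv(
        os.path.join(out_dir, f"File{i}.csv"), index=False, header=False
    )
```

Here the two orders coincide because "05555" sorts before "0666666"; sort=False only matters when a code that sorts later appears first in the file.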
