Read a text file and split it into multiple files based on the unique code in the first column
The column structure differs from record to record, depending on the unique code identifier in the first column.
The text file is comma-separated.
Sample input file structure:
"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"
I would like to split the above text file into separate text files based on the unique code identifier in the first column.
I expect two files with the data below:
File1
"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"
File2
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"
Note: As the structure is different for each code identifier, I'm not able to read the data into pandas DataFrames.
Your question contains two parts: first read the file with unbalanced rows, then split the DataFrame into sub-DataFrames.
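    import io
    import pandas as pd

    data = io.StringIO('''"05555", "AB", "CC", "DD", "EE", "USA"
    "05555", "AB", "CC", "DD", "EE", "CA"
    "05555", "AB", "CC", "DD", "EE", "NY"
    "0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
    "0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"
    ''')
    # The data contains no ';', so each line is read as a single column
    df = pd.read_csv(data, sep=';', header=None)
    # Split on commas; shorter rows are padded with None
    s = df[0].str.split(',', expand=True)
    # Remove the surrounding spaces and double quotes from every field
    s = s.apply(lambda x: x.str.strip(' "'), axis=1)
    for x, y in s.groupby(0):
        # Drop the padding columns that are empty for this code
        print(y.dropna(axis=1))
        # Note: to_csv also writes the index and column labels by default
        y.dropna(axis=1).to_csv(str(x) + '.csv')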
           0   1   2   3   4    5
    0  05555  AB  CC  DD  EE  USA
    1  05555  AB  CC  DD  EE   CA
    2  05555  AB  CC  DD  EE   NY
             0   1   2   3   4    5    6    7    8
    3  0666666  AB  CC  DD  EE   NY  123  567  888
    4  0666666  AB  CC  DD  EE  USA  123  567  999
Try using groupby and a for loop, then write the CSVs:
    # Assumes df already holds the parsed rows, with the code in its first column
    for i, (_, group) in enumerate(df.groupby(df.iloc[:, 0]), 1):
        group.to_csv('File%s' % i)
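If pandas is not a requirement, the split can also be done with only the standard library, which avoids the unbalanced-row problem because each record is handled as a plain list. The following is a minimal sketch and is not taken from either answer above: the function name `split_by_first_column`, the `out_dir` parameter and the `<code>.csv` file naming are assumptions made for illustration.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_by_first_column(src, out_dir="."):
    """Write each row of `src` to '<code>.csv', where <code> is the row's first field.

    Hypothetical helper: the name, parameters and output naming are illustrative.
    Returns a dict mapping each code to the path of the file written for it.
    """
    groups = defaultdict(list)
    with open(src, newline="") as f:
        # skipinitialspace=True lets the reader treat `, "AB"` as a quoted field
        for row in csv.reader(f, skipinitialspace=True):
            if row:  # ignore blank lines
                groups[row[0]].append(row)

    out_dir = Path(out_dir)
    written = {}
    for code, rows in groups.items():
        target = out_dir / f"{code}.csv"
        with open(target, "w", newline="") as f:
            # QUOTE_ALL keeps every field quoted, as in the sample input
            csv.writer(f, quoting=csv.QUOTE_ALL).writerows(rows)
        written[code] = target
    return written
```

Calling `split_by_first_column("input.txt")` on the sample data should produce `05555.csv` and `0666666.csv`, each holding only the rows for that code. The fields are written with quotes but without the space after each comma, and the whole file is held in memory, so a very large input would need the rows written as they are read instead.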