Read a text file and split it into multiple files based on the unique code in the first column

Question

Read a text file and split it into multiple files based on the unique code in the first column. The column structure differs from record to record, depending on the unique code identifier in the first column.

The text file is comma-separated.

Sample input file structure
"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"

I would like to split the above text file into separate text files, one per unique code identifier in the first column.

Expected: two files with the data below.

File1
"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"

File2
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"

Note: Because the structure differs for each code identifier, I am not able to read the data into a pandas DataFrame.

Answer 1

Your question has two parts: first read the file with unbalanced rows, then split the DataFrame into sub-DataFrames.

import io
import pandas as pd

data = io.StringIO('''"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"
''')
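# ';' never occurs in the data, so each line is read into a single column
# and rows of different lengths do not break the parser.
df = pd.read_csv(data, sep=';', header=None)
# Split each line on commas, then strip the surrounding spaces and quotes.
s = df[0].str.split(',', expand=True)
s = s.apply(lambda x: x.str.strip(' "'), axis=1)
# Write one CSV per code, dropping the columns that are empty for that group.
for x, y in s.groupby(0):
    print(y.dropna(axis=1))
    y.dropna(axis=1).to_csv(str(x) + '.csv')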
       0   1   2   3   4    5
0  05555  AB  CC  DD  EE  USA
1  05555  AB  CC  DD  EE   CA
2  05555  AB  CC  DD  EE   NY
         0   1   2   3   4    5    6    7    8
3  0666666  AB  CC  DD  EE   NY  123  567  888
4  0666666  AB  CC  DD  EE  USA  123  567  999
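
If the output files should keep the original quoting and spacing, as in the expected File1 and File2 from the question, pandas is not strictly needed. Below is a minimal sketch using only the standard library. The function name split_by_first_column, the temporary output directory, and the `<code>.txt` naming scheme are assumptions made for illustration; they do not come from the question.

```python
import csv
import io
import tempfile
from collections import defaultdict
from pathlib import Path


def split_by_first_column(lines):
    """Group raw lines by the value of their first comma-separated field."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Parse the line only to find the key; the raw line is stored unchanged.
        key = next(csv.reader([line], skipinitialspace=True))[0]
        groups[key].append(line)
    return dict(groups)


sample = io.StringIO('''"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"
''')

groups = split_by_first_column(sample)

# Hypothetical output location: a fresh temporary directory, one file per code.
out_dir = Path(tempfile.mkdtemp())
for code, rows in groups.items():
    (out_dir / f"{code}.txt").write_text("\n".join(rows) + "\n", encoding="utf-8")
```

For a real input file, passing an open file object (for example `open(path, encoding="utf-8")`) in place of `sample` should work the same way, since the function only iterates over lines.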

Answer 2

Try using groupby and a for loop, then write the CSVs:

for i, (_, group) in enumerate(df.groupby(df.iloc[:, 0]), 1):
    group.to_csv('File%s' % i)
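
The loop above assumes a DataFrame named df already exists, which is the step the question says fails for rows of different lengths. One possible way to get there, sketched under assumptions: scan the file once to find the widest row, then pass that many column names to read_csv so shorter rows are padded with NaN. The variable names raw, max_cols and groups are illustrative only.

```python
import csv
import io

import pandas as pd

raw = '''"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"
'''

# First pass: find the widest row so every line fits into the frame.
max_cols = max(
    len(row) for row in csv.reader(io.StringIO(raw), skipinitialspace=True)
)

# dtype=str keeps leading zeros in codes such as "05555";
# skipinitialspace=True handles the space after each comma.
df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=list(range(max_cols)),
    skipinitialspace=True,
    dtype=str,
)

# One group per code, numbered File1, File2, ... in order of appearance.
# Columns that are entirely empty for a group are padding and get dropped.
groups = {
    "File%s" % i: group.dropna(axis=1, how="all")
    for i, (_, group) in enumerate(df.groupby(0, sort=False), 1)
}

for name, group in groups.items():
    print(name, group.shape)
    # To write to disk: group.to_csv(name + ".csv", index=False, header=False)
```

Note that how="all" only drops columns that are empty for the whole group, so a genuinely missing value inside a group would be kept as an empty cell rather than removing the column.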
