
Read a text file and split it into multiple files based on the unique code in the first column

I want to read a text file and split it into multiple files based on the unique code in its first column. The column structure differs from record to record, depending on that code identifier.

The text file uses a comma as the separator.

Sample input file structure:
"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"

I would like to split the above text file into separate text files based on the unique code identifier in the first column.

Expected output: two files with the data below.

File1
"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"

File2
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"

Note: because the structure differs for each code identifier, I'm not able to read the data into a pandas DataFrame.
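For reference, Python's standard csv module has no trouble with rows of different lengths, so the split can also be done without pandas by streaming the file once. A minimal sketch, assuming each code is safe to use as a file name and that one open file per distinct code is acceptable; the function name split_by_first_column and the SAMPLE string are hypothetical and only reproduce the rows from the question.

```python
import csv
import io
import tempfile
from pathlib import Path

# Hypothetical sample input: the rows shown in the question.
SAMPLE = '''"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"
'''


def split_by_first_column(src, out_dir):
    """Copy each row of src into <out_dir>/<code>.csv, where code is the first field.

    Returns a dict mapping each code to the number of rows written for it.
    Assumes the codes are valid file names.
    """
    out_dir = Path(out_dir)
    handles, writers, counts = {}, {}, {}
    try:
        # skipinitialspace=True skips the blank after each comma so the quotes are parsed.
        for row in csv.reader(src, skipinitialspace=True):
            if not row:  # ignore blank lines
                continue
            code = row[0]
            if code not in writers:
                # Open one output file per code, the first time that code is seen.
                fh = open(out_dir / f"{code}.csv", "w", newline="")
                handles[code] = fh
                writers[code] = csv.writer(fh, quoting=csv.QUOTE_ALL)
            writers[code].writerow(row)  # each row keeps its own length
            counts[code] = counts.get(code, 0) + 1
    finally:
        for fh in handles.values():
            fh.close()
    return counts


with tempfile.TemporaryDirectory() as tmp:
    print(split_by_first_column(io.StringIO(SAMPLE), tmp))
    # {'05555': 3, '0666666': 2}
    print(Path(tmp, "0666666.csv").read_text())
```

With a real file, pass open("input.txt", newline="") (a hypothetical path) instead of the StringIO. Keeping one handle per code avoids re-opening files for every row, but with thousands of distinct codes you may hit the operating system's open-file limit.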

Your question has two parts: first, read the file with unbalanced rows; then split the DataFrame into sub-DataFrames.

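import csv
import io

import pandas as pd

data = io.StringIO('''"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"
''')
# ';' never occurs in the data, so each whole line is read into a single string column
df = pd.read_csv(data, sep=';', header=None)
# Split every line on ',' into as many columns as the widest row; shorter rows get None
s = df[0].str.split(',', expand=True)
# Strip the blanks and double quotes left around the fields
s = s.apply(lambda col: col.str.strip(' "'))
for x, y in s.groupby(0):
    # Drop the columns that are empty for this code, then write one file per code
    print(y.dropna(axis=1))
    y.dropna(axis=1).to_csv(str(x) + '.csv', index=False, header=False, quoting=csv.QUOTE_ALL)

Printed output: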
       0   1   2   3   4    5
0  05555  AB  CC  DD  EE  USA
1  05555  AB  CC  DD  EE   CA
2  05555  AB  CC  DD  EE   NY
         0   1   2   3   4    5    6    7    8
3  0666666  AB  CC  DD  EE   NY  123  567  888
4  0666666  AB  CC  DD  EE  USA  123  567  999
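An alternative for the reading step, sketched here as an assumption rather than as part of the answer above: instead of the sep=';' trick, let read_csv parse the quoted fields itself. Giving it explicit column labels for the widest record makes it pad the shorter rows with NaN, skipinitialspace handles the blank after each comma, and dtype=str keeps leading zeros such as 05555. The names raw and parts are hypothetical.

```python
import csv
import io
import tempfile
from pathlib import Path

import pandas as pd

raw = '''"05555", "AB", "CC", "DD", "EE", "USA"
"05555", "AB", "CC", "DD", "EE", "CA"
"05555", "AB", "CC", "DD", "EE", "NY"
"0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"
"0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"
'''

# Pass 1: find the widest record. csv.reader honours the quotes,
# so a comma inside a quoted field would not be miscounted.
width = max(len(r) for r in csv.reader(io.StringIO(raw), skipinitialspace=True) if r)

# Pass 2: read with column labels 0..width-1; shorter rows are padded with NaN.
df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=list(range(width)),
    skipinitialspace=True,  # drop the blank after each comma so quotes are recognised
    dtype=str,              # keep codes like "05555" as text
)

# Split on column 0 and drop the columns that are empty for the whole group.
parts = {code: g.dropna(axis=1, how="all") for code, g in df.groupby(0)}

with tempfile.TemporaryDirectory() as tmp:
    for code, part in parts.items():
        part.to_csv(Path(tmp) / f"{code}.csv", index=False, header=False,
                    quoting=csv.QUOTE_ALL)
    first_file = Path(tmp, "05555.csv").read_text()

print({code: part.shape for code, part in parts.items()})
# {'05555': (3, 6), '0666666': (2, 9)}
```

Using how="all" rather than the default how="any" keeps a column if only some rows of a group leave that field blank, which matters if real records can contain empty fields.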

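Try using groupby and a for loop, then write each group to its own CSV file. This assumes df already holds one field per column (for example, the s frame from the previous answer):

for i, (_, group) in enumerate(df.groupby(df.iloc[:, 0]), 1):
    # Drop columns that are empty for this code and write File1.csv, File2.csv, ...
    group.dropna(axis=1, how='all').to_csv('File%s.csv' % i, index=False, header=False)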
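A self-contained sketch of the numbered-file loop, assuming the fields have already been split into columns; the rows list and the write_numbered helper are hypothetical stand-ins for the parsed file. It also shows which code lands in which file: groupby sorts its keys by default, so with this data File1 holds "05555". Passing sort=False to groupby would number the groups in order of first appearance instead.

```python
import csv
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical parsed rows; pandas pads the shorter lists with missing values.
rows = [
    ["05555", "AB", "CC", "DD", "EE", "USA"],
    ["05555", "AB", "CC", "DD", "EE", "CA"],
    ["05555", "AB", "CC", "DD", "EE", "NY"],
    ["0666666", "AB", "CC", "DD", "EE", "NY", "123", "567", "888"],
    ["0666666", "AB", "CC", "DD", "EE", "USA", "123", "567", "999"],
]
df = pd.DataFrame(rows)


def write_numbered(df, out_dir):
    """Write one File<i>.csv per distinct first-column value; return {file name: code}."""
    mapping = {}
    # groupby sorts the keys by default, so numbering follows the sorted codes.
    for i, (code, group) in enumerate(df.groupby(df.iloc[:, 0]), 1):
        name = f"File{i}.csv"
        mapping[name] = code
        # Drop the padding columns that are empty for this code.
        group.dropna(axis=1, how="all").to_csv(
            Path(out_dir) / name, index=False, header=False, quoting=csv.QUOTE_ALL
        )
    return mapping


with tempfile.TemporaryDirectory() as tmp:
    print(write_numbered(df, tmp))
    # {'File1.csv': '05555', 'File2.csv': '0666666'}
```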

