
Need help formatting a .txt file and placing it into a data frame

I have a .txt file with the following format:

CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1000032|BINCH JAMES G|4|2016-11-07|edgar/data/1000032/0001209191-16-148633.txt
1000032|BINCH JAMES G|4|2016-12-02|edgar/data/1000032/0001209191-16-153119.txt
1000045|NICHOLAS FINANCIAL INC|10-Q|2016-11-09|edgar/data/1000045/0001193125-16-763849.txt
1000045|NICHOLAS FINANCIAL INC|4|2016-10-04|edgar/data/1000045/0001000045-16-000006.txt

What I'd like to do is import this information and load it into a dataframe, with each field delimited by '|' in its own column and each line as a new row. I have experience importing .csv and other well-formatted files into dataframes, but I have never dealt with something this messy. If you'd like the .txt file to play around with, let me know.

Thanks in advance for the help.

Assuming you have the following text file:

CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1000032|BINCH JAMES G|4|2016-11-07|edgar/data/1000032/0001209191-16-148633.txt
1000032|BINCH JAMES G|4|2016-12-02|edgar/data/1000032/0001209191-16-153119.txt
1000045|NICHOLAS FINANCIAL INC|10-Q|2016-11-09|edgar/data/1000045/0001193125-16-763849.txt
1000045|NICHOLAS FINANCIAL INC|4|2016-10-04|edgar/data/1000045/0001000045-16-000006.txt

Solution:

import pandas as pd
df = pd.read_csv(filename, sep='|', skiprows=[1], parse_dates=['Date Filed'])

Result:

In [94]: df
Out[94]:
       CIK            Company Name Form Type Date Filed                                     Filename
0  1000032           BINCH JAMES G         4 2016-11-07  edgar/data/1000032/0001209191-16-148633.txt
1  1000032           BINCH JAMES G         4 2016-12-02  edgar/data/1000032/0001209191-16-153119.txt
2  1000045  NICHOLAS FINANCIAL INC      10-Q 2016-11-09  edgar/data/1000045/0001193125-16-763849.txt
3  1000045  NICHOLAS FINANCIAL INC         4 2016-10-04  edgar/data/1000045/0001000045-16-000006.txt

In [95]: df.dtypes
Out[95]:
CIK                      int64
Company Name            object
Form Type               object
Date Filed      datetime64[ns]
Filename                object
dtype: object
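
For anyone who wants to try this without the file on disk, here is a minimal, self-contained sketch of the same call. It assumes the sample rows from the question and uses io.StringIO as a stand-in for the real file. skiprows=[1] drops only the dashed separator (the second line of the file), so the first line is still used as the header. The dtype={'CIK': str} variant is not part of the answer above; it is an optional assumption for cases where the CIK should stay text rather than become an integer.

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for the real .txt file.
raw = """CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1000032|BINCH JAMES G|4|2016-11-07|edgar/data/1000032/0001209191-16-148633.txt
1000032|BINCH JAMES G|4|2016-12-02|edgar/data/1000032/0001209191-16-153119.txt
1000045|NICHOLAS FINANCIAL INC|10-Q|2016-11-09|edgar/data/1000045/0001193125-16-763849.txt
1000045|NICHOLAS FINANCIAL INC|4|2016-10-04|edgar/data/1000045/0001000045-16-000006.txt
"""

# Same arguments as the solution: '|' as the delimiter, skip the dashed line
# (file line index 1), and parse 'Date Filed' as datetimes.
df = pd.read_csv(
    io.StringIO(raw),
    sep="|",
    skiprows=[1],
    parse_dates=["Date Filed"],
)

# Optional variant (an assumption, not from the answer): read CIK as text.
df_text = pd.read_csv(
    io.StringIO(raw),
    sep="|",
    skiprows=[1],
    parse_dates=["Date Filed"],
    dtype={"CIK": str},
)

print(df.dtypes)
print(df_text["CIK"].tolist())
```

With a real file, replace io.StringIO(raw) with the path to the .txt file, as in the solution.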
