
Slice within a slice of long string from open file function

Hi everyone, I am new to StackOverflow.

I am trying to take a slice of every line in a .dat file that I have been given.

The purpose is an event study: I am supposed to open the file and then manipulate the data. When the file is read using '.readlines', the result is a giant string of numbers. When printed, each line ends with the usual '\n' to indicate a new line.

What I have been given is the character length of the values for each column name, so that I can eventually create a dataframe. These column widths are what I am trying to slice over.

In total there are 73 characters per line. The first 20 characters are the adjusted close price of the shares, the next 17 characters are the share high price, and so on. I am trying to get the slice of the first 20 characters, then the 17 characters that follow those 20, and so on.

I feel that the first step is to convert the file into a list, which I have done through '.readlines' (I am still unsure whether this is the right way), and then iterate through each element in the list, slicing it.

The file read through '.readlines' looks like this:

'00041.1501808166503954.22999954223633053.61999893188476600000072014-08-14\n',
'0040.92996978759765654.590000152587890054.3400001525878900000102014-08-15\n',
'0041.24130249023437554.520000457763670054.3899993896484400000072014-08-18\n',

What I am trying to get is a separate list of the first 20 characters of each line. So for the above it will be something like this:

list = [00041.15018081665039, 0040.929969787597656, 0041.241302490234375 .....]

It is not meant to be complicated code, but any suggestions are really appreciated!

Thanks a lot

A basic list comprehension should do the trick:

Try this code:

lines = ['00041.1501808166503954.22999954223633053.61999893188476600000072014-08-14\n', 
         '0040.92996978759765654.590000152587890054.3400001525878900000102014-08-15\n', 
         '0041.24130249023437554.520000457763670054.3899993896484400000072014-08-18\n']
         
# Keep the first 20 characters of each line (the first fixed-width column)
blocks = [ln[:20] for ln in lines]

print(blocks)

Output

['00041.15018081665039', '0040.929969787597656', '0041.241302490234375']
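
The same idea can be extended to every column, not just the first 20 characters. Below is a minimal sketch of that. Only the widths 20 and 17 come from the question; the remaining widths (19, 7, 10) are assumptions inferred from the sample lines so that the total is 73, and the names `split_fixed_width`, `WIDTHS` and `NAMES` (including the column names) are hypothetical, so replace them with the real ones you were given.

```python
from itertools import accumulate

# 20 and 17 are stated in the question; 19, 7 and 10 are ASSUMED from the
# sample lines (the five widths add up to the stated 73 characters).
WIDTHS = [20, 17, 19, 7, 10]
# Hypothetical column names; only the first two are described in the question.
NAMES = ["adj_close", "high", "col3", "col4", "date"]


def split_fixed_width(line, widths=WIDTHS):
    """Cut one line into consecutive fields of the given character widths."""
    line = line.rstrip("\n")         # drop the newline kept by readlines()
    ends = list(accumulate(widths))  # running totals: [20, 37, 56, 63, 73]
    starts = [0] + ends[:-1]         # where each field begins
    return [line[s:e] for s, e in zip(starts, ends)]


lines = [
    "00041.1501808166503954.22999954223633053.61999893188476600000072014-08-14\n",
    "0040.92996978759765654.590000152587890054.3400001525878900000102014-08-15\n",
    "0041.24130249023437554.520000457763670054.3899993896484400000072014-08-18\n",
]

# One list of fields per line
rows = [split_fixed_width(ln) for ln in lines]

# One list per column, keyed by the (hypothetical) column name
columns = {name: [row[i] for row in rows] for i, name in enumerate(NAMES)}

# The fields are still strings; float() accepts the leading zeros
adj_close = [float(x) for x in columns["adj_close"]]
```

With a real file you would probably not need '.readlines' at all: iterating over the open file (`with open(path) as f: rows = [split_fixed_width(ln) for ln in f]`, where `path` is your .dat file) yields the same lines one at a time. If the end goal is a dataframe, `pandas.read_fwf` with its `widths` argument may also be worth a look, though it will try to infer column types for you.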
