
Python - Splitting up a whole text file

I have a .txt file containing data that I need to extract into lists. A typical line looks like this:

  Sfc. W.Dir        -       -     242     240     237     241     246     248     246     249     253     254     257     266     262     269     284     283     283     290     291     295     292     287     290     293     291  Sfc. W.Dir 

The whole file looks like this.

How would I go about splitting this up into a text file that contains just the numbers? That is, the example line would become just 242 240 237 241, etc. The first few lines don't contain data, so I only need lines 8-23.

This would do it:

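# Print the integers found on lines 8-23 of the file.
# enumerate(..., start=1) makes i match 1-based line numbers.
with open("file") as fp:
    for i, line in enumerate(fp, start=1):
        if 8 <= i <= 23:
            # Keep only tokens made up entirely of digits (drops labels and "-").
            print([int(s) for s in line.split() if s.isdigit()])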
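The question asks for a text file containing just the numbers, while the snippet above only prints them. A minimal sketch of the same filtering that writes the result to a file follows; the function name `extract_numbers`, its parameters, and the output path are illustrative assumptions, not part of the original answer.

```python
def extract_numbers(src_path, dst_path, first=8, last=23):
    """Copy the all-digit tokens on lines first..last (1-based, inclusive) to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for lineno, line in enumerate(src, start=1):
            if lineno > last:
                break  # past the requested range, stop reading
            if lineno >= first:
                # Same filter as above: keep tokens that consist only of digits.
                numbers = [tok for tok in line.split() if tok.isdigit()]
                dst.write(" ".join(numbers) + "\n")
```

For example, `extract_numbers("file", "numbers.txt")` would write one space-separated line of numbers per data line, assuming the data really sits on lines 8-23.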

If you only need this as a text-processing step for further use, I'd recommend file-processing tools such as awk or sed:

$ awk 'FNR>=8 && FNR<=23 {gsub(/[^0-9 ]/,""); $1=$1; print $0}' file

0 0 0 0 30 82 116 195 217 231 241 248 251 249 244 234 220 200 171 129 31 0 0 0 0
1773 1806 1801 1795 558 1147 1258 1589 1711 1747 1796 1839 1865 1872 1863 1828 1780 1709 1582 1413 1023 1831 1507 1327 1199
356 356 356 356 356 356 356 356 356 699 820 887 915 907 866 760 356 356 356 356 356 356 356 356 356
175 356 356 356 356 356 356 356 815 987 1060 1121 1166 1188 1187 1166 1113 1030 891 356 356 356 356 356 356 356 175
99900 99900 99900 3486 4258 2745 4503 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 99900 99900
2 1436 1428 1416 1414 1427 1454 1488 1583 1625 1652 1679 1699 1715 1725 1724 1719 1708 1685 1660 1629 1585 1491 1369 1326 1299 2
2 1397 1394 1379 1358 1346 1338 1315 1296 1266 1249 1234 1218 1203 1193 1187 1186 1190 1199 1206 1211 1223 1238 1232 1226 1219 2
10207 10207 10209 10209 10209 10210 10210 10210 10209 10209 10209 10209 10208 10205 10202 10200 10198 10196 10194 10193 10193 10192 10192 10191 10192
198 200 201 203 202 201 199 197 197 196 196 195 195 194 193 193 193 193 193 192 191 187 184 184 183
13 13 13 13 12 13 13 14 14 14 14 14 15 15 15 15 15 14 14 14 12 11 12 12 11
19 19 19 19 12 15 15 15 16 16 16 16 16 16 16 16 16 16 16 15 14 17 17 16 16
206 207 208 209 202 202 200 199 200 199 199 199 198 198 198 197 197 197 196 196 194 202 200 199 199
2 2 2 2 2 3 2 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2
1249 1303 1287 1199 134 317 204 95 67 209 327 433 536 611 648 671 659 615 622 670 864 322 437 448 430
1 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 1
000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000
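For comparison, here is a rough Python sketch of what the awk command does to each line: delete every character that is not a digit or a space, then re-split on whitespace. The helper name `digits_only` and the label `Temp 2m` are hypothetical. Note that digits inside a row label survive this filter, which may explain the extra leading and trailing values on some of the output lines above.

```python
import re

def digits_only(line):
    """Drop every character that is not a digit or a space, then split on whitespace."""
    return re.sub(r"[^0-9 ]", "", line).split()

# The sample line from the question: labels and "-" placeholders disappear.
print(digits_only("  Sfc. W.Dir   -   -   242   240  Sfc. W.Dir "))
# A hypothetical label containing a digit: the "2" is kept at both ends.
print(digits_only("Temp 2m   1436 1428  Temp 2m"))
```

Filtering whole tokens with `str.isdigit()` instead would drop such labels entirely, at the cost of also dropping any token that mixes digits with other characters.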
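value_lists = []

# Assumed values: the question asks for lines 8-23, and enumerate() counts from 0.
file_ = "file"   # path to the data file
start_line = 7   # 0-based index of the first data line (line 8)
end_line = 22    # 0-based index of the last data line (line 23)

with open(file_) as data:
    for i, v in enumerate(data):
        if start_line <= i <= end_line:
            # Collect every token on this line that parses as a number.
            values = []
            for val in v.split():
                try:
                    values.append(float(val))
                except ValueError:
                    pass  # skip labels and "-" placeholders
            value_lists.append(values)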

This seems to return the correct result when run on your data set. The n-th index relative to the file corresponds to the (n-7)-th index in the list.

Note: tested on Python 2.7.14 and Python 3.6.2; both return the same result.

Note: if the code returns extra lists, subtract 1 from the start and end line variables. That should usually fix it.
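The off-by-one adjustment mentioned in the note comes from `enumerate` counting from 0 while line numbers are usually counted from 1. One way to sidestep it is to take 1-based line numbers and slice with `itertools.islice`. The following is only a sketch under that assumption; the function name `read_values` and its parameters are hypothetical.

```python
from itertools import islice

def read_values(lines, first, last):
    """Return one list of floats per line for lines first..last (1-based, inclusive)."""
    rows = []
    # islice(lines, first - 1, last) yields exactly lines first..last.
    for line in islice(lines, first - 1, last):
        row = []
        for token in line.split():
            try:
                row.append(float(token))
            except ValueError:
                pass  # skip labels and "-" placeholders
        rows.append(row)
    return rows
```

It accepts any iterable of lines, so an open file object can be passed directly, for example `read_values(open_file, 8, 23)` inside a `with open(...)` block.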
