How to extract data from a text file in Python?
I have a text file which contains a lot of information. I tried to find help with this and found some things that were similar, but not exactly what I want to do.
I have a text file (shown below) from which I want to extract the data in the first 3 columns into an array.
I am a beginner in Python. Please help me resolve this.
//Text file starts
---------------------------SOFTWARE NAME------------------------------------
I/O Filenames: abc.txt
Variables:______
------------------------------------------------------------------------
Method name.
Coordinates : 0 0
S.No. X(No.) Y(No.) Z(Beta) A(Alpha)
1 3.541 0
2 7.821 180
3 2.160 0
4 4.143 0 3.69 0
5 2.186 0 2.18 0
6 3.490 0 2.45 0
//End of text file
For this I would use the csv package:
import csv  # import the package

with open('/path/to/file.csv', newline='') as csvDataFile:  # open the CSV file
    # load the CSV file with the delimiter of your choice; here it is ';'
    csvReader = csv.reader(csvDataFile, delimiter=';')
    for row in csvReader:
        print(row)  # do something with the row
I advise you to format your file better; a good format would be:
S.No.;X(No.);Y(No.);Z(Beta);A(Alpha)
1;3.541;0;;
2;7.821;180;;
3;2.160;0;;
4;4.143;0;3.69;0
5;2.186;0;2.18;0
6;3.490;0;2.45;0
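As a minimal sketch, assuming the file has been rewritten in the semicolon format above, the csv approach could collect the first three columns as floats. The sample string and the helper name `first_three_columns` are hypothetical, and `io.StringIO` only stands in for a real file:

```python
import csv
import io

# Hypothetical in-memory copy of the semicolon-separated file suggested above;
# with a real file you would pass open('/path/to/file.csv', newline='') instead.
SAMPLE = """S.No.;X(No.);Y(No.);Z(Beta);A(Alpha)
1;3.541;0;;
2;7.821;180;;
3;2.160;0;;
4;4.143;0;3.69;0
5;2.186;0;2.18;0
6;3.490;0;2.45;0
"""


def first_three_columns(f):
    """Return the first three columns of every data row as floats (hypothetical helper)."""
    reader = csv.reader(f, delimiter=';')
    next(reader)  # skip the header row
    # csv.reader yields [] for blank lines, so `if row` skips them
    return [[float(value) for value in row[:3]] for row in reader if row]


rows = first_three_columns(io.StringIO(SAMPLE))
print(rows)
```

Because the empty Z/A fields are simply ignored by the `row[:3]` slice, rows with three or five values are handled the same way.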
You could try to extract the data with the re module (regex101):
import re
from itertools import zip_longest
data = '''
//Text file starts
---------------------------SOFTWARE NAME------------------------------------
I/O Filenames: abc.txt
Variables:______
------------------------------------------------------------------------
Method name.
Coordinates : 0 0
S.No. X(No.) Y(No.) Z(Beta) A(Alpha)
1 3.541 0
2 7.821 180
3 2.160 0
4 4.143 0 3.69 0
5 2.186 0 2.18 0
6 3.490 0 2.45 0
//End of text file
'''
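# Keep only the lines that start with an integer (the S.No. column), allowing
# optional leading spaces/tabs, and split each one on whitespace.
l = [g.split() for g in re.findall(r'^[ \t]*\d+[ \t]+.+$', data, flags=re.M)]
# zip_longest(*l) transposes the rows into columns, padding short rows with None;
# zip(*...) transposes them back into rows of equal length.
for v in zip(*zip_longest(*l)):
    print(v)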
Prints:
('1', '3.541', '0', None, None)
('2', '7.821', '180', None, None)
('3', '2.160', '0', None, None)
('4', '4.143', '0', '3.69', '0')
('5', '2.186', '0', '2.18', '0')
('6', '3.490', '0', '2.45', '0')
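If only the first three columns are needed, the None padding can be avoided by slicing each row, since every data row has at least three values. A rough sketch under that assumption, using a shortened copy of the data; the name `first_three` is illustrative only:

```python
import re

# Shortened copy of the data from the question (only the table part matters here).
data = """
Coordinates : 0 0
S.No. X(No.) Y(No.) Z(Beta) A(Alpha)
1 3.541 0
2 7.821 180
3 2.160 0
4 4.143 0 3.69 0
5 2.186 0 2.18 0
6 3.490 0 2.45 0
"""

# Same idea as the answer: keep only lines that start with an integer (the S.No.),
# then split each matching line on whitespace.
rows = [line.split() for line in re.findall(r'^[ \t]*\d+[ \t]+.+$', data, flags=re.M)]

# Slice to the first three fields and convert them to numbers.
first_three = [[float(v) for v in row[:3]] for row in rows]
print(first_three)
```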
Use the re module to extract the text, and the numpy module to construct the "array" you need.
import re
import numpy as np
text = """
//Text file starts
---------------------------SOFTWARE NAME------------------------------------
I/O Filenames: abc.txt
Variables:______
------------------------------------------------------------------------
Method name.
Coordinates : 0 0
S.No. X(No.) Y(No.) Z(Beta) A(Alpha)
1 3.541 0
2 7.821 180
3 2.160 0
4 4.143 0 3.69 0
5 2.186 0 2.18 0
6 3.490 0 2.45 0
//End of text file
"""
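# Match the first three fields of each data row: S.No., X(No.), Y(No.).
# [ \t]* tolerates leading indentation; the Z/A columns, when present, are ignored.
regex = r'^[ \t]*\d+[ \t]+[\d.]+[ \t]+\d+'
matches = [x.split() for x in re.findall(regex, text, flags=re.MULTILINE)]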
arr = np.array(matches)
print(arr)
It produces the output:
[['1' '3.541' '0']
['2' '7.821' '180']
['3' '2.160' '0']
['4' '4.143' '0']
['5' '2.186' '0']
['6' '3.490' '0']]
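The array above holds strings, because NumPy infers a Unicode string dtype from the input. A minimal sketch of converting it to numbers with `astype(float)`, assuming `matches` has the shape shown; the hard-coded list below merely stands in for the regex result:

```python
import numpy as np

# Hypothetical stand-in for `matches` from the answer above: a list of
# [S.No., X, Y] string triples as returned by re.findall + split().
matches = [['1', '3.541', '0'], ['2', '7.821', '180'], ['3', '2.160', '0'],
           ['4', '4.143', '0'], ['5', '2.186', '0'], ['6', '3.490', '0']]

# np.array on strings gives a string array; astype(float) converts every
# element to a float so that arithmetic on the columns works.
arr = np.array(matches).astype(float)
print(arr.shape)  # (6, 3)
print(arr[:, 1])  # the X(No.) column
```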
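Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.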