Extract specific data (within brackets) from a column of a .txt file?
I have the following text. I added a line label such as (line1) to each row; the labels are not part of the file and must be ignored.
(line1)The following table of hex bolt head dimensions was adapted from ASME B18.2.1, Table 2, "Dimensions of Hex Bolts."
(line2)
(line3)Size Nominal (Major)
(line4)Diameter [in] Width Across Flats Head Height
(line5) Nominal [in] Minimum [in] Nominal [in] Minimum [in]
(line6)1/4" 0.2500 7/16"(0.438) 0.425 11/64" 0.150
I am trying to extract data from some of the columns, but I am having trouble with column 2, which contains a float within brackets.
The .txt file contains rows and columns of information, and I am trying to organize it into lists. One of the columns holds a float within brackets, like 7/16"(0.438); it is column 2, and I need to store 0.438 in a list.
I also want to skip the first 5 rows, since those contain only text, and start reading from the 6th row.
def Main():
    filename = 'BoltSizes.txt'  # file name
    f1 = open(filename, 'r')  # open the file for reading
    data = f1.readlines()  # read the entire file as a list of strings
    f1.close()  # close the file ... very important
    # creating empty arrays
    Diameter = []
    Width_Max = []
    Width_Min = []
    Head_Height = []
    for line in data:  # loop over all the lines
        cells = line.strip().split(",")  # creates a list of words
        val = float(cells[1])
        Diameter.append(val)
        # Here I need to get only the float from the brackets 7/16"(0.438)
        val = float(cells[2])
        Width_Max.append(val)
        val = float(cells[3])
        Width_Min.append(val)
        val = float(cells[5])
        Head_Height.append(val)

Main()
I am getting this error:
line 16, in Main
    val = float(cells[1])
ValueError: could not convert string to float: ' Table 2'
Since data is a plain Python list, you can slice it to select the range of lines to parse. So, to skip the first 5 rows, pass data[5:] to the for loop instead of data.
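To see both the cause of the error and the effect of the slice, here is a small sketch. It assumes the file looks exactly like the sample in the question (without the line labels); the helper names `header_cells` and `data_rows` are made up for illustration.

```python
# Sample lines copied from the question; in the real script this list
# would come from f1.readlines().
data = [
    'The following table of hex bolt head dimensions was adapted from '
    'ASME B18.2.1, Table 2, "Dimensions of Hex Bolts."\n',
    '\n',
    'Size Nominal (Major)\n',
    'Diameter [in] Width Across Flats Head Height\n',
    ' Nominal [in] Minimum [in] Nominal [in] Minimum [in]\n',
    '1/4" 0.2500 7/16"(0.438) 0.425 11/64" 0.150\n',
]


def header_cells(lines):
    """Split the first line on commas, as the original loop did."""
    return lines[0].strip().split(",")


def data_rows(lines, skip=5):
    """Drop the first `skip` lines and split the rest on whitespace."""
    return [line.split() for line in lines[skip:]]


# The second comma-separated piece of the header is what float() choked on.
print(repr(header_cells(data)[1]))

# Only the real data row is left after slicing.
for cells in data_rows(data):
    print(cells)
```

Note that the sample row is separated by spaces, not commas, so the sketch uses `line.split()` with no argument, which splits on any run of whitespace.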
Fixing the second column is a bit more complicated. A good way to extract the value from column #2 is re.search(), which can capture the text between the brackets.
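As a minimal sketch of that idea in isolation, the function below pulls the bracketed number out of a single cell. The name `bracketed_float` is hypothetical, and the pattern assumes there is at most one pair of round brackets per cell, as in the sample row.

```python
import re


def bracketed_float(cell):
    """Return the number inside the first (...) in `cell`, or None."""
    match = re.search(r'\(([^)]*)\)', cell)
    if match is None:
        return None  # no brackets in this cell
    try:
        return float(match.group(1))  # group 1 excludes the brackets
    except ValueError:
        return None  # brackets found, but the content is not a number


print(bracketed_float('7/16"(0.438)'))
print(bracketed_float('0.425'))
```

Returning None instead of raising lets the caller decide whether to skip the row or report it.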
So, you can change your code to something like this:
# we'll use a regular expression to extract the value for column 2
import re

# skip the first five rows
for line in data[5:]:
    # strip the line, then replace each run of whitespace with a single comma
    strip = re.sub(r"\s+", ",", line.strip())
    cells = strip.split(",")  # creates a list of words
    # parsing column 0, 1..
    ...
    # column 2 is critical
    tmp = re.search(r'\((.*?)\)', cells[2])
    # we have to check if re.search() returned something
    if tmp:
        # take group 1; group 0 includes the brackets
        val = tmp.group(1)
        # str.isnumeric() is False for "0.438" because of the ".",
        # so let float() itself decide whether the text is a number
        try:
            Width_Max.append(float(val))
        except ValueError:
            pass
    # continue your parsing
The problem with this code is that it will probably break the first time the format of your data changes, but since you've posted only one row I can't provide more detailed help.
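One way to make the parsing a little less fragile is to validate each line instead of counting header rows. The sketch below is only an assumption about the layout, based on the single sample row: six whitespace-separated fields, with the bracketed value in the third. The names `parse_bolt_line` and `BRACKETED` are hypothetical.

```python
import re

BRACKETED = re.compile(r'\(([^)]*)\)')


def parse_bolt_line(line):
    """Return (diameter, width_max, width_min, head_height) or None.

    None means the line does not look like a data row, so header
    and blank lines are skipped without a hard-coded row count.
    """
    cells = line.split()
    if len(cells) != 6:
        return None
    match = BRACKETED.search(cells[2])
    if match is None:
        return None
    try:
        return (float(cells[1]), float(match.group(1)),
                float(cells[3]), float(cells[5]))
    except ValueError:
        return None


sample = [
    'Size Nominal (Major)\n',
    ' Nominal [in] Minimum [in] Nominal [in] Minimum [in]\n',
    '1/4" 0.2500 7/16"(0.438) 0.425 11/64" 0.150\n',
]

Diameter, Width_Max, Width_Min, Head_Height = [], [], [], []
for line in sample:
    parsed = parse_bolt_line(line)
    if parsed is None:
        continue  # header, blank, or malformed line
    Diameter.append(parsed[0])
    Width_Max.append(parsed[1])
    Width_Min.append(parsed[2])
    Head_Height.append(parsed[3])

print(Diameter, Width_Max, Width_Min, Head_Height)
```

If real rows turn out to have a different number of fields, the length check is the first thing that would need adjusting.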
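Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.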