
Extract specific data (within brackets) from a column of a .txt file?

I have the following text. I added the row number at the start of each line; it is not part of the file and must not be considered.

(line1)The following table of hex bolt head dimensions was adapted from ASME B18.2.1, Table 2, "Dimensions of Hex Bolts."

(line2)
(line3)Size Nominal (Major)
(line4)Diameter [in]            Width Across Flats          Head Height
(line5)        Nominal [in] Minimum [in]    Nominal [in]        Minimum [in]
(line6)1/4" 0.2500          7/16"(0.438)    0.425       11/64"  0.150

I am trying to extract the data from some of the columns, but I am having a problem extracting from column 2, which includes a float within brackets.

The .txt file contains columns and rows of information, and I am trying to organize it into lists. One of the columns contains a float within brackets, like 7/16"(0.438). It is column 2, and I need to store 0.438 in a list.

I also want to skip the first 5 rows, since they contain only text, and start reading from the 6th row.

def Main():

    filename = 'BoltSizes.txt' #file name
    f1 = open(filename, 'r')  # open the file for reading
    data = f1.readlines()  # read the entire file as a list of strings
    f1.close()  # close the file (very important)

    # create empty lists
    Diameter = []
    Width_Max = []
    Width_Min = []
    Head_Height = []

    for line in data: #loop over all the lines
        cells = line.strip().split(",") #creates a list of words

        val = float(cells[1])
        Diameter.append(val)

        #Here I need to get only the float from the brackets 7/16"(0.438)
        val = float(cells[2])
        Width_Max.append(val)

        val = float(cells[3])
        Width_Min.append(val)

        val = float(cells[5])
        Head_Height.append(val)

Main()

I am getting this error:

line 16, in Main
    val = float(cells[1])
ValueError: could not convert string to float: ' Table 2'
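The error comes from the first line of the file: it contains commas, so split(",") cuts it at "ASME B18.2.1," and cells[1] becomes ' Table 2'. The data row, on the other hand, contains no commas at all. A quick check on the sample text from the question (with the (lineN) markers removed) shows both effects, and that a plain split() on whitespace gives the six expected fields:

```python
# Line 1 and line 6 of the sample file, without the "(lineN)" markers
title = ('The following table of hex bolt head dimensions was adapted from '
         'ASME B18.2.1, Table 2, "Dimensions of Hex Bolts."\n')
row = '1/4" 0.2500          7/16"(0.438)    0.425       11/64"  0.150\n'

# split(",") cuts the title at its commas, so cells[1] is ' Table 2'
comma_cells = title.strip().split(",")
print(repr(comma_cells[1]))          # prints ' Table 2'

# The data row has no commas, so split(",") leaves it in one piece
print(len(row.strip().split(",")))   # prints 1

# split() with no argument splits on any run of whitespace
ws_cells = row.split()
print(ws_cells)
```

So the columns in this file are separated by whitespace, not commas, and the indices 1, 2, 3 and 5 used in the question line up with ws_cells.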

Since data is a plain Python list, you can slice it to choose which lines to parse. So, to skip the first 5 rows, pass data[5:] to the for loop.
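A minimal sketch of the slicing, using an in-memory copy of the sample file in place of BoltSizes.txt. It also shows itertools.islice, which skips lines the same way but works directly on an open file without reading it into a list first (io.StringIO stands in for the file here):

```python
import io
from itertools import islice

# Stand-in for f1.readlines(): the sample file from the question,
# with the "(lineN)" markers removed (line 2 is empty)
data = [
    'The following table of hex bolt head dimensions was adapted from '
    'ASME B18.2.1, Table 2, "Dimensions of Hex Bolts."\n',
    '\n',
    'Size Nominal (Major)\n',
    'Diameter [in]            Width Across Flats          Head Height\n',
    '        Nominal [in] Minimum [in]    Nominal [in]        Minimum [in]\n',
    '1/4" 0.2500          7/16"(0.438)    0.425       11/64"  0.150\n',
]

# data[5:] is a new list starting at index 5, i.e. the 6th line,
# so the title and header lines never reach the loop body
for line in data[5:]:
    print(line.split())

# Same idea without building the list first: islice(f, 5, None)
# skips the first 5 lines of any iterable, including an open file
f = io.StringIO("".join(data))
rows = [line.split() for line in islice(f, 5, None)]
print(rows)
```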

Handling the second column is a bit more complicated; a convenient way to extract the number from column 2 is re.search().
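Here is how re.search() behaves on the column 2 cell from the sample row: group(0) is the whole match including the brackets, group(1) is only the captured text. It also shows a pitfall worth knowing: str.isnumeric() returns False for a decimal string such as '0.438', so it cannot be used to check whether float() will succeed.

```python
import re

cell = '7/16"(0.438)'  # column 2 of the data row

# \( and \) match literal brackets; (.*?) captures, non-greedily,
# everything between the first "(" and the next ")"
match = re.search(r'\((.*?)\)', cell)
if match:                      # re.search() returns None if nothing matched
    print(match.group(0))      # prints (0.438)  (whole match, brackets included)
    print(match.group(1))      # prints 0.438    (captured group only)
    value = float(match.group(1))
    print(value)               # prints 0.438

# str.isnumeric() is True only if every character is numeric,
# so the decimal point makes it False
print('0.438'.isnumeric())     # prints False
```

Wrapping float() in try/except ValueError is the usual way to test whether a string is a valid number.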

So, you can change your code to something like this:

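# we'll use a regular expression to extract the value for column 2
import re
# skip the first five rows
for line in data[5:]:
    if not line.strip():  # ignore blank lines, e.g. at the end of the file
        continue
    # strip the line, then replace each run of whitespace with a single comma
    strip = re.sub(r"\s+", ",", line.strip())
    cells = strip.split(",")  # creates a list of fields

    # columns 1, 3 and 5 are plain decimals, parse them as before
    Diameter.append(float(cells[1]))
    Width_Min.append(float(cells[3]))
    Head_Height.append(float(cells[5]))

    # column 2 is critical: 7/16"(0.438)
    tmp = re.search(r'\((.*?)\)', cells[2])
    # we have to check whether re.search() returned something
    if tmp:
        # we're taking group 1; group 0 includes the brackets
        val = tmp.group(1)
        # val must be numeric for float() to work; val.isnumeric() is
        # False for '0.438' (because of the '.'), so try float() instead
        try:
            Width_Max.append(float(val))
        except ValueError:
            pass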

The problem with this code is that it will probably break the first time your data format changes, but since you've posted only one row, I can't provide more detailed help.
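One way to make the parsing a little less fragile is sketched below. parse_bolt_rows is a hypothetical helper, and it assumes every data row has exactly the six whitespace-separated fields shown in the sample. Instead of hard-coding "skip 5 lines", it skips any line that does not have that shape (title, headers, blank lines), and it converts all four values before appending, so the lists always stay the same length.

```python
import re

# Matches "(...)" and captures the text inside the brackets
BRACKET_RE = re.compile(r'\(([^)]*)\)')


def parse_bolt_rows(lines):
    """Hypothetical helper: collect the numeric columns of the bolt table.

    Assumes every data row has exactly six whitespace-separated fields,
    e.g.  1/4"  0.2500  7/16"(0.438)  0.425  11/64"  0.150
    Lines without that shape are skipped instead of raising an exception.
    """
    diameter, width_max, width_min, head_height = [], [], [], []
    for line in lines:
        cells = line.split()
        if len(cells) != 6:
            continue                  # not a data row
        match = BRACKET_RE.search(cells[2])
        if match is None:
            continue                  # no bracketed decimal in column 2
        try:
            # convert all four values before appending anything
            values = (float(cells[1]), float(match.group(1)),
                      float(cells[3]), float(cells[5]))
        except ValueError:
            continue                  # some field was not a number
        diameter.append(values[0])
        width_max.append(values[1])
        width_min.append(values[2])
        head_height.append(values[3])
    return diameter, width_max, width_min, head_height


# The sample file from the question, without the "(lineN)" markers
sample = [
    'The following table of hex bolt head dimensions was adapted from '
    'ASME B18.2.1, Table 2, "Dimensions of Hex Bolts."\n',
    '\n',
    'Size Nominal (Major)\n',
    'Diameter [in]            Width Across Flats          Head Height\n',
    '        Nominal [in] Minimum [in]    Nominal [in]        Minimum [in]\n',
    '1/4" 0.2500          7/16"(0.438)    0.425       11/64"  0.150\n',
]
print(parse_bolt_rows(sample))  # ([0.25], [0.438], [0.425], [0.15])
```

Because a file object yields its lines when iterated, the same function can be called as parse_bolt_rows(f) inside a with open('BoltSizes.txt') as f: block. The trade-off is that malformed rows are skipped silently; if you would rather be told about them, raise or log inside the continue branches instead.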
