
I have a folder with many .tar.gz files. In Python, how can I open each archive, decompress it, and find the text file containing a string I want to extract?

I have a main folder with many compressed .tar.gz files, so I need to unpack twice to reach a data file containing text, from which I then extract a certain string. I am having trouble unpacking each archive to reach the text file, then moving on to the next archive and doing the same. I want to save the results in a DataFrame.

import os
import tarfile
for i in os.listdir(r'\user\project gz'):
 tar = (i, "r:gz")
 for m in tar.getmembers():
  f= tar.extractfile(member):
  if f is not None:
   content = f.read()
   text = re.findall(r"\name\s", content)
   df = pd.Dataframe(text)
   print(df)

I assume you want to find which files inside the archives under \user\project gz\*.tar.gz contain the string \name\s?

One possible solution:

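import os
import re
import tarfile

import pandas as pd

# Folder that holds the .tar.gz archives (path taken from the question).
folder = r'\user\project gz'

row = []
value = []

for filename in os.listdir(folder):
    if filename.endswith('.tar.gz'):
        # os.path.join adds the path separator that plain string
        # concatenation would leave out.
        with tarfile.open(os.path.join(folder, filename), 'r:gz') as tar:
            for text_file in tar.getmembers():
                f = tar.extractfile(text_file)
                # extractfile() returns None for directories and other
                # members that are not regular files.
                if f is not None:
                    content = f.read().decode()
                    # This pattern matches the literal text \name\s.
                    if re.search(r"\\name\\s", content):
                        row.append(text_file.name)
                        value.append(content)

# One row per matching member: the index is the member name,
# the 'nametag' column holds the full text of that member.
df = pd.DataFrame(value, columns=['nametag'], index=row)
print(df)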
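
The question asks for the extracted strings themselves rather than the whole file content, so here is a minimal sketch of that variant. It is not part of the original answer: the function name find_matches and the row keys archive, member and match are hypothetical, and it assumes the members are UTF-8 text and that the pattern has at most one capturing group (with more groups, re.findall returns tuples).

```python
import re
import tarfile
from pathlib import Path


def find_matches(folder, pattern):
    """Return one dict per regex match found in any member of any .tar.gz in folder."""
    regex = re.compile(pattern)
    rows = []
    for archive in sorted(Path(folder).glob("*.tar.gz")):
        # Mode "r:gz" undoes the gzip layer and reads the tar layer in one step.
        with tarfile.open(archive, "r:gz") as tar:
            for member in tar.getmembers():
                f = tar.extractfile(member)
                if f is None:  # directories and other non-file members
                    continue
                # Assumption: text is UTF-8; undecodable bytes are replaced
                # instead of raising an error.
                text = f.read().decode("utf-8", errors="replace")
                for match in regex.findall(text):
                    rows.append({
                        "archive": archive.name,
                        "member": member.name,
                        "match": match,
                    })
    return rows
```

If pandas is installed, pd.DataFrame(find_matches(r'\user\project gz', r"\\name\\s")) should turn the rows into a DataFrame with one column per key. Collecting plain dicts first keeps the archive-walking logic independent of pandas.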
