I have a folder with many .tar.gz files. In Python, how can I go into each one, decompress it, and find the text file that contains the string I want to extract?

I have a main folder with many .tar.gz compressed files, so I need to decompress twice (gzip, then tar) to get to a data file with text, and then extract a certain string from that text. I am having trouble decompressing to get to the text file, then moving on to the next archive and doing the same. I want to save the results in a DataFrame.
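On the "decompress twice" point: a .tar.gz is a tar archive that has been gzip-compressed, and Python's tarfile module can remove both layers in a single tarfile.open(..., mode="r:gz") call. The sketch below illustrates this on a small archive built in memory; make_demo_archive and read_text_members are hypothetical helper names, and the file contents are made-up demo data.

```python
import io
import tarfile


def make_demo_archive(files):
    """Build a .tar.gz in memory from {member_name: text} (hypothetical demo data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)  # rewind so the archive can be read back
    return buf


def read_text_members(fileobj):
    """Open a gzip-compressed tar once and return {member_name: decoded text}."""
    texts = {}
    # "r:gz" makes tarfile strip the gzip layer and parse the tar layer together
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, etc.
            f = tar.extractfile(member)
            texts[member.name] = f.read().decode("utf-8", errors="replace")
    return texts


archive = make_demo_archive({"data/report.txt": "id 1\nname alice\n"})
print(read_text_members(archive))
```

Because extractfile() returns a file object for regular members, nothing has to be written to disk to read the text.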

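import os
import re
import tarfile

import pandas as pd

folder = r'\user\project gz'
for i in os.listdir(folder):
    tar = tarfile.open(os.path.join(folder, i), "r:gz")  # opens the gzip and tar layers in one step
    for m in tar.getmembers():
        f = tar.extractfile(m)  # None for directories and other non-regular members
        if f is not None:
            content = f.read().decode()  # bytes -> str so re can search it
            text = re.findall(r"\name\s", content)
            df = pd.DataFrame(text)
            print(df)
    tar.close()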
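The question does not say exactly which text the pattern should match, and the backslashes matter: inside a regular expression, r"\name\s" starts with the newline escape \n, so it does not match the plain word "name". A small sketch of how three possible readings behave; has_match is a hypothetical helper and the sample strings are made up.

```python
import re


def has_match(pattern, text):
    """Return True if the regex `pattern` occurs anywhere in `text`."""
    return re.search(pattern, text) is not None


# r"\name\s" is read by the regex engine as: newline, "ame", one whitespace char.
print(has_match(r"\name\s", "name alice"))       # False: no newline before "ame"
print(has_match(r"\name\s", "id 1\name alice"))  # True: "\n" in a normal string is a newline
# r"name\s" matches the plain word "name" followed by whitespace.
print(has_match(r"name\s", "name alice"))        # True
# r"\\name\\s" matches the literal characters \name\s (backslashes included).
print(has_match(r"\\name\\s", r"a\name\s b"))    # True
print(has_match(r"\\name\\s", "name alice"))     # False
```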

I guess you want to find the files in \user\project gz\*.tar.gz that contain the string \name\s?

A solution is:

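import os
import re
import tarfile

import pandas as pd

folder = r'\\user\\project gz'
row = []    # member names of the matching text files (DataFrame index)
value = []  # full text of each matching file


for filename in os.listdir(folder):
    if filename.endswith('.tar.gz'):
        # os.path.join adds the path separator that plain "+" concatenation left out
        with tarfile.open(os.path.join(folder, filename), 'r:gz') as tar:
            for text_file in tar.getmembers():
                f = tar.extractfile(text_file)  # None for directories, links, etc.
                if f is not None:
                    # replace undecodable bytes instead of raising on binary members
                    content = f.read().decode('utf-8', errors='replace')
                    if re.search(r"\\name\\s", content):
                        row.append(text_file.name)
                        value.append(content)


df = pd.DataFrame(value, columns=['nametag'], index=row)
print(df)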
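The solution above stores the whole file content, while the question asks for the matched string itself. A minimal sketch, assuming each regex match should become its own row: find_matches and write_demo_archive are hypothetical helpers, the pattern r"name\s+\w+" is only a placeholder for whatever string you actually need, and the demo archives are made-up data. It returns plain dicts, so pd.DataFrame(rows) gives a frame with archive, member and match columns.

```python
import io
import re
import tarfile
import tempfile
from pathlib import Path


def find_matches(folder, pattern):
    """Scan every *.tar.gz in `folder` and return one dict per regex match."""
    regex = re.compile(pattern)
    rows = []
    for archive_path in sorted(Path(folder).glob("*.tar.gz")):
        # One open handles both the gzip and the tar layer.
        with tarfile.open(archive_path, mode="r:gz") as tar:
            for member in tar.getmembers():
                if not member.isfile():
                    continue  # skip directories, links, etc.
                f = tar.extractfile(member)
                text = f.read().decode("utf-8", errors="replace")
                for m in regex.finditer(text):
                    rows.append({"archive": archive_path.name,
                                 "member": member.name,
                                 "match": m.group(0)})
    return rows


def write_demo_archive(path, files):
    """Write a small .tar.gz at `path` from {member_name: text} (demo data)."""
    with tarfile.open(path, mode="w:gz") as tar:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


with tempfile.TemporaryDirectory() as folder:
    write_demo_archive(Path(folder) / "a.tar.gz", {"a/data.txt": "name alice\nage 30\n"})
    write_demo_archive(Path(folder) / "b.tar.gz", {"b/data.txt": "name bob\n"})
    # Hypothetical pattern: "name" followed by the next word
    for row in find_matches(folder, r"name\s+\w+"):
        print(row)
```

Using finditer rather than a yes/no test keeps every occurrence, so a file with several matches produces several rows.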


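Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.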

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM