Missing files when extracting a tarball in Go

Question

I'm trying out this function to untar a file after I've ungzipped it. However, when it untars, some folders are missing and I can't figure out why. UnGzip works fine (the tar file it creates opens correctly in a GUI archive tool), so that function isn't included.

func main() {
fileUrl := "https://www.clamav.net/downloads/production/clamav-0.103.1.tar.gz"
filePath := "clamav-0.103.1.tar.gz"
tempFolder := "temp"
    
err := os.Mkdir(tempFolder, 0755)
if err != nil {
    panic(err)
}

err = DownloadFile(filePath, fileUrl)
if err != nil {
    panic(err)
}
fmt.Println("Downloaded: " + fileUrl)

UnGzip(filePath,tempFolder + "/clamav.tar")
UnTar(tempFolder + "/clamav.tar",tempFolder + "/clamAV/")
//err := os.RemoveAll("tempFolder")
//if err != nil {
    //panic(err)
//}

}

func UnTar(tarball, target string) error {
reader, err := os.Open(tarball)
if err != nil {
    return err
}
defer reader.Close()
tarReader := tar.NewReader(reader)

for {
    header, err := tarReader.Next()
    if err == io.EOF {
        break
    } else if err != nil {
        return err
    }

    path := filepath.Join(target, header.Name)
    info := header.FileInfo()
    if info.IsDir() {
        if err = os.MkdirAll(path, info.Mode()); err != nil {
            return err
        }
        continue
    }

    file, err := os.OpenFile(path, os.O_CREATE|os.O_TRUNC|os.O_WRONLY, info.Mode())
    if err != nil {
        return err
    }
    defer file.Close()
    _, err = io.Copy(file, tarReader)
    if err != nil {
        return err
    }
}
return nil

}

Here's what I should get: want, and here's what I have: have

Answer 1

Here is some example code:

package main

import (
   "archive/tar"
   "compress/gzip"
   "io"
   "os"
   "path"
)

func extract(source string) error {
   file, err := os.Open(source)
   if err != nil { return err }
   defer file.Close()
   gzRead, err := gzip.NewReader(file)
   if err != nil { return err }
   defer gzRead.Close()
   tarRead := tar.NewReader(gzRead)
   for {
      cur, err := tarRead.Next()
      if err == io.EOF { break } else if err != nil { return err }
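      // os.ModeDir alone carries no permission bits, so pass an explicit mode.
      if err := os.MkdirAll(path.Dir(cur.Name), 0755); err != nil { return err }
      switch cur.Typeflag {
      case tar.TypeReg:
         create, err := os.Create(cur.Name)
         if err != nil { return err }
         _, err = create.ReadFrom(tarRead)
         // Close right away; a defer inside the loop would keep every file open.
         if cerr := create.Close(); err == nil { err = cerr }
         if err != nil { return err }
      case tar.TypeLink:
         if err := os.Link(cur.Linkname, cur.Name); err != nil { return err }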
      }
   }
   return nil
}

Usage:

package main

func main() {
   if err := extract("clamav-0.103.1.tar.gz"); err != nil {
      panic(err)
   }
}

Answer 2

You're likely running into the ulimit for the allowed number of open files per process. Run ulimit with the -a flag to check; I think the default open files limit is 1024, and the tarball has 2758 files.
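The same limit can also be read from inside the program. The following is a minimal sketch, assuming a Unix system (syscall.Getrlimit is not available on Windows); openFileLimit is a hypothetical helper name, not something from the question.

```go
package main

import "syscall"

// openFileLimit returns the soft and hard limits on open file descriptors
// for the current process. Hypothetical helper; Unix-only.
func openFileLimit() (soft, hard uint64, err error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, 0, err
	}
	// The field types differ between platforms, so convert explicitly.
	return uint64(lim.Cur), uint64(lim.Max), nil
}
```

Note that since Go 1.19, programs that import os raise the soft limit to the hard limit at startup on Unix, so the value seen inside a Go program can be higher than what ulimit -n prints in the shell.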

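This is because you defer the closing of the file descriptor inside the for loop that processes the tarReader. Deferred calls run only when the surrounding function returns, so every extracted file stays open until UnTar finishes. Once the limit is reached, os.OpenFile fails and UnTar returns early; main ignores that error, so the extraction just appears incomplete.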
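A small sketch of that timing, with strings standing in for file opens and closes (deferOrder is a hypothetical name used only for this illustration):

```go
package main

// deferOrder records when deferred calls run relative to loop iterations.
// "open" and "close" stand in for opening and closing a file.
func deferOrder(n int) []string {
	var events []string
	func() {
		for i := 0; i < n; i++ {
			events = append(events, "open")
			// This does not run at the end of the iteration,
			// only when the enclosing function returns.
			defer func() { events = append(events, "close") }()
		}
		events = append(events, "loop done")
	}()
	return events
}
```

Calling deferOrder(3) yields three "open" events, then "loop done", and only then three "close" events, which is why all files are open at once in the original loop.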

To fix it, close each file as soon as you've finished with it:

func UnTar(tarball, target string) error {
    reader, err := os.Open(tarball)
    if err != nil {
        return err 
    }   
    defer reader.Close()
    tarReader := tar.NewReader(reader)

    for {
        header, err := tarReader.Next()
        if err == io.EOF {
            break
        } else if err != nil {
            return err 
        }

        path := filepath.Join(target, header.Name)
        info := header.FileInfo()
        if info.IsDir() {
            if err = os.MkdirAll(path, info.Mode()); err != nil {
                return err 
            }
            continue
        }

        err = processOneFile(tarReader, path, info.Mode())
        if err != nil {
            return err 
        }
    }   
    return nil 
}

func processOneFile(tarReader io.Reader, filePath string, fileMode os.FileMode) error {
    file, err := os.OpenFile(filePath, os.O_CREATE|os.O_TRUNC|os.O_WRONLY, fileMode)
    if err != nil {
        return err 
    }   
    defer file.Close() // close error discarded
    _, err = io.Copy(file, tarReader)
    return err 
}

Answer 3

While the other answer about the ulimit is already really good, there are two things I just want to add:

1. You can decompress the gzip stream and read the tar archive at the same time instead of creating a temp file in between. You could also stream the file directly from the URL and extract it while downloading.
2. Someone could create a malicious tar.gz file that makes your code overwrite important files, using something like ZipSlip. An entry path such as ../../../../etc/passwd would overwrite that file, which is especially dangerous if your program runs as root; an attacker might even edit the crontab file and execute code that way. You should probably check for that.

With that in mind, we can write a function that extracts directly from an io.Reader and also rejects any path outside the target directory:

// untargz decompresses a gzipped tar stream to the directory specified by target.
// Note that `file` should be closed by the caller
func untargz(file io.Reader, targetDir string) (err error) {
    gz, err := gzip.NewReader(file)
    if err != nil {
        return
    }
    // This does not close file
    defer gz.Close()

    tarReader := tar.NewReader(gz)

    for {
        header, err := tarReader.Next()
        if err == io.EOF {
            break
        } else if err != nil {
            return err
        }

        // This can be dangerous, similar to zipslip
        path := filepath.Join(targetDir, header.Name)

        // Check for ZipSlip. More Info: https://snyk.io/research/zip-slip-vulnerability#go
        if !strings.HasPrefix(path, filepath.Clean(targetDir)+string(os.PathSeparator)) {
            err = fmt.Errorf("%s: illegal file path", path)
            return err
        }

        info := header.FileInfo()
        if info.IsDir() {
            if err = os.MkdirAll(path, info.Mode()); err != nil {
                return err
            }
            continue
        }

        file, err := os.OpenFile(path, os.O_CREATE|os.O_TRUNC|os.O_WRONLY, info.Mode())
        if err != nil {
            return err
        }

        _, err = io.Copy(file, tarReader)
        if err != nil {
            file.Close()
            return err
        }

        err = file.Close()
        if err != nil {
            return err
        }
    }

    return nil
}

It might also be worth thinking about what happens if a directory entry appears after a file inside that directory: os.OpenFile then fails because the parent directory does not exist yet. This function doesn't handle that case.

This function could be used to stream directly to an output directory, but I'm honestly not sure if that is what you want:

func main() {
    resp, err := http.Get(`https://www.clamav.net/downloads/production/clamav-0.103.1.tar.gz`)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    err = untargz(bufio.NewReader(resp.Body), "out")
    if err != nil {
        panic(err)
    }

    println("Done")
}

You can find the full file here.

3 answers

Answer 1
1 2021-03-18 00:08:42

Answer 2
1 Accepted 2021-03-18 03:11:55

Answer 3
0 2021-03-18 07:37:00
