
Statistics on folders and file structure on a Linux server

I need to produce statistics for files that are stored on a Linux network share and would like to be able to run a shell script or program locally on the network share to produce data points with the following attributes:

path (or relativepath) | filename | filesize | datecreated | datechanged | dateaccessed

There are roughly 1–2 million files (8TB) and I want to explore the dataset to get a grasp of the organization and balance of the file types (as determined by a combination of file name and path) in relation to the total number of files and total amount of storage.

Questions:

  1. What is an efficient way to traverse the file system and get this data?

  2. What kind of database would you recommend for exploring this kind of data, with statistics at different levels of the hierarchy?
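For question 1, one efficient approach (a sketch, not the poster's actual script) is Python's os.scandir(), which returns cached stat information per directory entry and so avoids issuing a separate stat() call for every file. Note that most Linux filesystems do not expose a true creation time; st_ctime is the inode-change time, so the sketch below records only modification and access times.

```python
import os

def scan_tree(root):
    """Yield (path, filename, size, mtime, atime) for every file under root.

    Uses an explicit stack instead of recursion so very deep trees
    cannot exhaust the interpreter's recursion limit.
    """
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    # entry.stat() reuses data fetched by scandir where possible
                    st = entry.stat(follow_symlinks=False)
                    yield (entry.path, entry.name, st.st_size,
                           st.st_mtime, st.st_atime)
```

Each yielded tuple can be written out pipe-delimited to build the same kind of plain-text dataset the answer below describes.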

This is what I ended up using to solve the problem:

  1. The Linux commands find and fstat were used to generate the dataset as a plain text file.
  2. Python's pandas and exifread libraries were used to enrich and analyze the dataset.
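Step 2 can be sketched with pandas as follows. The column names and pipe-delimited layout (path | filename | size | mtime | atime) are assumptions for illustration, not taken from the original post; the idea is to derive a file-type column from the extension and then compare each type's share of the file count and of total storage.

```python
import pandas as pd

def summarize(listing_path):
    """Aggregate a pipe-delimited file listing by filename extension."""
    df = pd.read_csv(
        listing_path, sep="|",
        names=["path", "filename", "size", "mtime", "atime"],
    )
    # Derive a file-type column from the filename extension
    # (files with no dot keep the whole name as their "extension").
    df["ext"] = df["filename"].str.rsplit(".", n=1).str[-1].str.lower()
    # Count and total bytes per extension.
    summary = df.groupby("ext").agg(
        count=("filename", "size"),
        total_bytes=("size", "sum"),
    )
    # Each type's share of the file count and of total storage.
    summary["pct_files"] = 100 * summary["count"] / summary["count"].sum()
    summary["pct_bytes"] = 100 * summary["total_bytes"] / df["size"].sum()
    return summary.sort_values("total_bytes", ascending=False)
```

Grouping by a path prefix instead of the extension gives the same kind of breakdown at different levels of the directory hierarchy.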
