
Find the 5 largest files (by number of lines) in a git repository

My aim: I want to start working on a new open-source project: https://github.com/dry-python/returns/tree/master/returns. First, I need to understand which files I will have to work with.

Task: Sort the files by number of lines of code and find the 5 files with the most lines. What console command can I use for this?

What I have already done:

  1. Downloaded the repository files to my local machine into a directory called "returns-master"
  2. Ran the command:
ls / returns-master | wc -l | sort -n | head -n 5

In response, I get an error:

ls: returns-master: No such file or directory
17

Check out the git repo first. Then find the desired files on the disk:

find /path/to/your/copy/of/repo -type f | xargs wc -l | sort -gr | head -n6 | tail -n +2 | perl -lane 'print $F[-1]'

Here, find passes the list of files in the checked-out git repo to xargs, which feeds them to wc -l, which counts the lines.
sort -gr : Sort numerically in descending order by the first column (the number of lines).
head -n6 | tail -n +2 : Grab the top 6 entries returned by wc, the first of which is the total line, and remove that line using tail.
perl -lane 'print $F[-1]' : Print the last whitespace-delimited column (the file name).
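
The pipeline above also counts files inside the .git directory and splits file names on whitespace. A minimal sketch of a more defensive variant follows. The function name top5_by_lines is hypothetical, and the sketch assumes a find and xargs that support -print0 and -0 (both GNU and BSD versions do). It keeps the line counts in the output and drops wc's "total" lines with grep, so a file whose name literally ends in " total" would be dropped too.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the 5 files with the most lines under a directory.
top5_by_lines() {
  local dir="${1:-.}"
  # -print0 / -0 pass file names NUL-separated, so spaces in names are safe.
  # The -path test skips everything stored under a .git directory.
  find "$dir" -type f ! -path '*/.git/*' -print0 \
    | xargs -0 wc -l \
    | grep -v ' total$' \
    | sort -nr \
    | head -n 5
}
```

For example, top5_by_lines /path/to/your/copy/of/repo prints up to five "count name" lines, largest first.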

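You need a shell that supports the ** glob. zsh supports it by default; in bash you need shopt -s globstar first. Note that the (.) qualifier used below, which restricts the match to regular files, is zsh-specific. gsort and gtail are the names GNU sort and tail usually get when installed alongside the system tools (for example on macOS); if you don't have them, try plain sort and tail instead.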

wc -l **/*(.) | gsort -n

You can add | gtail -n 6 to get only the top five (the sixth and last line is the total printed by wc).

To get only .zsh files:

wc -l **/*.zsh | gsort -n
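
Since the (.) qualifier only works in zsh, here is a rough bash equivalent, assuming bash 4 or later for globstar. The function name top5_globstar and its optional pattern argument are made up for illustration. It counts each file separately, so there is no "total" line to strip, and ** skips hidden directories such as .git unless dotglob is set.

```shell
#!/usr/bin/env bash
# Enable recursive ** matching; nullglob makes an unmatched pattern expand to nothing.
shopt -s globstar nullglob

# Hypothetical helper: print "count name" for the 5 longest files below the
# current directory, smallest first. Optional argument: a name pattern such as '*.py'.
top5_globstar() {
  local pattern="${1:-*}" f
  for f in **/$pattern; do
    # Keep regular files only (the bash counterpart of zsh's (.) qualifier).
    if [ -f "$f" ]; then
      printf '%s %s\n' "$(wc -l < "$f")" "$f"
    fi
  done | sort -n | tail -n 5
}
```

Run it from the repository root, for example top5_globstar '*.py' to look at Python files only.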


 