
Perl script to count words and print in one file

I've been working on a Perl script for my master's thesis to extract a small piece of text (CAE) from a 10K (a company's annual report). I managed to finish that script after a lot of work. Now I need to write a new script, but with a deadline next week I'm afraid I won't finish in time. I was wondering if someone could help me with the following problem:

I have almost 52,000 .txt files, each containing a small piece of text. I need a script that records the name of each .txt file together with the number of words and/or characters in it, and writes these results for all files into one text file.

Could someone help me, please? I would really appreciate it!

This is what I have so far:

#!/usr/bin/perl -w
use strict;
use warnings;

my $folder;                     #Base directory for the 10K filings
my $subfolder="2012";           #Subdirectory where 10K filings are placed (Default is ./10K/10K_Raw/2012/*.txt)
my $folder10kcae="10K_CAE";     #Name of subdirectory for output (CAE)
my $folderwc="10K_WC";          #Name of subdirectory for output (WordCount)
my $target_cae;                 #Name of target directory for output (CAE)
my $target_wc;                  #Name of target directory for output (WordCount)
my $slash;                      #Declare slash (dependent on operating system)
my $file;                       #Filename
my @allfiles;                   #All files in directory, put into an array
my $allfiles;                   #Total files in directory
my $data;                       #Input file contents
my $cae;                        #Results of the search query (CAE)
my $wc;                         #Results of the search query (WordCount)
my $output_cae;                 #Output file with CAE
my $output_wc;                  #Output file with WordCount
my $log;                        #Log file (also used to determine point to continue progress)
my $logfile="$subfolder".".log";#Filename of log file
my @filesinlog;                 #Files that have been processed according to log file

{
#Set folders for Windows. Put raw 10K filings in folder\subfolder
$slash="\\";
$folder="C:\\10KK\\";                    ###specify correct base-map###
}


#Open source folder and read all .txt files
opendir(my $dh, "$folder$slash$subfolder") or die "Cannot open $folder$slash$subfolder: $!";
@allfiles = sort grep { /\.txt$/i } readdir $dh;
closedir $dh;


#Create destination folder
$target_wc="$folder$slash$folderwc$slash$subfolder";

mkdir "$folder$slash$folderwc";
mkdir $target_wc;


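#Count lines, words and characters for each file; $wc collects one row per file
my ($lines, $words, $chars) = (0,0,0);   #Totals over all files
$wc = "file\tlines\twords\tchars\n";

foreach $file (@allfiles) {
    my ($l, $w, $c) = (0,0,0);           #Counts for the current file
    open my $in, "<", "$folder$slash$subfolder$slash$file" or die "Cannot open $file: $!";
    while ($data = <$in>) {
        $l++;
        $c += length($data);
        my @tokens = split ' ', $data;   #Split on whitespace, ignoring leading blanks
        $w += @tokens;
    }
    close $in;
    $wc .= "$file\t$l\t$w\t$c\n";
    $lines += $l; $words += $w; $chars += $c;
}

#Write all rows into a single output file (the name wordcount.txt is a placeholder)
open $output_wc, ">", "$target_wc${slash}wordcount.txt" or die $!;
print $output_wc $wc;
close $output_wc;

print("lines=$lines words=$words chars=$chars\n");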

I'd say you have a bit of a wheel-reinvention problem here, and I wouldn't use a Perl script. There's a Unix command-line tool called 'wc' (short for word count) that prints the line, word and byte counts for each file it is given, along with the file name, so it covers what you describe with no programming required.

On Unix:

$ wc /path/to/my/folder/* > /path/to/my/output/file.txt
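
With roughly 52,000 files, a single glob like the one above may run into the system's limit on command-line length. A minimal sketch of one way around that, assuming a find that supports -maxdepth (GNU and BSD find do) and an output file that lives outside the source folder; the function name count_all and its two arguments are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper (not part of wc or find): count_all SRC_DIR OUT_FILE
# Writes one "lines words bytes filename" row per .txt file in SRC_DIR to OUT_FILE.
count_all() {
    src_dir=$1
    out_file=$2
    # "-exec wc {} +" lets find hand the file names to wc in batches,
    # so the shell never has to expand one huge glob.
    # wc prints a "total" row for every batch of several files; grep -v drops those rows.
    find "$src_dir" -maxdepth 1 -type f -name '*.txt' -exec wc {} + \
        | grep -v ' total$' > "$out_file"
}
```

It would be called as count_all /path/to/my/folder /path/to/my/output/file.txt. If only words and characters are wanted, wc -w -m in place of plain wc restricts the columns to those two counts.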

On Windows, you can download the wc program as part of the GNU Coreutils for Windows package, then run the same command Windows-style:

C:\ > wc \path\to\my\folder\* > \path\to\my\output\file.txt
