I'd like to count the number of times a set of words appear in each paragraph in a text file. I am able to count the number of times a set of words appears in an entire text.
It has been suggested to me that my code is really buggy, so I'll just ask what I would like to do, and if you want, you can look at the code I have at the bottom.
So, given that "frequency_count.txt" has the words "apple pear grape melon kiwi" in it, I want to know how often "apple" shows up in each paragraph of a separate file "test_essay.txt", how often pear shows up, etc., and then for these numbers to be printed out in a series of lines of numbers, each corresponding to a paragraph.
For instance:
apple, pear, grape, melon, kiwi
3,5,2,7,8
2,3,1,6,7
5,6,8,2,3
Where each line corresponds to one of the paragraphs.
I am very, very new to Ruby, so thank you for your patience.
output_file = '/Users/yirenlu/Quora-Personal-Analytics/weka_input6.csv'
o = File.open(output_file, "r+")
common_words = '/Users/yirenlu/Quora-Personal-Analytics/frequency_count.txt'
c = File.open(common_words, "r")
c.each_line{|$line1|
words1 = $line1.split
words1.each{|w1|
the_file = '/Users/yirenlu/Quora-Personal-Analytics/test_essay.txt'
f = File.open(the_file, "r")
rows = File.readlines("/Users/yirenlu/Quora-Personal-Analytics/test_essay.txt")
text = rows.join
paragraph = text.split(/\n\n/)
paragraph.each{|p|
h = Hash.new
puts "this is each paragraph"
p.each_line{|line|
puts "this is each line"
words = line.split
words.each{|w|
if w1 == w
if h.has_key?(w)
h[w1] = h[w1] + 1
else
h[w1] = 1
end
$x = h[w1]
end
}
}
o.print "#{$x},"
}
}
o.print "\n"
o.print "#{$line1}"
}
If you're used to PHP or Perl you may be under the impression that a variable like $line1
is local, but this is a global. Use of them is highly discouraged and the number of instances where they are strictly required is very short. In most cases you can just omit the $
and use variables that way with proper scoping.
This example also suffers from nearly unreadable indentation, though perhaps that was an artifact of the cut-and-paste procedure.
Generally what you want for counters is to create a hash with a default of zero, then add to that as required:
# Create a hash where the default values for each key is 0
counter = Hash.new(0)
# Add to the counters where required
counter['foo'] += 1
counter['bar'] += 2
puts counter['foo']
# => 1
puts counter['baz']
# => 0
You basically have what you need, but everything is all muddled and just needs to be organized better.
Here are two one-liners to calculate frequencies of words in a string.
The first one is a bit easier to understand, but it's less effective:
txt.scan(/\w+/).group_by{|word| word.downcase}.map{|k,v| [k, v.size]}
# => [['word1', 1], ['word2', 5], ...]
The second solution is:
txt.scan(/\w+/).inject(Hash.new(0)) { |hash, w| hash[w.downcase] += 1; hash}
# => {'word1' => 1, 'word2' => 5, ...}
This could be shorter and easier to read if you use:
require 'csv'
common_words = %w(apple pear grape melon kiwi)
text = File.open("test_essay.txt").read
def word_frequency(words, text)
words.map { |word| text.scan(/\b#{word}\b/).length }
end
CSV.open("file.csv", "wb") do |csv|
paragraphs = text.split /\n\n/
paragraphs.each do |para|
csv << word_frequency(common_words, para)
end
end
Note this is currently case-sensitive but it's a minor adjustment if you want case-insensitivity.
To count how many times one word appears in a text:
text = "word aaa word word word bbb ccc ccc"
text.scan(/\w+/).count("word") # => 4
To count a set of words:
text = "word aaa word word word bbb ccc ccc"
wlist = text.scan(/\w+/)
wset = ["word", "ccc"]
result = {}
wset.each {|word| result[word] = wlist.count(word) }
result # => {"word" => 4, "ccc" => 2}
result["ccc"] # => 2
What about this:
# Create an array of regexes to be used in `scan' in the loop.
# `\b' makes sure that `barfoobar' does not match `bar' or `foo'.
p word_list = File.open("frequency_count.txt"){|io| io.read.scan(/\w+/)}.map{|w| /\b#{w}\b/}
File.open("test_essay.txt") do |io|
loop do
# Add lines to `paragraph' as long as there is a continuous line
paragraph = ""
# A `l.chomp.empty?' becomes true at paragraph border
while l = io.gets and !l.chomp.empty?
paragraph << l
end
p word_list.map{|re| paragraph.scan(re).length}
# The end of file has been reached when `l == nil'
break unless l
end
end
Here's an alternate answer, which is has been tweaked for conciseness (though not as easy to read as my other answer).
require 'csv'
words = %w(apple pear grape melon kiwi)
text = File.open("test_essay.txt").read
CSV.open("file.csv", "wb") do |csv|
text.split(/\n\n/).map {|p| csv << words.map {|w| p.scan(/\b#{w}\b/).length}}
end
I prefer the slightly longer but more self-documenting code, but it's fun to see how small it can get.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.