How do I work around invalid byte sequence in UTF-8 in Ruby text file parsing?
Parsing the text file in Ruby
My text file looks like this:
VOTE 1168041805 Campaign:ssss_uk_01B Validity:during Choice:Antony CONN:MIG01TU MSISDN:00777778359999 GUID:E6109CA1-7756-45DC-8EE7-677CA7C3D7F3 Shortcode:63334
VOTE 1168041837 Campaign:ssss_uk_01B Validity:during Choice:Leon CONN:MIG00VU MSISDN:00777770939999 GUID:88B52A7B-A182-405C-9AE6-36FCF2E47294 Shortcode:63334
I want to get the values of VOTE, Campaign, Validity and Choice, which I currently do like this:
File.foreach('lib/data/file.txt') do |line|
  line = line.tidy_bytes
  begin
    aline = line.match(/^VOTE\s(\d+)\sCampaign:([^ ]+)\sValidity:([^ ]+)\sChoice:([^ ]+)/)
    unless aline.nil?
      ## do something
    end
  rescue Exception => e
    raise " error: " + e.inspect
    p line.inspect
    next
  end
end
Is there a better way to do this:
aline = line.match(/^VOTE\s(\d+)\sCampaign:([^ ]+)\sValidity:([^ ]+)\sChoice:([^ ]+)/)
and get aline[1], aline[2], aline[3] and aline[4]?
You can use named captures to get a hash of the results:
# use a frozen constant instead of making a new Regexp object for each line
REGEXP = /^VOTE\s(?<id>\d+)\sCampaign:(?<campaign>[^ ]+)\sValidity:(?<validity>[^ ]+)\sChoice:(?<choice>[^ ]+)/.freeze
File.foreach('lib/data/file.txt') do |line|
  begin
    matches = line.tidy_bytes.match(REGEXP)
    # skip lines that do not match instead of calling methods on nil
    next if matches.nil?
    hash = matches.names.zip(matches.captures).to_h
  rescue StandardError => e
    # report the problem and move on; a raise here would make the lines below unreachable
    warn "error: #{e.inspect}"
    p line
    next
  end
end
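As a possible shortcut, MatchData#named_captures (available since Ruby 2.4) returns the same name-to-value hash in a single call. The following is a minimal sketch using one of the sample lines from the question. VOTE_PATTERN and parse_vote are hypothetical names chosen for illustration, and the byte-cleaning step is left out to keep the example self-contained:

```ruby
# Hypothetical constant name; the pattern is the one used in the answer.
VOTE_PATTERN = /^VOTE\s(?<id>\d+)\sCampaign:(?<campaign>[^ ]+)\sValidity:(?<validity>[^ ]+)\sChoice:(?<choice>[^ ]+)/

# Hypothetical helper: returns a Hash of the named groups (string keys),
# or nil when the line does not match.
def parse_vote(line)
  match = VOTE_PATTERN.match(line)
  match&.named_captures
end

sample = 'VOTE 1168041805 Campaign:ssss_uk_01B Validity:during Choice:Antony ' \
         'CONN:MIG01TU MSISDN:00777778359999 GUID:E6109CA1-7756-45DC-8EE7-677CA7C3D7F3 Shortcode:63334'

vote = parse_vote(sample)
puts vote['choice']
```

Note that the keys are strings ('id', 'campaign', 'validity', 'choice'), just as with the names.zip(captures) approach.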
If the desired result is an array, you may want to use .map:
# use a frozen constant instead of making a new Regexp object for each line
REGEXP = /^VOTE\s(?<id>\d+)\sCampaign:(?<campaign>[^ ]+)\sValidity:(?<validity>[^ ]+)\sChoice:(?<choice>[^ ]+)/.freeze
results = File.foreach('lib/data/file.txt').map do |line|
  matches = line.tidy_bytes.match(REGEXP)
  # lines that do not match yield nil and are dropped by compact
  matches && matches.names.zip(matches.captures).to_h
end.compact
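tidy_bytes is not part of core Ruby, so the code above assumes a library or extension that defines it. If that is not available, String#scrub (core Ruby since 2.1) is one possible way to avoid the "invalid byte sequence in UTF-8" error from the title. It simply replaces invalid bytes rather than trying to reinterpret them, so the cleaned text may differ from what tidy_bytes would produce. A sketch under those assumptions: VOTE_RE, parse_votes and the in-memory lines array are hypothetical stand-ins (in real use you would pass File.foreach('lib/data/file.txt')), and filter_map requires Ruby 2.7 or later:

```ruby
# Hypothetical constant name; same pattern as in the answer.
VOTE_RE = /^VOTE\s(?<id>\d+)\sCampaign:(?<campaign>[^ ]+)\sValidity:(?<validity>[^ ]+)\sChoice:(?<choice>[^ ]+)/

# Hypothetical helper: accepts any enumerable of lines, e.g. File.foreach(path).
def parse_votes(lines)
  lines.filter_map do |line|
    # Replace invalid UTF-8 bytes with "?" so the regexp match does not fail on them.
    clean = line.scrub('?')
    match = VOTE_RE.match(clean)
    # Non-matching lines give nil, which filter_map drops.
    match&.named_captures
  end
end

# Illustrative input only: the second line carries a deliberately invalid byte (\xE9).
lines = [
  "VOTE 1168041805 Campaign:ssss_uk_01B Validity:during Choice:Antony CONN:MIG01TU\n",
  "VOTE 1168041837 Campaign:ssss_uk_01B Validity:during Choice:Le\xE9on CONN:MIG00VU\n",
  "not a vote line\n"
]

results = parse_votes(lines)
puts results.length
```

Because scrub returns a new string, the original line is left untouched and can still be logged as-is if needed.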