
How to extract related numbers (codes) from a hash using Ruby?

I am learning Ruby and am trying to extract related codes from a hash, but I do not understand how to identify them. The codes come from the 2014 MeSH Tree Codes file from the NLM website. Each code is associated with a MeSH term and appears in the file as follows (using the term "Motor Activity" as an example):

Motor Activity;F01.145.632

I have this information in a hash, using the code as the key and the term as the value. I need to extract the related terms using their codes: the parent's code has three fewer digits, the siblings' codes differ only in the last three digits, and the children's codes are the exact same code plus any number of additional digits in the form .XXX.XXX. An example of these codes is as follows:

Motor Activity [F01.145.632]
Behavior and Behavior Mechanisms [F01]              
Behavior [F01.145]
Information Seeking Behavior [F01.145.535]          
Inhibition (Psychology) [F01.145.544]           
Freezing Reaction, Cataleptic [F01.145.632.555]     
Immobility Response, Tonic [F01.145.632.680]

So far, I have opened the file and saved the codes as the keys and the terms as the values. The script is as follows:

descriptor_code_hash = {}  # tree code => MeSH descriptor

File.foreach('mtrees2014.bin') do |line|
  # Each line looks like "Motor Activity;F01.145.632"
  mesh_descriptor, tree_code = line.chomp.split(';')
  descriptor_code_hash[tree_code] = mesh_descriptor
end

I need to understand how to extract from the hash the first term (Motor Activity: F01.145.632), then its siblings (F01.145.632 with the last three digits different), its children (F01.145.632 with any number of additional digits .XXX.XXX), and its parents (F01.145.632 less the last three digits). Can this be done with regular expressions, or with some other strategy? I will then be saving these codes and terms into another hash. Thank you for taking the time to read this! Any suggestions would be greatly appreciated!

motor_code = 'F01.145.632'

# A parent's code followed by a dot is a prefix of motor_code.
parents = descriptor_code_hash.select do |k, v|
  motor_code.start_with?("#{k}.")
end.map { |k, v| v }
# => ["Behavior and Behavior Mechanisms", "Behavior"]

# Anchor the end of the pattern so that deeper codes (children) are not matched.
parent_code = motor_code.split('.')[0..-2].join('.')
siblings = descriptor_code_hash.select do |k, v|
  k =~ /\A#{Regexp.escape(parent_code)}\.\d{3}\z/ && k != motor_code
end.map { |k, v| v }
# => ["Information Seeking Behavior", "Inhibition (Psychology)"]

# Escape the dots in motor_code and require at least one extra .XXX group.
children = descriptor_code_hash.select do |k, v|
  k =~ /\A#{Regexp.escape(motor_code)}(\.\d{3})+\z/
end.map { |k, v| v }
# => ["Freezing Reaction, Cataleptic", "Immobility Response, Tonic"]

parents are found by looking for all keys that are prefixes of motor_code.
siblings are found by looking for all keys that are prefixed by the parent key of motor_code (motor_code with its last three digits removed) followed by exactly three digits.
children are found by looking for all keys that are prefixed by motor_code.
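
Since the question also asks about strategies other than regular expressions and mentions saving the results into another hash, here is a minimal sketch of one possible alternative. It compares the dot-separated segments of each code instead of building patterns. The method name related_terms and the sample hash are assumptions made for illustration; they are not part of the original question or answer.

```ruby
# Hypothetical helper (not from the original answer): groups related entries
# by comparing the dot-separated segments of each tree code.
# Returns a hash of hashes, each mapping code => term.
def related_terms(code_hash, code)
  segments = code.split('.')
  parent_segments = segments[0...-1]
  related = { parents: {}, siblings: {}, children: {} }

  code_hash.each do |key, term|
    # Skip blank keys and the code we are looking up.
    next if key.nil? || key.empty? || key == code

    key_segments = key.split('.')
    if key_segments == segments.first(key_segments.size)
      # key is a shorter prefix of code: an ancestor
      related[:parents][key] = term
    elsif key_segments.first(segments.size) == segments
      # key starts with every segment of code: a descendant
      related[:children][key] = term
    elsif key_segments.size == segments.size &&
          key_segments[0...-1] == parent_segments
      # same depth and same parent, different last segment: a sibling
      related[:siblings][key] = term
    end
  end

  related
end

# Sample data taken from the codes listed in the question.
sample = {
  'F01.145.632'     => 'Motor Activity',
  'F01'             => 'Behavior and Behavior Mechanisms',
  'F01.145'         => 'Behavior',
  'F01.145.535'     => 'Information Seeking Behavior',
  'F01.145.544'     => 'Inhibition (Psychology)',
  'F01.145.632.555' => 'Freezing Reaction, Cataleptic',
  'F01.145.632.680' => 'Immobility Response, Tonic'
}

result = related_terms(sample, 'F01.145.632')
puts result[:parents].values.inspect
# => ["Behavior and Behavior Mechanisms", "Behavior"]
puts result[:siblings].keys.inspect
# => ["F01.145.535", "F01.145.544"]
puts result[:children].keys.inspect
# => ["F01.145.632.555", "F01.145.632.680"]
```

Comparing whole segments avoids having to escape the dots in the codes, and a single pass over the hash collects all three groups at once. As in the code above, parents here means every ancestor and children means every descendant, not only the immediate ones.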
