Use a regex to find a string, then use a regex to find a new string to replace it with

Question

I used BatchGeo to create a map from a spreadsheet, then downloaded the KML data, which looks like this:

<Placemark>
  <name>?</name>
    <Snippet></Snippet>
    <description><![CDATA[]]></description>
    <styleUrl>#style75</styleUrl>
    <address>1234 Example St Denver, CO 80221</address>
    <Point>
      <coordinates>-121.879364,37.815151,0.000000</coordinates>
    </Point>
</Placemark>

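When the points are re-imported into Google Maps, they are placed at the correct addresses/coordinates, but the name/descriptor next to each pin in the left sidebar shows only "?" instead of the address.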

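I want to use a regex to find each "<name>?</name>", then use a regex to find the next instance of <address>.*</address> in the file, then go back and replace the ? between the <name> tags with the text matched by .* between the <address> tags.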

Each point has one such block of code between <Placemark> tags, and there are hundreds of points in total.

Here are the bits and pieces I have so far:

newkml = File.open( 'Newkml.txt', 'w' )

def process_line(x)
  unless x == "<name>?</name>"
    # just return the original line
  else
    # Find the next instance of /<address>(.*)<\/address>/
    # Go to the original line
    # Replace it with "<name>#{$1}</name>"
  end
end

File.foreach('Whatever.kml'){|line|} do line.process_line
# Make a new file, copy over all of the lines that aren't <name>?</name>,
# and fix the name lines using the method above
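
A minimal sketch of the regex approach described above, not a definitive implementation. It assumes each <Placemark> block contains at most one <name> and one <address>, and that blocks are not nested. The helper name fix_names, the constant SAMPLE_KML, and the second sample address are hypothetical and used only for illustration.

```ruby
# Hypothetical sample data: two placemarks, only one of which needs fixing.
SAMPLE_KML = <<~KML
  <Placemark>
    <name>?</name>
    <address>1234 Example St Denver, CO 80221</address>
  </Placemark>
  <Placemark>
    <name>Already named</name>
    <address>99 Other Rd Boulder, CO 80301</address>
  </Placemark>
KML

# Hypothetical helper: for every <Placemark> block, copy the <address> text
# into a <name> element whose content is exactly "?".
def fix_names(kml)
  kml.gsub(%r{<Placemark\b[^>]*>.*?</Placemark>}m) do |placemark|
    address = placemark[%r{<address>(.*?)</address>}m, 1]
    # Leave the block untouched when it has no address to copy.
    next placemark unless address

    # The block form of sub avoids backslash handling in the replacement.
    placemark.sub(%r{<name>\?</name>}) { "<name>#{address}</name>" }
  end
end

puts fix_names(SAMPLE_KML)
```

To process a file the same way, something like File.write('Newkml.txt', fix_names(File.read('Whatever.kml'))) should work. Working on whole <Placemark> blocks rather than line by line avoids having to look ahead for the address and then go back to the name line.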

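Update: The original service (BatchGeo) has an option to set which information goes into which KML (XML) tags, so I created a new map and prevented the problem from occurring in the first place. Thanks to those who introduced me to tools I can use for this kind of operation in the future.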

Update 2: Trying Mark Thomas's solution. This is the code I ran:

require 'rubygems'
require 'nokogiri'

doc = Nokogiri::XML("whatever.xml")

edits = 0

doc.xpath("//name").each do |name|
  if name.content == "?"
    name.content = name.xpath("following-sibling::address").text
    edits +=1
  end
end

puts( doc.inspect )
puts( "edits: #{edits}" )
puts doc

This gave me the following output:

#<Nokogiri::XML::Document:0xfe0064 name="document>
edits: 0
<?xml version="1.0"?>

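If the edits test code I added works the way I think it does, this seems to indicate that the if name.content == "?" block executed 0 times (about 130 fewer times than I expected).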

Answer 1

Work has already been done on parsing/generating KML files in just about every language. I suspect this will work for you: https://github.com/schleyfox/ruby_kml

Update

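Not having actually used the above library, I wanted to verify my suggestion. It seems that all of its helper functions are for creating KML files, so it would still require you to use an XML parser to load one. I would still suggest that this is better than the suggested approach of manipulating the KML directly with an XML parser (although that would certainly work too), but you may also want to look at http://georuby.rubyforge.org/georuby-doc/index.html, which does support both KML input and output.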

Update 2: added for posterity.

Having thought about it some more, my default recommendation for this type of problem is:

Parse the KML into objects
Correct the errors
Regenerate the KML

My reasoning is that this should be less likely to break the output, and if you end up doing more with the KML, you are already 90% of the way there.
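
The parse, correct, and regenerate steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Ruby's bundled REXML parser as a stand-in rather than ruby_kml or GeoRuby (whose APIs are not shown in this thread), and the helper name regenerate_with_names and the constant SAMPLE_DOC are hypothetical.

```ruby
require 'rexml/document'

# Hypothetical sample input; a real export would contain many placemarks.
SAMPLE_DOC = <<~KML
  <Document>
    <Placemark>
      <name>?</name>
      <address>1234 Example St Denver, CO 80221</address>
    </Placemark>
  </Document>
KML

# Hypothetical helper following the three steps in order.
def regenerate_with_names(kml_text)
  # 1. Parse the KML into an object tree.
  doc = REXML::Document.new(kml_text)

  # 2. Correct the errors: copy the address into any name that is "?".
  REXML::XPath.each(doc, '//Placemark') do |placemark|
    name = placemark.elements['name']
    address = placemark.elements['address']
    next unless name && address && name.text == '?'

    name.text = address.text
  end

  # 3. Regenerate the KML from the corrected tree.
  doc.to_s
end

puts regenerate_with_names(SAMPLE_DOC)
```

Because the output is serialized from the parsed tree instead of being edited as text, a malformed replacement is less likely to slip through, which is the point of the reasoning above.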

All that said, in your particular case, where you are only fixing up known data, @Mark Thomas's approach gives a faster solution with less code overhead.

Answer 2

I recommend using an XML parser instead. Here is some Nokogiri sample code:

doc = Nokogiri::XML(kml)

doc.xpath("//name").each do |name|
  if name.content == "?"
    name.content = name.xpath("following-sibling::address").text
  end
end

Update

Based on your update, it seems something went wrong with the parsing of your XML file. Are you sure it is valid?

Here is a complete working example:

require 'nokogiri'

xml = <<End
<Placemark>
  <name>?</name>
    <Snippet></Snippet>
    <description><![CDATA[]]></description>
    <styleUrl>#style75</styleUrl>
    <address>1234 Example St Denver, CO 80221</address>
    <Point>
      <coordinates>-121.879364,37.815151,0.000000</coordinates>
    </Point>
</Placemark>

End

doc = Nokogiri::XML(xml)

doc.xpath("//name").each do |name|
  if name.content == "?"
    name.content = name.xpath("following-sibling::address").text
  end
end

puts doc

Output:

<?xml version="1.0"?>
<Placemark>
  <name>1234 Example St Denver, CO 80221</name>
    <Snippet/>
    <description/>
    <styleUrl>#style75</styleUrl>
    <address>1234 Example St Denver, CO 80221</address>
    <Point>
      <coordinates>-121.879364,37.815151,0.000000</coordinates>
    </Point>
</Placemark>

Answer 3

Let's try using this:

require 'nokogiri'

doc = Nokogiri::XML::DocumentFragment.parse(<<EOT)
<Placemark>
  <name>?</name>
  <Snippet></Snippet>
  <description><![CDATA[]]></description>
  <styleUrl>#style75</styleUrl>
  <address>1234 Example St Denver, CO 80221</address>
  <Point>
    <coordinates>-121.879364,37.815151,0.000000</coordinates>
  </Point>
</Placemark>
EOT

doc.search('Placemark').each do |placemark|
  name = placemark.at('name')
  address = placemark.at('address')
  name.content = address.text
end

puts doc.to_xml

Which outputs:

<Placemark>
  <name>1234 Example St Denver, CO 80221</name>
  <Snippet/>
  <description><![CDATA[]]></description>
  <styleUrl>#style75</styleUrl>
  <address>1234 Example St Denver, CO 80221</address>
  <Point>
    <coordinates>-121.879364,37.815151,0.000000</coordinates>
  </Point>
</Placemark>

I used a document fragment to avoid extra content being added to the output. Normally you would use the regular Nokogiri::XML to parse a document.

I'm assuming you have multiple <Placemark> tags; search returns a NodeSet, so the loop visits each of them in turn.
