Help with PHP data mangling as with sed/awk/grep
OK guys, I have an HTML page that I need to parse in a PHP script and mangle the data around a bit. To explain it best, I will show how I would do this in a bash script using awk, grep, egrep, and sed through a god-awful set of pipes. Commented for clarity.
curl -s http://myhost.net/mysite/ |  # Retrieve the document
awk '/\/action/,/submit/' |  # Extract only the form element
egrep -v "delete|submit" |  # Remove the action lines
sed 's/^[ \t]*//;s/[ \t]*$//' |  # Trim leading and trailing whitespace
sed -n -e ":a" -e "$ s/\n//gp;N;b a" |  # Remove every line break
sed '{s|<br />|<br />\n|g}' |  # Insert a line break after each <br />
grep "onemyndseye@localhost" |  # Keep lines containing my local email
sed '{s/\[[^|]*\]//g}'  # Remove the bracketed [email: ...] note from the line
These commands take the form element that looks like this:
<form action="/action" method="post">
<input type="checkbox" id="D1" name="D1" /><a href="http://www.linux.com/rss/feeds.php">
http://www.linux.com/rss/feeds.php
</a> [email:
onemyndseye@localhost (Default)
]<br />
<input type="checkbox" id="D2" name="D2" /><a href="http://www.ubuntu.com/rss.xml">
http://www.ubuntu.com/rss.xml
</a> [email:
onemyndseye@localhost (Default)
]<br />
<input type="submit" name="delete_submit" value="Delete Selected" />
And mangle it into complete one-line input statements, ready to be inserted into another form:
<input type="checkbox" id="D1" name="D1" /><a href="http://www.linux.com/rss/feeds.php">http://www.linux.com/rss/feeds.php</a> <br />
<input type="checkbox" id="D2" name="D2" /><a href="http://www.ubuntu.com/rss.xml">http://www.ubuntu.com/rss.xml</a> <br />
The big question is how to accomplish this in PHP. I am comfortable with using PHP to curl a page, but I seem to be lost on filtering the output.
Thanks in advance. :)
You don't filter output. You use simple_html_dom to parse the page and manipulate it that way. It really is more intuitive.
Something like:
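// Load the library (assumes simple_html_dom.php sits next to this script)
include 'simple_html_dom.php';
// Create DOM from URL or file
$html = file_get_html('...');
// Find all links (a elements) inside a form tag
foreach ($html->find('form a') as $element) {
    // Anchors carry their URL in href, not src
    echo $element->href . '<br>';
}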
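If you would rather not pull in a third-party library, roughly the same result can probably be had with PHP's built-in DOM extension. The following is only a sketch under a few assumptions: it uses DOMDocument and DOMXPath instead of simple_html_dom, the helper name extractFeedRows() is made up for illustration, the sample markup is copied from the question rather than fetched, and each checkbox is assumed to be followed directly by its link and then by the bracketed email note.

```php
<?php
// Sample markup copied from the question; in practice this would be the fetched page body.
$page = <<<'HTML'
<form action="/action" method="post">
<input type="checkbox" id="D1" name="D1" /><a href="http://www.linux.com/rss/feeds.php">
http://www.linux.com/rss/feeds.php
</a> [email:
onemyndseye@localhost (Default)
]<br />
<input type="checkbox" id="D2" name="D2" /><a href="http://www.ubuntu.com/rss.xml">
http://www.ubuntu.com/rss.xml
</a> [email:
onemyndseye@localhost (Default)
]<br />
<input type="submit" name="delete_submit" value="Delete Selected" />
</form>
HTML;

// extractFeedRows() is a hypothetical helper name, not part of any library.
function extractFeedRows(string $html, string $email): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // scraped HTML is rarely valid; keep parser warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//form//input[@type="checkbox"]') as $input) {
        // Assumption: the link belonging to a checkbox is its next <a> sibling.
        $link = $xpath->query('following-sibling::a[1]', $input)->item(0);
        if ($link === null) {
            continue;
        }
        // Assumption: the "[email: ...]" note is the text node right after the link.
        $note = $link->nextSibling ? $link->nextSibling->textContent : '';
        if (strpos($note, $email) === false) {
            continue; // same role as the grep step in the shell pipeline
        }
        $rows[] = sprintf(
            '<input type="checkbox" id="%s" name="%s" /><a href="%s">%s</a> <br />',
            htmlspecialchars($input->getAttribute('id')),
            htmlspecialchars($input->getAttribute('name')),
            htmlspecialchars($link->getAttribute('href')),
            htmlspecialchars(trim($link->textContent))
        );
    }
    return $rows;
}

echo implode("\n", extractFeedRows($page, 'onemyndseye@localhost')), "\n";
```

Rebuilding each row from the parsed attributes means the whitespace trimming and line joining from the sed steps are not needed, and the email note is dropped simply by not copying it.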