DOM parsing: how to parse HTML in a file like this

Question:

I want to extract, from the DOM, the text at the end of a file obtained from an external server with a cURL request. I stored the response in the $html_response variable.

I started with

$dom = new DOMDocument;
$dom->loadHTML($html_response);

But how do I extract the text at the end of this file? (Scroll down to see it.)

<html>
<head>
  <title></title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
</head>

<body>
<font style="text-decoration:none; font-family: Arial; font-size: 40px; color: #b4b4b4; eight: 35px;">HI</font>
<div class="toolbar">
</div>
<style type="text/css">
body,
td,
th {
  color: #000000;
}
body {
  background-color: #eeeeee;
}
</style>
<meta name="viewport" content="width=device-width">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<form method="post" action="">
<p><b>YES YOU CAN</b>
  <input type="text" size="12" maxlength="14" name="named" value="">
  <input type="submit" value="a" name="submit"><br>
</p>
<p><span class="center"><b>PT1</b></span></p>
<p><span class="center"><b>PT2</b> <br>
  <b>XXXXXXXX bold</b> other words!</span></p>
</form><br> other words
<font size="5" face="monospace" color="Black">other words</font>
<br><br>


CANT' TAKE THIS PART BECOUSE THERE ISN'T A TAG THAT CLOSE THIS TEXT
AND I'M WORKING ON A EXTERNAL WEBSITE


</body>
</html>

Thank you!


Answer 1:

The solution involves a bit of guessing, as there are several text nodes at the body level:

$dom = new DOMDocument;
$dom->loadHTML($html_response);

// retrieve the text nodes that are direct children of 'body' with XPath
$xpath = new DOMXPath($dom);
$textNodes = $xpath->query('/html/body/child::text()');

// keep only the nodes with real content (drop empty and whitespace-only texts)
$texts = array();
foreach ($textNodes as $node)
{
    $text = trim($node->nodeValue);
    if ($text !== '')
        $texts[] = $text;
}

// get the last text, as what you need is always at the bottom of the page
echo end($texts); // CANT' TAKE THIS PART BECOUSE THERE ISN'T A TAG THAT CLOSE THIS TEXT AND I'M WORKING ON A EXTERNAL WEBSITE
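
As a variation on the approach above, the filtering can probably be pushed into the XPath expression itself, so that no PHP loop is needed. The sketch below is only an illustration under stated assumptions: the heredoc is a hypothetical, trimmed stand-in for $html_response, and the call to libxml_use_internal_errors() is an addition that was not in the original answer, included because pages like this one tend to trigger parser warnings.

```php
<?php
// Hypothetical sample standing in for $html_response (trimmed from the question).
$html_response = <<<HTML
<html>
<head><title></title></head>
<body>
<form method="post" action=""><p><b>YES YOU CAN</b></p></form><br> other words
<font size="5" face="monospace" color="Black">other words</font>
<br><br>


TRAILING TEXT WITH NO TAG AROUND IT


</body>
</html>
HTML;

// Assumption: collect parser warnings instead of printing them,
// since HTML from an external site is often not valid.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html_response);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// text()[normalize-space()] keeps only the text nodes directly under 'body'
// that contain something other than whitespace; the outer [last()] then
// selects the final one in document order.
$nodes = $xpath->query('(/html/body/text()[normalize-space()])[last()]');

// Guard against pages that have no such text node at all.
$lastText = $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;

echo $lastText, PHP_EOL;
```

The same guess applies here as in the loop version: this only works as long as the wanted text stays the last non-empty text node directly under body.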
  • Posted on 2019-03-05 13:39