vefyy.blogg.se

Elinks force refresh

Ideally, what you need here is a real web browser to give you the information.

That is, you need something to do the HTTP request with the proper parameters, interpret the HTTP response correctly, fully interpret the HTML code as a browser would, and return the title.

As I don't think that can be done on the command line with the browsers I know (though see now this trick with lynx), you have to resort to heuristics and approximations, and the one above is as good as any. You may also want to take performance and security into consideration. For instance, to cover all the cases (for instance, a web page that has some JavaScript pulled from a third-party site that sets the title, or redirects to another page in an onload hook), you may have to implement a real-life browser with its DOM and JavaScript engines, which may have to do hundreds of queries for a single HTML page, some of them trying to exploit vulnerabilities.

While using regexps to parse HTML is often frowned upon, here is a typical case where it's good enough for the task (IMO). You can use curl and grep to do this:

$ curl '' -so - | grep -iPo '(?<=<title>)(.*)(?=</title>)'

    -o     = return only the portion that matches
    (?<=)  = look-behind: the match must be preceded by this string on its left
    (?=)   = look-ahead: the match must be followed by this string on its right

You'll need to enlist the use of PCRE (Perl Compatible Regular Expressions) in grep to get the look-behind and look-ahead facilities so that we can find the <title> tags. If the <title> spans multiple lines, then the above won't find it. You can mitigate this situation by using tr to delete any \n characters, i.e.:

$ curl '' -so - | tr -d '\n' | grep -iPo '(?<=<title>)(.*)(?=</title>)'

If the <title> is set like this, <title lang="en">, then you'll need to remove the attribute prior to grepping it. The tool sed can be used to do this:

$ curl '' -so - | tr -d '\n' | \
    sed -E 's/ lang="\w+"//gi' | \
    grep -iPo '(?<=<title>)(.*)(?=</title>)'

The above finds the case-insensitive string lang= followed by a word sequence ( \w+).

A real HTML/XML Parser - using Ruby

At some point regex will fail in solving this type of problem. If that occurs, then you'll likely want to use a real HTML/XML parser. One such parser is Nokogiri. It's available in Ruby as a Gem and can be used like so:

$ curl '' -so - | \
    ruby -rnokogiri -e \
    'puts Nokogiri::HTML(readlines.join).xpath("//title").map { |e| e.content }'

The above is parsing the data that comes via curl as HTML ( Nokogiri::HTML).
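Going back to the curl/grep approach above, here is a self-contained sketch that substitutes a local HTML snippet for the elided URL (the snippet and its title text are made up for illustration):

```shell
# Sample HTML standing in for a curl response; the title deliberately
# spans multiple lines to show why the tr step matters.
html='<html><head><title>
Example Title
</title></head><body></body></html>'

# tr -d '\n' joins the document onto one line; grep -iPo then keeps
# only the text between <title> and </title> via look-behind/look-ahead.
printf '%s' "$html" | tr -d '\n' | grep -iPo '(?<=<title>)(.*)(?=</title>)'
```

Without the tr step, grep matches line by line, so the split-up title would not be found.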


The method xpath then looks for nodes (tags) anywhere in the HTML document ( //) with the name title. For each node found, we want to return its content ( e.content).


You can also do something similar with Perl and the HTML::TreeBuilder::XPath module:

$tree = HTML::TreeBuilder::XPath->new_from_url($ARGV[0])

I liked the idea of Stéphane Chazelas to use Lynx and LYNX_PRINT_TITLE, but that script didn't work for me under Ubuntu 14.04.5, so I made a simplified version of it by running Lynx with files pre-configured in advance.

Add the following line to /etc/lynx-cur/lynx.cfg (or wherever your lynx.cfg resides):

PRINTER:P:printenv LYNX_PRINT_TITLE>/home/account/title.txt:TRUE:1000
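For reference, the colon-separated fields in that PRINTER line follow lynx.cfg's printer-definition syntax. This breakdown is my reading of the Lynx documentation, not part of the original answer:

```
PRINTER:P:printenv LYNX_PRINT_TITLE>/home/account/title.txt:TRUE:1000
```

P is the label shown in Lynx's print menu; the command after it is run when that entry is chosen (Lynx exports the page title in the LYNX_PRINT_TITLE environment variable, so printenv writes it to /home/account/title.txt); TRUE marks the printer entry as always enabled; and 1000 is the lines-per-page value Lynx uses for this printer.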
