<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
	xmlns:rawvoice="http://www.rawvoice.com/rawvoiceRssModule/">
<channel>
	<title>Comments on: Detecting Bots with Javascript for Better A/B Test Results</title>
	<atom:link href="http://www.kalzumeus.com/2010/06/07/detecting-bots-in-javascrip/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.kalzumeus.com/2010/06/07/detecting-bots-in-javascrip/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=detecting-bots-in-javascrip</link>
	<description>Patrick McKenzie (patio11) blogs on software development, marketing, and general business topics</description>
	<lastBuildDate>Thu, 14 Jan 2016 20:48:09 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=4.1.7</generator>
	<item>
		<title>By: Does Your Product Logo Actually Matter?: MicroISV on a Shoestring</title>
		<link>http://www.kalzumeus.com/2010/06/07/detecting-bots-in-javascrip/#comment-2699</link>
		<dc:creator><![CDATA[Does Your Product Logo Actually Matter?: MicroISV on a Shoestring]]></dc:creator>
		<pubDate>Wed, 04 Aug 2010 12:30:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.kalzumeus.com/?p=949#comment-2699</guid>
		<description><![CDATA[[...] A/B testing framework I wrote.  This particular test required A/Bingo to be slightly extended to ignore bots, because&#8201;&#8212;&#8201;since the test was sitewide and visible prior to a bot-blocking [...]]]></description>
		<content:encoded><![CDATA[<p>[&#8230;] A/B testing framework I wrote.  This particular test required A/Bingo to be slightly extended to ignore bots, because&thinsp;&#8212;&thinsp;since the test was sitewide and visible prior to a bot-blocking [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rob</title>
		<link>http://www.kalzumeus.com/2010/06/07/detecting-bots-in-javascrip/#comment-2698</link>
		<dc:creator><![CDATA[Rob]]></dc:creator>
		<pubDate>Fri, 23 Jul 2010 14:51:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.kalzumeus.com/?p=949#comment-2698</guid>
		<description><![CDATA[Very interesting post. I&#039;ve done something similar for websites that I develop, but using a JS Image Object &amp; a PNG URL. My theory is that some bots that are smart enough to do a bit of math to figure out a new URL to crawl might be too smart and exclude URLs that appear to be images.

Something like:

var RT=new Image();
RT.src=&quot;/__rt.png?x=&quot;+(12345 + 67890);


Previously I left out the &quot;.png&quot; from the URL and a higher percentage of robots did request the URL. Adding the extension provided a 10% improvement to my detection rates.]]></description>
		<content:encoded><![CDATA[<p>Very interesting post. I&#8217;ve done something similar for websites that I develop, but using a JS Image Object &amp; a PNG URL. My theory is that some bots that are smart enough to do a bit of math to figure out a new URL to crawl might be too smart and exclude URLs that appear to be images.</p>
<p>Something like:</p>
<p>var RT=new Image();<br />
RT.src=&quot;/__rt.png?x=&quot;+(12345 + 67890);</p>
<p>Previously I left out the &#8220;.png&#8221; from the URL and a higher percentage of robots did request the URL. Adding the extension provided a 10% improvement to my detection rates.</p>
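A minimal restatement of the trick (the function name is illustrative, not from the snippet above):

```javascript
// Build the beacon URL at runtime; only clients that actually execute
// JavaScript will ever compute (and then request) this address.
function beaconUrl(a, b) {
  // The fake ".png" extension discourages crawlers that do run scripts
  // but deliberately skip URLs that look like images.
  return '/__rt.png?x=' + (a + b); // the arithmetic forces JS execution
}

// Browser-only usage (Image is not available outside the browser):
//   var rt = new Image();
//   rt.src = beaconUrl(12345, 67890);
```

Any request for that URL in the server logs then marks the client as a script-executing, presumably human, visitor.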
]]></content:encoded>
	</item>
	<item>
		<title>By: Patrick</title>
		<link>http://www.kalzumeus.com/2010/06/07/detecting-bots-in-javascrip/#comment-2697</link>
		<dc:creator><![CDATA[Patrick]]></dc:creator>
		<pubDate>Wed, 07 Jul 2010 15:38:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.kalzumeus.com/?p=949#comment-2697</guid>
		<description><![CDATA[That works great for blocking bots from forms, but less well for blocking bots from being counted as visitors to the site, since all they need to do to get counted for that is access a single file by HTTP.]]></description>
		<content:encoded><![CDATA[<p>That works great for blocking bots from forms, but less well for blocking bots from being counted as visitors to the site, since all they need to do to get counted for that is access a single file by HTTP.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dom</title>
		<link>http://www.kalzumeus.com/2010/06/07/detecting-bots-in-javascrip/#comment-2696</link>
		<dc:creator><![CDATA[Dom]]></dc:creator>
		<pubDate>Wed, 07 Jul 2010 04:23:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.kalzumeus.com/?p=949#comment-2696</guid>
		<description><![CDATA[Thanks for a great Ruby gem (abingo). I have combined it with Rack Honeypot (http://github.com/sunlightlabs/rack-honeypot) on my sign-up forms, which are what I wanted to track conversions on with A/B testing. The honeypot (a field positioned off-screen or hidden with CSS) provides a way to distinguish bots from humans. I&#039;m sure it&#039;s not foolproof, but it does seem to catch most bots out, and I&#039;m pleased with the results.]]></description>
		<content:encoded><![CDATA[<p>Thanks for a great Ruby gem (abingo). I have combined it with Rack Honeypot (<a href="http://github.com/sunlightlabs/rack-honeypot" rel="nofollow">http://github.com/sunlightlabs/rack-honeypot</a>) on my sign-up forms, which are what I wanted to track conversions on with A/B testing. The honeypot (a field positioned off-screen or hidden with CSS) provides a way to distinguish bots from humans. I&#8217;m sure it&#8217;s not foolproof, but it does seem to catch most bots out, and I&#8217;m pleased with the results.</p>
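A sketch of the server-side half of the honeypot idea (the field name here is hypothetical; Rack Honeypot's real parameter name differs):

```javascript
// A form field hidden with CSS: humans never see it and leave it empty,
// while naive bots fill in every input they find.
const HONEYPOT_FIELD = 'confirm_email_hp'; // hypothetical field name

function looksLikeBot(params) {
  const value = params[HONEYPOT_FIELD];
  // Any non-blank value means something filled in a field no human can see.
  return typeof value === 'string' && value.trim().length > 0;
}
```

Submissions flagged this way can be dropped before they ever reach the conversion counters.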
]]></content:encoded>
	</item>
	<item>
		<title>By: ferrix</title>
		<link>http://www.kalzumeus.com/2010/06/07/detecting-bots-in-javascrip/#comment-2695</link>
		<dc:creator><![CDATA[ferrix]]></dc:creator>
		<pubDate>Tue, 15 Jun 2010 17:41:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.kalzumeus.com/?p=949#comment-2695</guid>
		<description><![CDATA[What about the perl &quot;Mechanize&quot; engine?  Naively speaking, that&#039;s what I&#039;d use to create a bot.  It acts exactly like a real browser as far as I know, including scripting.  Do people really use wget for bots?]]></description>
		<content:encoded><![CDATA[<p>What about the perl &#8220;Mechanize&#8221; engine?  Naively speaking, that&#8217;s what I&#8217;d use to create a bot.  It acts exactly like a real browser as far as I know, including scripting.  Do people really use wget for bots?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dennis Gorelik</title>
		<link>http://www.kalzumeus.com/2010/06/07/detecting-bots-in-javascrip/#comment-2694</link>
		<dc:creator><![CDATA[Dennis Gorelik]]></dc:creator>
		<pubDate>Fri, 11 Jun 2010 23:30:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.kalzumeus.com/?p=949#comment-2694</guid>
		<description><![CDATA[It&#039;s better to discriminate not against bots, but against abusers.
Bots can behave appropriately (or not).
Humans can abuse a web site manually (or not).
One way to detect abuse is to count the number of requests made to your web site from the same IP address during the last 30 seconds. If there are too many, then it may make sense to block that user (bot or human, it does not matter).

For A/B testing purposes, relying on JavaScript as Patrick described is a good solution.
For blocking excessive requests, relying on JavaScript validation is not the best approach.]]></description>
		<content:encoded><![CDATA[<p>It&#8217;s better to discriminate not against bots, but against abusers.<br />
Bots can behave appropriately (or not).<br />
Humans can abuse a web site manually (or not).<br />
One way to detect abuse is to count the number of requests made to your web site from the same IP address during the last 30 seconds. If there are too many, then it may make sense to block that user (bot or human, it does not matter).</p>
<p>For A/B testing purposes, relying on JavaScript as Patrick described is a good solution.<br />
For blocking excessive requests, relying on JavaScript validation is not the best approach.</p>
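A minimal sketch of the 30-second counting idea (the threshold and names are illustrative, not a recommendation):

```javascript
// Sliding-window request counter: flag any client, bot or human, that
// makes too many requests from one IP address within the last 30 seconds.
const WINDOW_MS = 30 * 1000;
const MAX_REQUESTS = 60; // illustrative threshold

const hitLog = new Map(); // ip -> timestamps of recent requests

function isAbusive(ip, now = Date.now()) {
  // Keep only hits still inside the window, then record this one.
  const recent = (hitLog.get(ip) || []).filter(t => now - t < WINDOW_MS);
  recent.push(now);
  hitLog.set(ip, recent);
  return recent.length > MAX_REQUESTS;
}
```

This runs server-side on every request, so it works against clients that never execute JavaScript at all.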
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris Williams</title>
		<link>http://www.kalzumeus.com/2010/06/07/detecting-bots-in-javascrip/#comment-2693</link>
		<dc:creator><![CDATA[Chris Williams]]></dc:creator>
		<pubDate>Tue, 08 Jun 2010 20:48:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.kalzumeus.com/?p=949#comment-2693</guid>
		<description><![CDATA[Hey Patrick, interesting approach.
Caught the link via HN and Google Reader no less.

Liked the idea so much, that my subconscious expanded on what hypothetical / theoretical alternatives us programmers could put the users browser and therefore CPU to good use.

What if we were able, or sites like facebook were able to use the users impressions, using a small part of the users cpu to do some sort of tasks.

My first thought was hashing md5&#039;s or more complicated things like searching subsets of dna sequences for patterns.

Slightly longer version.
http://thoughtsofaprogrammer.tumblr.com/post/677654730]]></description>
		<content:encoded><![CDATA[<p>Hey Patrick, interesting approach.<br />
Caught the link via HN and Google Reader no less.</p>
<p>Liked the idea so much, that my subconscious expanded on what hypothetical / theoretical alternatives us programmers could put the users browser and therefore CPU to good use.</p>
<p>What if we were able, or sites like facebook were able to use the users impressions, using a small part of the users cpu to do some sort of tasks.</p>
<p>My first thought was hashing md5&#8217;s or more complicated things like searching subsets of dna sequences for patterns.</p>
<p>Slightly longer version.<br />
<a href="http://thoughtsofaprogrammer.tumblr.com/post/677654730" rel="nofollow">http://thoughtsofaprogrammer.tumblr.com/post/677654730</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Loïc d'Anterroches</title>
		<link>http://www.kalzumeus.com/2010/06/07/detecting-bots-in-javascrip/#comment-2692</link>
		<dc:creator><![CDATA[Loïc d'Anterroches]]></dc:creator>
		<pubDate>Tue, 08 Jun 2010 11:39:17 +0000</pubDate>
		<guid isPermaLink="false">http://www.kalzumeus.com/?p=949#comment-2692</guid>
		<description><![CDATA[Matthew, then I must be lucky, or maybe they split evenly and do not affect the stats. I am using Google Analytics goals to double-check conversions, and I see an average 10% deviation between my tools and the GA results.

For information, the websites are http://www.chemeo.com (physical properties of chemical compounds) and http://www.indefero.net (hosted git/subversion repositories and project management). These are not websites where one makes money selling attorney services or the like, which may be why the bots that do come are not that bad.

Maybe my procedure for managing my tests also helps; here is what I do:

- always have at least one A/A test open to ensure that the results are sound. That is, I test the same alternative twice, and the two copies should come out with the same score.
- rerun, on a regular basis, the tests which were not that significant.
- keep long-term stats to be sure that if a test moved an alternative from a 15% to a 20% conversion rate, it stays that way in the long run (or improves).

So I &quot;fire and forget&quot; a lot of tests, but I keep track of the effects over the long run and adjust. I accept that the numbers in the tests can be a bit off, and I always check against my real conversion rates at the end of the day. Note that the 10% deviation means my framework (at the application level) picks up 10% more visitors; they can be bots (which will not convert) or people without JavaScript enabled.

I hope this helps explain my particular situation. The code of the A/B testing framework (PHP+MongoDB) is available (LGPL) here:
http://projects.ceondo.com/p/pluf/source/tree/master/src/Pluf/AB.php]]></description>
		<content:encoded><![CDATA[<p>Matthew, then I must be lucky, or maybe they split evenly and do not affect the stats. I am using Google Analytics goals to double-check conversions, and I see an average 10% deviation between my tools and the GA results.</p>
<p>For information, the websites are <a href="http://www.chemeo.com" rel="nofollow">http://www.chemeo.com</a> (physical properties of chemical compounds) and <a href="http://www.indefero.net" rel="nofollow">http://www.indefero.net</a> (hosted git/subversion repositories and project management). These are not websites where one makes money selling attorney services or the like, which may be why the bots that do come are not that bad.</p>
<p>Maybe my procedure for managing my tests also helps; here is what I do:</p>
<p>&#8211; always have at least one A/A test open to ensure that the results are sound. That is, I test the same alternative twice, and the two copies should come out with the same score.<br />
&#8211; rerun, on a regular basis, the tests which were not that significant.<br />
&#8211; keep long-term stats to be sure that if a test moved an alternative from a 15% to a 20% conversion rate, it stays that way in the long run (or improves).</p>
<p>So I &#8220;fire and forget&#8221; a lot of tests, but I keep track of the effects over the long run and adjust. I accept that the numbers in the tests can be a bit off, and I always check against my real conversion rates at the end of the day. Note that the 10% deviation means my framework (at the application level) picks up 10% more visitors; they can be bots (which will not convert) or people without JavaScript enabled.</p>
<p>I hope this helps explain my particular situation. The code of the A/B testing framework (PHP+MongoDB) is available (LGPL) here:<br />
<a href="http://projects.ceondo.com/p/pluf/source/tree/master/src/Pluf/AB.php" rel="nofollow">http://projects.ceondo.com/p/pluf/source/tree/master/src/Pluf/AB.php</a></p>
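One concrete way to run the A/A check mentioned above (my formulation, not part of the framework): compare the two identical alternatives with a two-proportion z-test. If |z| regularly exceeds about 2, something (bots, a bucketing bug) is skewing the counters.

```javascript
// Two-proportion z-test between the two halves of an A/A test.
// Under a sound setup the conversion rates should match, so z stays small.
function aaZScore(conv1, n1, conv2, n2) {
  const p1 = conv1 / n1;
  const p2 = conv2 / n2;
  const pooled = (conv1 + conv2) / (n1 + n2);            // pooled conversion rate
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2));
  return (p1 - p2) / se;
}
```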
]]></content:encoded>
	</item>
	<item>
		<title>By: Matthew Brophy</title>
		<link>http://www.kalzumeus.com/2010/06/07/detecting-bots-in-javascrip/#comment-2691</link>
		<dc:creator><![CDATA[Matthew Brophy]]></dc:creator>
		<pubDate>Tue, 08 Jun 2010 10:15:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.kalzumeus.com/?p=949#comment-2691</guid>
		<description><![CDATA[My site is not in the gambling industry and the vast majority of bots on my site report themselves as Internet Explorer (it is obvious that they are bots by their behaviour).

Loic d&#039;Anterroches - either you are lucky not to have these misreporting bots, or, because they misreport themselves, you just haven&#039;t noticed.]]></description>
		<content:encoded><![CDATA[<p>My site is not in the gambling industry and the vast majority of bots on my site report themselves as Internet Explorer (it is obvious that they are bots by their behaviour).</p>
<p>Loic d&#8217;Anterroches &#8211; either you are lucky not to have these misreporting bots, or, because they misreport themselves, you just haven&#8217;t noticed.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Loic d'Anterroches</title>
		<link>http://www.kalzumeus.com/2010/06/07/detecting-bots-in-javascrip/#comment-2690</link>
		<dc:creator><![CDATA[Loic d'Anterroches]]></dc:creator>
		<pubDate>Tue, 08 Jun 2010 09:17:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.kalzumeus.com/?p=949#comment-2690</guid>
		<description><![CDATA[If your site is not in the gambling industry, you can check the user agent this way (PHP code):

    public static function isBot($user_agent)
    {
        static $bots = array(&#039;robot&#039;, &#039;checker&#039;, &#039;crawl&#039;, &#039;discovery&#039;,
                             &#039;hunter&#039;, &#039;scanner&#039;, &#039;spider&#039;, &#039;sucker&#039;, &#039;larbin&#039;,
                             &#039;slurp&#039;, &#039;libwww&#039;, &#039;lwp&#039;, &#039;yandex&#039;, &#039;netcraft&#039;,
                             &#039;wget&#039;, &#039;twiceler&#039;);
        static $pbots = array(&#039;/bot[\s_+:,\.\;\/\\\-]/i&#039;,
                              &#039;/[\s_+:,\.\;\/\\\-]bot/i&#039;);
        foreach ($bots as $r) {
            if (false !== stristr($user_agent, $r)) {
                return true;
            }
        }
        foreach ($pbots as $p) {
            if (preg_match($p, $user_agent)) {
                return true;
            }
        }
        if (false === strpos($user_agent, &#039;(&#039;)) {
            return true;
        }
        return false;
    }

It is not perfect, but it has been working very nicely for more than a year on my websites. Anyway, there is no perfect method for detecting whether an agent is a bot... good enough is the way to go.]]></description>
		<content:encoded><![CDATA[<p>If your site is not in the gambling industry, you can check the user agent this way (PHP code):</p>
<p>    public static function isBot($user_agent)<br />
    {<br />
        static $bots = array(&#039;robot&#039;, &#039;checker&#039;, &#039;crawl&#039;, &#039;discovery&#039;,<br />
                             &#039;hunter&#039;, &#039;scanner&#039;, &#039;spider&#039;, &#039;sucker&#039;, &#039;larbin&#039;,<br />
                             &#039;slurp&#039;, &#039;libwww&#039;, &#039;lwp&#039;, &#039;yandex&#039;, &#039;netcraft&#039;,<br />
                             &#039;wget&#039;, &#039;twiceler&#039;);<br />
        static $pbots = array(&#039;/bot[\s_+:,\.\;\/\\\-]/i&#039;,<br />
                              &#039;/[\s_+:,\.\;\/\\\-]bot/i&#039;);<br />
        foreach ($bots as $r) {<br />
            if (false !== stristr($user_agent, $r)) {<br />
                return true;<br />
            }<br />
        }<br />
        foreach ($pbots as $p) {<br />
            if (preg_match($p, $user_agent)) {<br />
                return true;<br />
            }<br />
        }<br />
        if (false === strpos($user_agent, &#039;(&#039;)) {<br />
            return true;<br />
        }<br />
        return false;<br />
    }</p>
<p>It is not perfect, but it has been working very nicely for more than a year on my websites. Anyway, there is no perfect method for detecting whether an agent is a bot&#8230; good enough is the way to go.</p>
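For anyone on Node rather than PHP, a rough JavaScript translation of the check above (the keyword list is copied from the comment; the port itself is illustrative):

```javascript
// Substrings that mark a user agent as a bot (list taken verbatim from
// the PHP version above).
const BOT_WORDS = ['robot', 'checker', 'crawl', 'discovery', 'hunter',
                   'scanner', 'spider', 'sucker', 'larbin', 'slurp',
                   'libwww', 'lwp', 'yandex', 'netcraft', 'wget', 'twiceler'];
// "bot" adjacent to a separator character, in either order.
const BOT_PATTERNS = [/bot[\s_+:,.;\/\\-]/i, /[\s_+:,.;\/\\-]bot/i];

function isBot(userAgent) {
  const ua = userAgent.toLowerCase();
  if (BOT_WORDS.some(w => ua.includes(w))) return true;          // keyword match
  if (BOT_PATTERNS.some(re => re.test(userAgent))) return true;  // "bot" next to a separator
  // Real browsers always send a parenthesized platform token.
  return !userAgent.includes('(');
}
```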
]]></content:encoded>
	</item>
</channel>
</rss>
