Has Google Broken JavaScript Spam Munging?
Baxil writes "For years now, Javascript munging has been a useful tool to share email addresses on the Web without exposing them to spammers. However, Google is now apparently evaluating Javascript when assembling summary text for web pages' listings, and publishing the un-munged email addresses to the world; and spammers have started to take advantage of this kind service." Anyone else seen this affecting their carefully protected email addresses?
Re:Mung (Score:5, Informative)
Re:Really.... (Score:2, Informative)
It's TRIVIAL for a spambot to run code like this, sitting in script tags, through the "js" binary, dump the output, and then grab emails with a regex.
I use the "js" binary to rip porn off sites all the time.
~$ js -v
JavaScript-C 1.7.0 2007-10-03
usage: js [-PswWxCi] [-b branchlimit] [-c stackchunksize] [-v version] [-f scriptfile] [-e script] [-S maxstacksize] [scriptfile] [scriptarg...]
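For illustration, the last step (regex over the dumped output) might look roughly like the Python sketch below. The pattern is deliberately simple, and the sample string is a made-up stand-in for whatever an evaluated munging script would write out; neither comes from a real harvester.

```python
import re

# Deliberately simple pattern; real-world addresses can be more exotic.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def grab_addresses(dumped_text):
    """Return the unique addresses found in the text, in first-seen order."""
    seen = []
    for match in EMAIL_RE.findall(dumped_text):
        if match not in seen:
            seen.append(match)
    return seen


# Hypothetical output of an evaluated document.write() munging script.
sample = '<a href="mailto:jane@example.org">jane@example.org</a> or bob@example.com'
print(grab_addresses(sample))  # ['jane@example.org', 'bob@example.com']
```

The point is only that once the script has been evaluated, the un-munged address is ordinary text again.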
one answer (Score:2, Informative)
Contact Me Form (Score:5, Informative)
A better method is to have a Contact Me form that doesn't display your e-mail address anywhere on it. Yes, you'll get spammers filling it out, but you can cut down on those with some simple techniques. For example, make a "Phone Number" field and set the CSS display attribute to none. Normal users won't see this field and won't fill it out. Spam-bots will see it and attempt to fill it out. Then, have your submission script silently skip sending the e-mail if the "Phone Number" field is filled out. (If you toss an error, the spammer might figure out the trick.) No method is fool-proof, of course, but this is much better than putting your e-mail address on your webpage and hoping that someone doesn't de-mung it.
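A minimal sketch of the server-side check, in Python. The field names ("phone_number", "name", "message") and the send_email callback are placeholders I made up for the example, not part of any particular framework; the form is assumed to arrive as a plain dict of strings.

```python
def should_send(form):
    """Return True only when the hidden decoy field was left empty."""
    # "phone_number" is the decoy field hidden with CSS display:none.
    return form.get("phone_number", "").strip() == ""


def handle_submission(form, send_email):
    """Always report success, but only deliver mail that passes the check."""
    if should_send(form):
        send_email(form.get("name", ""), form.get("message", ""))
    # Same response either way, so a bot gets no hint that it was filtered.
    return "Thanks, your message has been sent."


sent = []


def deliver(name, message):
    """Stand-in for a real mailer: just record what would have been sent."""
    sent.append((name, message))


handle_submission({"name": "Ann", "message": "Hi", "phone_number": ""}, deliver)
handle_submission({"name": "Bot", "message": "Buy", "phone_number": "555-0100"}, deliver)
print(sent)  # [('Ann', 'Hi')]
```

Returning the same thank-you text in both cases is what makes the failure silent.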
Re:*rolleyes* (Score:4, Informative)
Most spambots have been shown, in several experiments, to not even parse hex/decimal HTML character entities, so JavaScript munging was considered to be mostly safe for the moment. It's not like people assume this is a perfect spam-blocking method - just that it's good enough to keep you from getting thousands upon thousands of spam messages, limiting it to a reasonable number.
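For anyone who hasn't seen the entity trick, here is a rough Python sketch of what the encoding looks like and how little it takes to undo it. The helper names are mine; html.unescape is the standard-library decoder.

```python
import html


def to_decimal_entities(address):
    """Encode every character as a decimal HTML character entity."""
    return "".join("&#%d;" % ord(ch) for ch in address)


def to_hex_entities(address):
    """Encode every character as a hexadecimal HTML character entity."""
    return "".join("&#x%x;" % ord(ch) for ch in address)


encoded = to_decimal_entities("a@b.co")
print(encoded)                 # &#97;&#64;&#98;&#46;&#99;&#111;
print(html.unescape(encoded))  # a@b.co
```

A browser renders the encoded form as the plain address, and any harvester willing to make one decoding call gets it back too, which is why this only works against bots that don't bother.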
Re:*rolleyes* (Score:3, Informative)
Recaptcha [recaptcha.net] has a service specifically for email addresses, no obfuscation needed... Which also has the added benefit of aiding book digitizing!
Re:It's not google, it's the web developers (Score:3, Informative)
Re:robots.txt (Score:2, Informative)
On Google appliances, there is actually a googleon / googleoff [google.com] set of comment tags you can use.
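As a rough sketch of how you might apply them when generating pages (Python here; the google_off helper is something I made up, and the tags are only documented for the search appliance, so don't assume the public crawler honors them):

```python
# Scopes as I understand them from the appliance documentation.
VALID_SCOPES = {"index", "anchor", "snippet", "all"}


def google_off(fragment, scope="snippet"):
    """Wrap an HTML fragment in googleoff/googleon comment tags."""
    if scope not in VALID_SCOPES:
        raise ValueError("unknown googleoff scope: %r" % scope)
    return "<!--googleoff: %s-->%s<!--googleon: %s-->" % (scope, fragment, scope)


print(google_off("<span>contact: jane at example dot org</span>"))
```

The "snippet" scope is the one aimed at keeping text out of result summaries.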
Re:Mung (Score:3, Informative)
From Jargon File (4.4.4, 14 Aug 2003) [jargon]:
mung /muhng/, vt.
[in 1960 at MIT, "Mash Until No Good"; sometime after that the
derivation from the {recursive acronym} "Mung Until No Good" became
standard; but see {munge}]
1. To make changes to a file, esp. large-scale and irrevocable
changes. See {BLT}.
2. To destroy, usually accidentally, occasionally maliciously. The
system only mungs things maliciously; this is a consequence of
{Finagle's Law}. See {scribble}, {mangle}, {trash}, {nuke}. Reports
from {Usenet} suggest that the pronunciation /muhnj/ is now usual in
speech, but the spelling `mung' is still common in program comments
(compare the widespread confusion over the proper spelling of
{kluge}).
3. In the wake of the {spam} epidemics of the 1990s, mung is now
commonly used to describe the act of modifying an email address in a
sig block in a way that human beings can readily reverse but that will
fool an {address harvester}. Example: johnNOSPAMsmith@isp.net.
4. The kind of beans the sprouts of which are used in Chinese food.
(That's their real name! Mung beans! Really!)
Like many early hacker terms, this one seems to have originated at
{TMRC}; it was already in use there in 1958. Peter Samson (compiler of
the original TMRC lexicon) thinks it may originally have been
onomatopoeic for the sound of a relay spring (contact) being twanged.
However, it is known that during the World Wars, `mung' was U.S. army
slang for the ersatz creamed chipped beef better known as `SOS', and
it seems quite likely that the word in fact goes back to Scots-dialect
{munge}.
Charles Mackay's 1874 book Lost Beauties of the English Language
defined "mung" as follows: "Preterite of ming, to ming or mingle; when
the substantive meaning of mingled food of bread, potatoes, etc.
thrown to poultry. In America, `mung news' is a common expression
applied to false news, but probably having its derivation from mingled
(or mung) news, in which the true and the false are so mixed up
together that it is impossible to distinguish one from another."
See the third definition.
Re:Mung (Score:5, Informative)
Nice try, but that rule only applies to "[^ng]g$" words.
But it doesn't apply to "[n]g$", because the n modifies the sound of the g, and gg$ is uncommon enough that it's an exception in itself.
Unfortunately we don't have many examples of "ung$" because most of the words of that form are either nouns (e.g. dung, lung, young) or past participles (e.g. clung, hung, sung), so their present participles are generally formed from the present tense "ing$" form of the word (e.g. cling/clung/clinging, hang/hung/hanging, sing/sung/singing).
Note that we do have plenty of examples of "unge$" forming "unging$":
So that's plenty of reason to believe that the rule is "unge + ing = unging", despite the fact that "inge + ing" can be either "inging" or "ingeing" depending on the word (and in some cases both are valid):
Therefore I strongly contend that:
You may dispute the claim above, but there's no disputing:
:)
Re:Much ado about nothing (Score:3, Informative)
Something that most people don't understand is that spam is NOT universal. Every e-mail address is unique, and will get a different assortment of spam. Some of the users on my mail server get spam that I don't get, and I get spam that they don't get.
In particular, a new e-mail address will never get spam, unless:
That's pretty much it. #1 is only likely if your username is common (like just your first name). #3 isn't a common problem anymore, since most sites either don't post their users' e-mail addresses, or they obfuscate them (like Slashdot does). #5 isn't a common problem either. I've only gotten burned by #6 a few times.