Message Board

Bugs & Development

 ProjectHoneypot is disappointing - no moderation, no support
Author: A.Stone2   (30 Jul 18 6:00pm)
There seems to be no moderation here at Project Honey Pot, so I have to assume the project is dead by now. No support, no information, no feedback, and no IPv6 support in the scripts, so spammers only get error messages if they reach the honeypot site from an IPv6 address.

By now Project Honey Pot has become completely meaningless, so the only conclusion left for me is to shut down all my honeypots. Every honeypot still running is wasted time and a needless drain on resources.

So goodbye, guys!
 
 Re: ProjectHoneypot is disappointing - no moderation, no support
Author: Quijotesca   (29 Mar 24 1:32pm)
This post is from 2018. I wish I'd read it before I tried to get back into this stuff.

Post Edited (29 Apr 24 2:54pm)
 
 Re: ProjectHoneypot is disappointing - no moderation, no support
Author: Dankiy   (10 Dec 24 2:11pm)
Sorry, I'm disappointed. I think a major revision of the whole system of data analysis is necessary.
But it's as they write in https://www.projecthoneypot.org/about_us.php :
"The data participants in Project Honey Pot will help to build the next generation of anti-spam software."
So here we are, sitting at a dead end and just getting harvested by PHP.


I checked the API and wrote a shell script to get some results. Quite soon I noticed that classification of current search engines is not supported:

- (some?) Google IPs are classified as Search Engine "Google"
- None of my Bing hits in the Apache log was classified as "Search Engine"
- Search engines like "AltaVista" and "Lycos" will never have a hit, but are still listed.
- ...

But the required data to identify them is present in the database:

You can find the identifying agent strings next to the IPs in the web interface. But a request via the API does not decode as "Search Engine". Even if the IP has been classified as "Search Engine" in the web interface, the API returns the result of an unclassified IP. Example:

207.46.13.154 MSN
Spider Last Seen: 2024-11-13 (so not an outdated data set)

Result at the website:
https://www.projecthoneypot.org/ip_207.46.13.154

Result of the API-Request:
---8<---
nslookup <your-API-key>.154.13.46.207.dnsbl.httpbl.org
[...]
** server can't find <your-API-key>.154.13.46.207.dnsbl.httpbl.org: NXDOMAIN
--->8---
=> "NXDOMAIN" has to be interpreted as no data in http:BL, i.e. "not scored yet".
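
For anyone who wants to repeat the lookup without nslookup, here is a minimal Python sketch of how the query name above is put together: API key, then the IPv4 octets in reverse order, then the dnsbl.httpbl.org zone. The function names are my own, and the key in the usage line is a placeholder. Note that `socket.gaierror` also covers resolver failures other than NXDOMAIN, so `None` here only means "no answer".

```python
import ipaddress
import socket
from typing import Optional


def httpbl_query_name(api_key: str, ip: str, zone: str = "dnsbl.httpbl.org") -> str:
    """Build the DNS name to look up: <key>.<reversed IPv4 octets>.<zone>."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        # The four-octet query format shown above has no slot for IPv6.
        raise ValueError("only IPv4 addresses fit this query format")
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{api_key}.{reversed_octets}.{zone}"


def httpbl_lookup(api_key: str, ip: str) -> Optional[str]:
    """Return the 127.x.y.z answer as a string, or None if nothing resolves."""
    try:
        return socket.gethostbyname(httpbl_query_name(api_key, ip))
    except socket.gaierror:
        # Includes NXDOMAIN ("no data"), but also other resolver errors.
        return None


# Usage, with a placeholder key:
# httpbl_query_name("<your-API-key>", "207.46.13.154")
```

Rejecting IPv6 up front at least makes the gap visible on the client side instead of silently producing a malformed query.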

The same happens with known harvesters, e.g. 14.126.30.54


Results appear only for very recent contacts with "bad IPs" (always??):
---8<---
nslookup <your-API-key>.229.128.106.182.dnsbl.httpbl.org
[...]
Non-authoritative answer:
Name: <your-API-key>.229.128.106.182.dnsbl.httpbl.org
Address: 127.1.58.3
--->8---
127.1.58.3 decodes to:
'Last seen: 1 day ago'; 'Threat score: 58'; 'Type: "Suspicious","Harvester"'
BTW:
Harvester Last Seen: 2017-01-06
First Bad Host Appearance 2018-05-22
Last Bad Host Appearance 2024-12-09
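
The decoding of 127.1.58.3 above can be sketched in a few lines of Python. The octet layout (second octet = days since last activity, third = threat score, fourth = type bits) matches the example. The bit values 1 = Suspicious, 2 = Harvester, 4 = Comment Spammer, and the special case that type 0 means "Search Engine" with the third octet then carrying an engine identifier instead of a score, are how I remember the http:BL API description, so treat them as assumptions. The names `HttpblResult` and `decode_httpbl` are made up for this sketch.

```python
from typing import NamedTuple, Optional, Tuple

# Assumed bit meanings of the fourth octet (see note above).
VISITOR_TYPE_BITS = {1: "Suspicious", 2: "Harvester", 4: "Comment Spammer"}


class HttpblResult(NamedTuple):
    days_since_last_activity: int
    threat_score: Optional[int]  # None when the answer describes a search engine
    visitor_types: Tuple[str, ...]


def decode_httpbl(answer: str) -> HttpblResult:
    """Decode a 127.x.y.z answer from the DNS lookup."""
    octets = [int(part) for part in answer.split(".")]
    if len(octets) != 4 or octets[0] != 127:
        raise ValueError(f"not a valid http:BL answer: {answer!r}")
    _, days, third, type_bits = octets
    if type_bits == 0:
        # Assumption: for search engines the third octet is an engine ID,
        # not a threat score, so no score is reported here.
        return HttpblResult(days, None, ("Search Engine",))
    types = tuple(name for bit, name in VISITOR_TYPE_BITS.items() if type_bits & bit)
    return HttpblResult(days, third, types)


result = decode_httpbl("127.1.58.3")
print(result.days_since_last_activity, result.threat_score, result.visitor_types)
# prints: 1 58 ('Suspicious', 'Harvester')
```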

But what is a result worth that covers only current observations, when we know (see the last example IP) that harvesters wait more than a year before reappearing after harvesting a site?
What is it worth when an IP obviously crawls through your site and the API request does not report it as "Search Engine", although the database obviously has that link?
What is IPv4 monitoring worth when, e.g., Facebook crawls over IPv6?

So I understand why, in most cases, the latest update of the published tools for API usage is about a decade old: the data is not reliable, and the client side cannot do anything to improve the query results.





