Daily Rotation Search and Hot Topics Progress

Posted: 02/04/10, 9:53 pm
by bob
OK, this time I have it. I've concocted a script that removes duplicates from the database. It's not implemented yet, but it's thoroughly tested. This one works: it solves the problem of flat-out dupes and of the same story being posted on multiple days. It also solves the problem of sites making slight changes to the title of the same story throughout the day, but that's a little iffier; it depends on what they change and the nature of their links.
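The script itself isn't posted, so what follows is only a guess at the general approach, not the actual code. It is a minimal Perl sketch that assumes each headline is a hashref with `title` and `link` keys (a hypothetical layout; the real database schema isn't shown). `normalize_title`, `normalize_link` and `dedupe` are hypothetical names. A headline is treated as a duplicate if either its normalized link or its normalized title has already been seen.

```perl
use strict;
use warnings;

# Hypothetical record layout: each headline is a hashref with 'title'
# and 'link' keys. The real database schema isn't shown in the post.

# Lowercase, drop punctuation, collapse and trim whitespace, so that
# "Apple Unveils New Tablet" and "Apple unveils new tablet!" compare equal.
sub normalize_title {
    my ($title) = @_;
    my $t = lc $title;
    $t =~ s/[^a-z0-9\s]+//g;
    $t =~ s/\s+/ /g;
    $t =~ s/^\s+|\s+$//g;
    return $t;
}

# Ignore the scheme, a leading "www.", the query string, the fragment
# and trailing slashes when comparing links.
sub normalize_link {
    my ($link) = @_;
    my $l = lc $link;
    $l =~ s{^https?://(?:www\.)?}{};
    $l =~ s{[?#].*$}{};
    $l =~ s{/+$}{};
    return $l;
}

# Keep the first occurrence of each story. A later headline counts as a
# duplicate if either its normalized link or normalized title was seen.
sub dedupe {
    my @headlines = @_;
    my (%seen_link, %seen_title, @kept);
    for my $h (@headlines) {
        my $link  = normalize_link($h->{link});
        my $title = normalize_title($h->{title});
        next if $seen_link{$link} || $seen_title{$title};
        $seen_link{$link}   = 1;
        $seen_title{$title} = 1;
        push @kept, $h;
    }
    return @kept;
}

my @rows = (
    { title => 'Apple Unveils New Tablet',     link => 'http://www.example.com/tablet' },
    { title => 'Apple unveils new tablet!',    link => 'http://example.com/tablet?ref=rss' },
    { title => 'Linux kernel update released', link => 'http://example.org/kernel' },
);
my @unique = dedupe(@rows);
print scalar(@unique), " unique headlines\n";
```

The link check catches the same story reposted on a later day. The title check catches a repost under a different URL. A title that is actually reworded, rather than just repunctuated, would still slip through, which matches the "iffier" case above.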

This should do wonders for the Hot Topics scripting too, since duplicates won't be counted as two hits anymore, but I need to work with that a little more before I know whether I can fully automate the process or not.

Should be fully implemented by Monday.

Posted: 02/08/10, 4:12 pm
by bob
OK, I think I have it. Tell me if you see any repetition. There are some hinky points not visible to users; I'm thinking them through to clear them up.

Also, as predicted, Hot Topics is playing nicer -- see this:

It still needs work. I might go with a one-word model instead of word pairs, though that might produce a lot of day-to-day repetition. For example, Microsoft, Apple and Google would place high in the scores every day. Once again, I have to think it through.
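To compare the two models, here is a minimal Perl sketch that counts either single words (n = 1) or adjacent word pairs (n = 2) across a set of headline titles. It assumes titles are plain strings. The stopword list `%STOP` and the helpers `term_counts` and `top_terms` are hypothetical. Stopwords are dropped before pairing, so a pair can span a removed word.

```perl
use strict;
use warnings;

# Hypothetical stopword list; a real one would need to be much longer.
my %STOP = map { $_ => 1 } qw(a an and the of to in on for with is are at by from);

# Count single words (n = 1) or adjacent word pairs (n = 2) across
# headline titles. Stopwords are removed first, so pairs can span them.
sub term_counts {
    my ($n, @titles) = @_;
    my %count;
    for my $title (@titles) {
        my @words = grep { !$STOP{$_} } map { lc $_ } ($title =~ /([A-Za-z0-9]+)/g);
        for my $i (0 .. $#words - $n + 1) {
            my $key = join ' ', @words[$i .. $i + $n - 1];
            $count{$key}++;
        }
    }
    return %count;
}

# Return the $k highest-scoring terms, breaking ties alphabetically.
sub top_terms {
    my ($k, %count) = @_;
    my @sorted = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    my $last = $k < @sorted ? $k - 1 : $#sorted;
    return @sorted[0 .. $last];
}

my @titles = (
    'Apple tablet ships in April',
    'Apple tablet reviews roll in',
    'Microsoft patches Windows flaw',
);
my %pairs = term_counts(2, @titles);
my %words = term_counts(1, @titles);
print 'Top pair:  ', join(', ', top_terms(1, %pairs)), "\n";   # apple tablet
print 'Top words: ', join(', ', top_terms(2, %words)), "\n";   # apple, tablet
```

With n = 1, a company name that appears in many headlines scores high on its own every day. That is the repetition problem described above. Pairs are sparser but more specific to a single story.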

Posted: 02/18/10, 12:25 am
by bob
Hot Topics: I'm not sure I can make it automatic at all. No matter how complicated and screwy a scheme I come up with for automatically discovering hot topics, several of the so-called topics are nonsense.

OK, so maybe I'll switch to hand-picking them. I might not need to change them very often. I've set up a beta attempt on my test page, right under the top 10 headlines. See what you think:

My idea is that the first row will be hand-picked and will change every day or so, or at least have topics added and removed roughly daily. The second row is fairly permanent, since it contains topics of ongoing interest.

Or maybe I can just come up with a semi-permanent set of topics.

Any suggestions?

Posted: 02/28/10, 9:36 pm
by bob
>>>>Or maybe I can just come up with a semi-permanent set of topics.

Our "All The Latest" feature is really getting a lot of use: 60 hits in the last couple of hours alone. Any ideas for topics to add to it? Currently it has: Apple, Google, Microsoft, Linux, Windows, Security, Smartphones, Open Source, Games

Posted: 03/01/10, 2:13 am
by Gerry

Posted: 03/01/10, 5:36 am
by bob
I can do that --

There must be loads of more general terms that would turn up lots of headlines that people are interested in?

Posted: 03/01/10, 5:44 am
by Gerry
Currently the searches are returning nothing.

Posted: 03/01/10, 6:30 am
by bob
I'm working on it. It's a date problem: since it's 3-1-10 here, my Perl script is having a hard time figuring out day -1, -2, -3, etc., because simply subtracting from the day of the month doesn't roll back into February. "All the Latest" is supposed to go back 4 days, including today. It's getting late here, though, and I'm getting tired. I'll set up something temporary and work on the problem tomorrow.

Posted: 03/01/10, 10:45 am
by Gerry
No problem, mate. Not an issue for me, just letting you know.

If you want to avoid issues like that, it's pretty simple to set up a server on your own machine and test the code locally before uploading it to the live server. Again, just FYI in case it's what you want.

Posted: 03/01/10, 10:22 pm
by bob
OK, I've got it fixed. My original method was too simplistic: it just subtracted from the current date.

This seems to work:


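# Dates for 1, 2 and 3 days ago, formatted as MM-DD-YYYY.
# Subtracting seconds from the epoch time (instead of subtracting from
# the day of the month) lets localtime handle month and year rollover.
# Caveat: on a DST-change day (23 or 25 hours long), subtracting
# multiples of 24 hours near midnight can land on the wrong date.
my ($year, $month, $day);

my $timeYesterday = time - 24 * 60 * 60;
($year, $month, $day) = (localtime($timeYesterday))[5,4,3];
my $yd = sprintf("%02d-%02d-%04d", $month+1, $day, $year+1900);

my $timeYesterday1 = time - 48 * 60 * 60;
($year, $month, $day) = (localtime($timeYesterday1))[5,4,3];
my $yd1 = sprintf("%02d-%02d-%04d", $month+1, $day, $year+1900);

my $timeYesterday2 = time - 72 * 60 * 60;
($year, $month, $day) = (localtime($timeYesterday2))[5,4,3];
my $yd2 = sprintf("%02d-%02d-%04d", $month+1, $day, $year+1900);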

It can probably be compressed a bit.
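One possible way to compress it is to loop over day offsets. That same change can also sidestep the daylight-saving edge case, in which subtracting multiples of 24 hours near midnight on a 23- or 25-hour day lands on the wrong date. The trick is to anchor at local noon today, so that a DST shift moves the result by at most an hour and never across midnight. This is a minimal sketch using the core `Time::Local` module. `recent_dates` is a hypothetical helper name, not part of the original script, and the 4-day window follows the "All the Latest" description above.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Return the last $n_days dates (today first) as MM-DD-YYYY strings.
# Anchoring at local noon means subtracting whole days of 86400 seconds
# lands at 11:00-13:00 on the intended day, even across a DST change.
sub recent_dates {
    my ($n_days, $now) = @_;
    $now //= time;
    my ($d, $m, $y) = (localtime $now)[3, 4, 5];
    my $noon = timelocal(0, 0, 12, $d, $m, $y + 1900);
    my @dates;
    for my $offset (0 .. $n_days - 1) {
        my ($day, $month, $year) = (localtime($noon - $offset * 86400))[3, 4, 5];
        push @dates, sprintf('%02d-%02d-%04d', $month + 1, $day, $year + 1900);
    }
    return @dates;
}

# "All the Latest" window: today plus the previous three days.
print "$_\n" for recent_dates(4);
```

Passing the full four-digit year (`$y + 1900`) to `timelocal` avoids its two-digit-year interpretation.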

Thanks for the advice, Gerry. I've actually got a notebook here I'm not using. Maybe I'll set it up to match my server, just as a test box.

Posted: 03/01/10, 10:32 pm
by bob
BTW, the script above is thanks to

Thanks to ... localtime/

Posted: 04/04/10, 12:55 am
by bob
Our "All the Latest" feature has gotten more than 7000 hits since we created it. I'd say that's worthwhile.

Any ideas for more terms to add to it? Currently it has: Apple, Google, Microsoft, Linux, Windows, Security, Smartphones, Open Source, Games, and Ubuntu. I think I might add "Censorship." Any others?

Posted: 04/04/10, 11:09 am
by Gerry
Oh nice idea! I'd click that one. :)

Posted: 04/14/10, 1:14 am
by bob
Censorship isn't working out, so I need to rethink that one. I'm taking it down. Click on it and you'll see why. My search engine doesn't lend itself to Boolean searches, and the word "censor" doesn't come close to covering all the possibilities. ... rds=CENSOR
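One possible workaround for the lack of Boolean search is to define a topic as a list of alternative word stems and OR them together into a single case-insensitive regex. This is a minimal sketch, assuming headlines are plain title strings. The `%TOPICS` table, its stems, the sample titles, and the helpers `topic_regex` and `headlines_for` are all hypothetical, not part of the existing search engine.

```perl
use strict;
use warnings;

# Hypothetical topic definition: a headline matches the topic if ANY of
# these stems appears at the start of a word (a simple OR search).
my %TOPICS = (
    Censorship => [qw(censor filtering blacklist)],
);

# Build one case-insensitive alternation from a topic's stems.
sub topic_regex {
    my ($topic) = @_;
    my $alts = join '|', map { quotemeta($_) } @{ $TOPICS{$topic} };
    return qr/\b(?:$alts)/i;
}

# Return the titles that match the given topic.
sub headlines_for {
    my ($topic, @titles) = @_;
    my $re = topic_regex($topic);
    return grep { $_ =~ $re } @titles;
}

my @titles = (
    'Government orders ISPs to censor forums',
    'Country expands web filtering plan',
    'Ubuntu beta released',
);
print "$_\n" for headlines_for('Censorship', @titles);
```

Because the stems are matched only at a word start, "censor" also catches "censored" and "censorship". The stem list itself still has to be hand-picked.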