CCN Forum Migration

From WorkCDN


As we think about migrating away from the old ChristianCoders.com forum software, one thing we really want to keep is the old conversation threads and the user accounts.

However, Infopop UBB isn't based on SQL, and so we'll have a hard time integrating the Wiki and forum software if we continue to use that. In addition, the version of UBB that we're using over there is a bit old and buggy -- we're ready for something a little more maintained.

Mack's been working on a new forum site, so we're not trying to step on toes or circumvent his efforts -- we're sort of just curious as to what it would take to integrate a forum with this Wiki while maintaining all of the old posts.

So here's where we're looking at how to extract data from the old forum.


User information

Thankfully, there's a very nice member list page. The URL is simple enough to follow and parse through with our program. Let's walk through it, shall we?

URL Structure

Here's a sample URL:

 http://www.christiancoders.com/cgi-bin/ubb-cgi/memberlist.cgi?page=1&sortby=&letter=A&s=&svalue=

This can be shortened to:

 http://www.christiancoders.com/cgi-bin/ubb-cgi/memberlist.cgi?letter=A&page=1

and it still works just great.

The prefix is constant:

 http://www.christiancoders.com/cgi-bin/ubb-cgi/memberlist.cgi?

Then comes the letter (loop from A to Z, then include @):

 letter=A

Then comes the page number:

 &page=1

The script should keep incrementing the page number until it finds a page with no records. At that point we know it's time to move on to the next letter and reset our page counter.

There are a lot of spam accounts on there -- let's make things easy by having our script discard all users with 0 posts.
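The loop described above could be sketched roughly like this in Python. We haven't looked at the member list HTML closely yet, so the actual download and parse step is left as a hypothetical `fetch_members` function that the caller supplies; the `name` and `posts` keys are likewise just placeholders for whatever we end up scraping.

```python
import string
from typing import Callable, Iterator
from urllib.parse import urlencode

BASE = "http://www.christiancoders.com/cgi-bin/ubb-cgi/memberlist.cgi"
# A to Z, then the catch-all "@" bucket.
LETTERS = list(string.ascii_uppercase) + ["@"]


def member_page_url(letter: str, page: int) -> str:
    """Build the shortened member list URL for one letter and page."""
    # urlencode percent-encodes "@" as "%40", which is the same value in a query string.
    return BASE + "?" + urlencode({"letter": letter, "page": page})


def crawl_members(fetch_members: Callable[[str], list[dict]]) -> Iterator[dict]:
    """Yield every member with at least one post.

    fetch_members is a hypothetical callable: it takes a URL and returns
    a list of dicts such as {"name": ..., "posts": ...}, or an empty
    list when the page has no records.
    """
    for letter in LETTERS:
        page = 1  # reset the page counter for each letter
        while True:
            members = fetch_members(member_page_url(letter, page))
            if not members:
                break  # empty page: move on to the next letter
            for member in members:
                if member["posts"] > 0:  # discard likely spam accounts
                    yield member
            page += 1
```

Passing the fetcher in as a parameter keeps the looping logic testable without hitting the live forum.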

Groovy. Member list managed.

Thread Information

Note that it's one thing to recover the thread subject, and another to recover the posts.

Forum / Thread Title

The forum and thread title can be taken either from the thread itself:

 http://www.christiancoders.com/cgi-bin/ubb-cgi/postdisplay.cgi?forum=Forum4&topic=000566

Or they can possibly be extracted en masse from the forum discussion pages:

 http://www.christiancoders.com/cgi-bin/ubb-cgi/forumdisplay.cgi?action=topics&number=4&DaysPrune=1000&startpoint=0

I tend to think that the second option will be quicker and easier. Iterate through that list by incrementing the &startpoint variable by 75 for each page.
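As a rough sketch of that iteration in Python: the generator below just produces the listing URLs, stepping `startpoint` by 75. It never stops on its own, so the caller is assumed to quit once a page comes back with no topics. The function names are ours, not anything from UBB.

```python
from itertools import count
from typing import Iterator
from urllib.parse import urlencode

FORUM_DISPLAY = "http://www.christiancoders.com/cgi-bin/ubb-cgi/forumdisplay.cgi"
TOPICS_PER_PAGE = 75  # step size for startpoint, per the note above


def topic_list_url(forum_number: int, startpoint: int) -> str:
    """Build the URL for one page of a forum's topic listing."""
    params = {
        "action": "topics",
        "number": forum_number,
        "DaysPrune": 1000,
        "startpoint": startpoint,
    }
    return FORUM_DISPLAY + "?" + urlencode(params)


def topic_list_urls(forum_number: int) -> Iterator[str]:
    """Yield listing URLs for startpoint 0, 75, 150, ... without end.

    The caller should stop once a fetched page contains no topics.
    """
    for startpoint in count(0, TOPICS_PER_PAGE):
        yield topic_list_url(forum_number, startpoint)
```

For example, `itertools.islice(topic_list_urls(4), 3)` gives the first three listing pages of forum 4.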

Posts

Loop through the edit boxes to recover the original UBB code.

 http://www.christiancoders.com/cgi-bin/ubb-cgi/postings.cgi?action=editpost&number=4&topic=000566.cgi&ReplyNum=000026

URL Format

Prefix:

 http://www.christiancoders.com/cgi-bin/ubb-cgi/postings.cgi?action=editpost

Number (I think this is the forum number)

 &number=4

Topic (This is the number of the thread)

 &topic=000566.cgi

I'm not sure why it has the .cgi at the end, but it's necessary: if you take it off, the forum says it is logging the request as a hack attempt.

Reply Number -- this is the number of the reply in the thread:

 &ReplyNum=000026

Increment this number until you get a blank text box, which indicates the end of the thread.
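Putting the URL pieces together, a minimal Python sketch might look like the following. It assumes the first post in a thread is `ReplyNum=000000` (we haven't confirmed that) and that both the topic and reply numbers are zero-padded to six digits, as in the sample. `fetch_post_text` is a hypothetical function that downloads the edit page and returns the contents of the text box.

```python
from typing import Callable

POSTINGS = "http://www.christiancoders.com/cgi-bin/ubb-cgi/postings.cgi"


def edit_post_url(forum_number: int, topic: int, reply_num: int) -> str:
    """Build the edit-post URL; topic and reply number are zero-padded to 6 digits."""
    return (
        f"{POSTINGS}?action=editpost"
        f"&number={forum_number}"
        f"&topic={topic:06d}.cgi"  # the .cgi suffix is required
        f"&ReplyNum={reply_num:06d}"
    )


def recover_posts(
    forum_number: int,
    topic: int,
    fetch_post_text: Callable[[str], str],
    first_reply: int = 0,  # assumption: numbering starts at 000000
) -> list[str]:
    """Collect the raw UBB code of each post until a blank text box appears."""
    posts = []
    reply_num = first_reply
    while True:
        text = fetch_post_text(edit_post_url(forum_number, topic, reply_num))
        if not text.strip():
            break  # blank text box: end of the thread
        posts.append(text)
        reply_num += 1
    return posts
```

One caveat with this stopping rule: a post that really is empty would end the loop early, so it may be worth probing one or two numbers past the first blank.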

Data Extraction

The username and date are in the string above the text box. In our example, it is:

 Originally posted by samw3 on 11-12-2007 04:45 PM

The post information is in the textarea field -- that should be fairly easy to parse out.
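Here is one way the extraction might go in Python, using only the standard library. It assumes the date is month-day-year (the sample alone doesn't settle that) and that the edit page has a single textarea. The function and class names are made up for this sketch.

```python
import re
from datetime import datetime
from html.parser import HTMLParser
from typing import Optional

# Matches e.g. "Originally posted by samw3 on 11-12-2007 04:45 PM".
HEADER_RE = re.compile(
    r"Originally posted by (?P<user>.+?) on "
    r"(?P<when>\d{2}-\d{2}-\d{4} \d{2}:\d{2} [AP]M)"
)


def parse_header(page_text: str) -> Optional[tuple[str, datetime]]:
    """Return (username, posted_at), or None if the header line is missing."""
    match = HEADER_RE.search(page_text)
    if match is None:
        return None
    # Assumption: dates are MM-DD-YYYY with a 12-hour clock.
    posted_at = datetime.strptime(match.group("when"), "%m-%d-%Y %I:%M %p")
    return match.group("user"), posted_at


class TextareaExtractor(HTMLParser):
    """Collects the text found inside <textarea> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.in_textarea = False
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "textarea":
            self.in_textarea = True

    def handle_endtag(self, tag):
        if tag == "textarea":
            self.in_textarea = False

    def handle_data(self, data):
        if self.in_textarea:
            self.chunks.append(data)


def extract_post_body(page_html: str) -> str:
    """Return the textarea contents (the original UBB code) from an edit page."""
    parser = TextareaExtractor()
    parser.feed(page_html)
    parser.close()
    return "".join(parser.chunks)
```

If the real page turns out to have more than one textarea, the extractor would need to check the element's `name` attribute before collecting.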
