
Store wiki offline?


mess


Just look around for some programs; there are a bunch of them that will do that for you. But please be careful and read the instructions twice; programs like these can create an incredible amount of traffic if misused... and that's bad for the page owner and (maybe) also for you.

Another option is the Pro version of Acrobat, which also has a site-fetching function.

Set the values to remain on the domain and restrict the depth level!

Or do it the old way: just save the HTML pages that are relevant to you; I think most of the wiki is still quite manageable ;)

Cheers,

Michael
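
As a rough sketch of the restricted crawl described above, the following minimal Python spider stays on one domain, limits the crawl depth and pauses between requests so the server is not hammered; the start URL, depth and delay are illustrative assumptions, not values anyone in this thread used:

[code]
# Sketch only: stay on one domain, limit depth, throttle requests.
# START, MAX_DEPTH and DELAY are illustrative assumptions.
import time
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

START = "http://www.midibox.org/dokuwiki/"   # assumed entry point
MAX_DEPTH = 2                                # "restrict the depth level!"
DELAY = 1.0                                  # seconds between requests

class LinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

domain = urlparse(START).netloc
seen, queue = set(), [(START, 0)]

while queue:
    url, depth = queue.pop(0)
    if url in seen or depth > MAX_DEPTH:
        continue
    seen.add(url)
    try:
        html = urllib.request.urlopen(url).read().decode("utf-8", "ignore")
    except OSError:
        continue
    # ... write `html` to disk here ...
    parser = LinkParser()
    parser.feed(html)
    for link in parser.links:
        absolute = urljoin(url, link)
        if urlparse(absolute).netloc == domain:   # "remain on the domain"
            queue.append((absolute, depth + 1))
    time.sleep(DELAY)                             # be kind to the server
[/code]

Real downloaders add link rewriting and filetype filters on top of this, but domain, depth and delay are the three settings worth double-checking before letting any of them loose.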


I tried some of those programs in the past, with varying results...

Right now I use my browser's save function, but it's not the nicest solution.

I was hoping that the wiki itself had a feature to export the source or something like that...

My primary goal is to archive my own wiki page, because it's my only documentation at the moment

(apart from the source code).


Sorry mess, but I've looked into this and there's really no easy way... Most 'spider'-type apps like the ones you referred to (as you might already know) won't be able to collect the pages either :(

I was trying to make a midibox.org CD/DVD for those without a fast internet connection, but I've basically abandoned the idea, as it can't be automated....


Hm,  :-\

What's the issue with auto-downloaders (if restricted to 1 or 2 levels, HTML docs only, staying on the domain)?

http://sourceforge.net/search/index.php?words=offline+grabber&sort=num_downloads&sortdir=desc&offset=0&type_of_search=soft&form_cat=18

Features of GetLeft:

- As it goes, it changes the original pages: all the links get changed to relative links, so that you can browse the site from your hard disk without those pesky absolute links.

- Limited FTP support: it will download the files, but not recursively.

- Resumes downloading if interrupted.

- Filters to avoid downloading certain kinds of files.

- You can get a site map before downloading.

- GetLeft can follow links to external sites.

- Multilingual support; at present GetLeft supports Dutch, English, Esperanto, German, French, Italian, Polish, Korean, Portuguese, Russian, Turkish and Spanish.

- Some others not worth mentioning.

@stryd: if you want to provide an offline version, I could help (but not now; maybe PM me in one or two weeks?)

It would be traffic-friendly if we did such a thing just once and seeded a ZIP or CD-ROM, instead of everyone grabbing everything all the time...


I already have a script which allows me to generate an offline version of ucapps.de and midibox.org from the original version located on my laptop's HD (the script automatically fixes the links).

But it doesn't work on the forum and the wiki.

Best Regards, Thorsten.


Just FYI...

The problem is with dynamically generated PHP pages, like this:

*/index.php?topic=7396.0;topicseen

This means that all of the forum pages are called index.php (or index.htm from the client side). A spider application that saves your files will name them differently every time, as it has to make up a random name to save each file as... This means it would have to change all the links in all the posts to point to these names, and in order to be updateable without fully regenerating the site mirror, it would have to maintain a database of the changed links. None of the freely available spider apps support this operation, so every time the CD was made, we would hammer poor twin-x's servers by re-downloading the entire content.
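
A hedged sketch of the bookkeeping such a spider would need: derive a stable local filename from each dynamic URL instead of a random one, and persist the URL-to-filename mapping so links can be rewritten consistently and a later run can update files in place. All names below are illustrative assumptions:

[code]
# Sketch only: deterministic filenames plus a persisted link database,
# the two things the free spiders are said to lack above.
import hashlib
import json
from pathlib import Path

MAP_FILE = Path("url_map.json")   # assumed name for the link "database"

def local_name(url: str) -> str:
    # The same URL always yields the same file, so a re-run updates
    # in place instead of inventing a new random name each time.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:12]
    return "page-" + digest + ".html"

url_map = json.loads(MAP_FILE.read_text()) if MAP_FILE.exists() else {}

url = "http://example.org/forum/index.php?topic=7396.0"   # illustrative URL
url_map[url] = local_name(url)
# ... download the page, save it as url_map[url], rewrite links via url_map ...

MAP_FILE.write_text(json.dumps(url_map, indent=2))
[/code]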

The wiki can be exported from the web server by a DokuWiki add-on, but it will only run on a newer PHP version than twin-x's server is running, so he would have to upgrade and risk breaking all his hosted sites.

Now of course we could get the source content and run a mirror using a PHP server on a DVD (so you would insert the DVD and it would run a web server on your local PC that would work just like the one you're reading now), but of course that would make the inner workings of the forum, including the logins etc., available to everyone who had the DVD. I like you guys, but I'm not telling you my password! ;)

I hate to be a stick in the mud, but there's really no practical way of getting the forum or the wiki working offline at present without causing twin-x, "our hero", a great deal of hassle/money/downtime.

Sorry guys, I tried, really I did!


So, folks, time to get serious:

No one tried the GetLeft link, right?  :P tsss... ;)

Neither did I... but now I have, because I'm a PHP programmer too, and I cannot imagine that all application developers are so dumb as to ignore automatically generated PHP pages.

After installing GetLeft (which took me about 5.2 seconds), I am now getting a site copy of the "Application Development" section of the wiki: 2 levels deep, 1 level of external documents, ignoring PDFs, AVIs, MOVs and ZIPs  8)

This fine little app even has an "update" option, so you don't have to grab everything again once the site has been updated...

I could rip it in depth next week and then seed a file; how about that?

Should PDFs be part of the offline version? ZIPs too?

Cheers,

Michael


I did try that one but had no luck; it kept naming the files strangely, and I found a forum thread from others with the same problem... I guess it was "pilot error"  :-[ Sorry!!!

Yeah, PDFs and ZIPs would be good too :)

I will seed the thing too; I'm on a 1-megabit upload, so that should keep a few people happy ;)

Don't forget to throttle back the speed of your spider so it doesn't hammer the server. I suspect you've already done this, but just to be safe :)


How about moving the wiki to MediaWiki, so it supports more formatting tags? There's also a script that comes with it which allows one to easily dump the whole thing to static HTML.

Also, MediaWiki seems to run considerably faster than DokuWiki.

-Steve

It is far more difficult to use/create the user database for logging in with MediaWiki in combination with SMF.


Quote: "It is far more difficult to use/create the user database for logging in with MediaWiki in combination with SMF."

Ah, I can understand that, then. What about allowing it to be a more traditional wiki where anyone can edit without an account? Vandalism doesn't happen very often, and if it did, self-policing should take care of it. With the essentially eternal history, rolling back changes is trivial.


Quote: "I did try that one but had no luck; it kept naming the files strangely, and I found a forum thread from others with the same problem... I guess it was 'pilot error'"

Hmm... this is unlikely to be pilot error: it might be that the program behaves differently on Mac and Windows.

A typical filename is, for example, "doku.phpid=c_tips_and_tricks_for_pic_programming.html"... that should be okay for other platforms, shouldn't it?

cheers,

Michael


Thanks, Twin-x :)

In the meantime I have already grabbed the page with GetLeft, and while checking the results I found it had left some errors in there: while most of the links were formatted correctly, there were some second-level links where GetLeft had forgotten to append the ".html" ending :(

If I knew how to use grep with a NOT statement, this could be corrected in 10 seconds...

Does anyone know how I could achieve this?

Find: [tt]href="doku.phpid=[something]"[/tt]

Replace with: [tt]href="doku.phpid=[something].html"[/tt]

Cheers,

Michael
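
One way to express that NOT is a regular-expression negative lookbehind, which only matches links that do not already end in ".html". A minimal Python sketch, where the "mirror" folder name is an assumption:

[code]
# Sketch only: append ".html" to doku.phpid= links that lack it.
# The negative lookbehind (?<!\.html) is the "NOT" part of the search.
import re
from pathlib import Path

pattern = re.compile(r'(href="doku\.phpid=[^"]*)(?<!\.html)(")')

for page in Path("mirror").rglob("*.html"):   # "mirror" is an assumed folder
    text = page.read_text(encoding="utf-8", errors="ignore")
    fixed = pattern.sub(r'\1.html\2', text)
    if fixed != text:
        page.write_text(fixed, encoding="utf-8")
[/code]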


4 months later...

Twin-x,

I'm on a beta team for something I run at the studio and don't have net access there. I really wanted to take the current postings (from a forum) with me each night, but no "offline" browsers would save the content of the forum, due to some PHP crap or something (I'm a bit "web illiterate" ;)).

Do you think that thing would do it? I run Opera here, but it would be worth running FF just to do that. I've been having to save all of the relevant posts from there as separate HTML files and drag them with me each night.

Thanks for the tip!

George

BTW--- I run a thing called HTTrack (http://www.httrack.com) for saving most sites and it usually does well, even at the defaults. I keep a bunch of sites I need on CD.


Quote: "BTW--- I run a thing called HTTrack (http://www.httrack.com) for saving most sites and it usually does well, even at the defaults."

HTTrack does not work well on the wiki. At least not for me.


Yeah, I'm wondering more about that other forum, though. I tried a few offline browsers in addition to HTTrack, back before I gave up, and they all just saved the main entry page. Saving the current posts has gotten to be a pain, as they all have to be named and such, plus I don't know which ones I'll need to have with me each night.

George


FWIW - that didn't appear to work for the beta forum. It saved a bunch of links and stuff, but it would hang with a password-entry box when I tried to open it with no net access. It also gave me a bunch of messages about not having saved parts of the site when it was done, so I don't think the content I was interested in even came in.

No big deal anyway. :-\

George

