
Rule34 image downloader

b/PC, Mac & Vintage Computers posted by u/gameblabla January 21, 2018, 01:03:38 AM
So I met a guy on Discord and he made this image file scraper in Python:
https://github.com/sunx2/r34py

Unfortunately, reading the source code was like trying to de-obfuscate it.
Plus, I had to use proxychains to make it work over Tor.

Hence why I decided, as an exercise, to re-implement it in C.
The main issue I had was implementing a function to crawl over the HTML files and find the image links.
This was the first time I did something like this, so it took me a few more hours.
I eventually decided I should look for a pattern that would allow me to find the links fairly easily.

<a href="http:///....>


I eventually shortened the code so it only looks for "<" and ":", and checks whether there is an "r" two characters before the link.
Believe it or not, it works, and I can now download all of my c goodness from the command line.
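
The link-finding idea above can be sketched in C roughly as follows. This is a minimal sketch, not the code from the repository: instead of the shortened "<", ":" and "r" check, it searches for the literal `<a href="` marker and copies everything up to the closing quote. The names `extract_links`, `is_image_url` and `URL_MAX` are made up for illustration, and the list of image extensions is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define URL_MAX 512  /* hypothetical upper bound on a stored link, including the NUL */

/* Copy the target of every <a href="..."> in html into out[].
 * Stores at most max_links links and returns how many were stored.
 * Links that do not fit in URL_MAX bytes are skipped. */
size_t extract_links(const char *html, char out[][URL_MAX], size_t max_links)
{
    static const char marker[] = "<a href=\"";
    const size_t marker_len = sizeof marker - 1;
    size_t count = 0;
    const char *p = html;

    assert(html != NULL && out != NULL);

    while (count < max_links && (p = strstr(p, marker)) != NULL) {
        const char *start = p + marker_len;
        const char *end = strchr(start, '"');
        if (end == NULL)
            break;                       /* unterminated attribute: stop scanning */

        size_t len = (size_t)(end - start);
        if (len > 0 && len < URL_MAX) {
            memcpy(out[count], start, len);
            out[count][len] = '\0';
            count++;
        }
        p = end + 1;                     /* continue after the closing quote */
    }
    return count;
}

/* Return 1 if url ends in a known image extension (case-sensitive), else 0. */
int is_image_url(const char *url)
{
    static const char *exts[] = { ".jpg", ".jpeg", ".png", ".gif" };
    size_t len = strlen(url);

    for (size_t i = 0; i < sizeof exts / sizeof exts[0]; i++) {
        size_t elen = strlen(exts[i]);
        if (len >= elen && strcmp(url + len - elen, exts[i]) == 0)
            return 1;
    }
    return 0;
}
```

A caller would read a page into a buffer, pass it to `extract_links`, and keep only the entries for which `is_image_url` returns 1. Searching for the full marker is slower to type than a three-character check, but it is easier to see why it matches.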

Then I extended it so it downloads from all the pages (just a loop, really), and here are the results:
https://github.com/gameblabla/r34downloader_curl
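
For anyone curious what "just a loop" can look like, here is a minimal sketch that builds one listing URL per page. Everything about the URL layout is an assumption made for illustration: the `tags` and `pid` query parameters, the idea that the offset grows by a fixed number of posts per page, and the names `build_page_urls` and `PAGE_URL_MAX`. The real program may build its URLs differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PAGE_URL_MAX 256  /* hypothetical upper bound on a page URL, including the NUL */

/* Fill out[] with one URL per result page and return how many were built.
 * base     : listing URL without a query string (hypothetical)
 * tag      : search tag appended as ?tags=<tag> (assumed parameter name)
 * per_page : number of posts per page, used as the pid offset step (assumed) */
size_t build_page_urls(char out[][PAGE_URL_MAX], size_t pages,
                       const char *base, const char *tag, unsigned per_page)
{
    size_t made = 0;

    assert(out != NULL && base != NULL && tag != NULL);
    if (strlen(base) == 0 || strlen(tag) == 0)
        return 0;

    for (size_t page = 0; page < pages; page++) {
        int n = snprintf(out[made], PAGE_URL_MAX, "%s?tags=%s&pid=%zu",
                         base, tag, page * per_page);
        if (n < 0 || (size_t)n >= PAGE_URL_MAX)
            break;                       /* URL did not fit: stop instead of truncating */
        made++;
    }
    return made;
}
```

Each URL produced this way would then be fetched and scanned for image links in turn.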

It uses curl for downloading things. Once I work on the SDL GUI, it could be ported to other platforms too.

Lemme know what you think about it. You must compile it from source.
u/Dream of Omnimaga January 23, 2018, 09:12:54 PM
Is the only content allowed rule 34/c material? :P
u/_iPhoenix_ January 24, 2018, 12:08:05 AM
It's really easy to do it manually, too. Most OS's provide an option to download a webpage and its assets, and you can filter by image type.
u/gameblabla January 24, 2018, 10:26:06 PM
Quote from: xlibman on January 23, 2018, 09:12:54 PM
Is the only content allowed rule 34/c material? :P
Well, the website itself only allows that, so yeah. This could be adapted to other websites, but most don't post direct links to the full pictures, so yeah...

Quote from: _iPhoenix_ on January 24, 2018, 12:08:05 AM
It's really easy to do it manually, too. Most OS's provide an option to download a webpage and its assets, and you can filter by image type.
Well yes, that is true, but the point was to only download what I wanted, namely just the images themselves and not the thumbnails, ads or other random c. I doubt the OSes allow that very easily.
u/gameblabla January 25, 2018, 08:26:00 PM
Well, I tried it with some other tags like Mario and I realized it would not pick up some images properly.
So I fixed that and it works properly now.
u/gameblabla May 11, 2018, 08:26:45 PM
If I had one piece of advice for aspiring developers, it's to not use fgets.
Just use fread instead; it can do the same things (together with fseek) and is much less buggy and more portable.
It's also more predictable and less confusing.
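
As an illustration of the fread-plus-fseek approach, here is a minimal sketch that loads a whole saved page into memory in one go, so it can be scanned as a single string instead of line by line. The name `read_whole_file` is made up for this example and is not taken from the repository; it also assumes the file is a regular file whose size `ftell` can report.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into a freshly allocated, NUL-terminated buffer.
 * On success returns the buffer (caller must free it) and, if out_len is
 * not NULL, stores the number of bytes read. Returns NULL on failure. */
char *read_whole_file(const char *path, size_t *out_len)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");        /* binary mode: no newline translation */
    if (fp == NULL)
        return NULL;

    /* Seek to the end to learn the size, then go back to the start. */
    if (fseek(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    long size = ftell(fp);
    if (size < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    char *buf = malloc((size_t)size + 1);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);

    buf[got] = '\0';                     /* terminate so string functions can be used */
    if (out_len != NULL)
        *out_len = got;
    return buf;
}
```

Unlike fgets, this does not stop at newlines or depend on a fixed line length, which is one way to read the "more predictable" point above.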

I was using fgets because I was under the impression that the formatting would not be the same if I used fread.
Of course I was wrong, it doesn't matter...

So yeah, I fixed that and a few small things (such as arrays not being cleared properly), and the program finally works properly for multiple pages.
Of course, it still works well over Tor. (I did not dare to do it over clearnet because I don't want the govt to know my norp habits...)
u/Dream of Omnimaga May 14, 2018, 03:34:58 PM
This will be handy for downloading abacus c.