valiha: watercolor painting of my cat Lola (Default)
valiha ([personal profile] valiha) wrote in [community profile] ebooks 2011-10-05 10:13 pm

Another calibre question

Hi, I have a calibre conversion question that fandom people might understand better than MobileRead forumers.

I had the fanfiction downloader program called Graffer installed on my comp. I was happy with it because it would download fics from multiple sources and produced clean html files. Unfortunately its creator gave up on the program and it is no longer updated, and it has stopped working for several major archives.

My work on converting my fics via calibre is slow, and I often want to chuck my comp out the window, that's how frustrated calibre makes me feel. I was expecting import and conversion to be fairly straightforward, but calibre kept messing up the authors. I finally looked at the html coding and discovered that Graffer added a line in the metadata section which calibre would read as the author name, but which was actually the name of the programmer: <meta name='author' content='Grzegorz Hordynski' />

I went to MobileRead to see if I could find a way to change this line in the html file automatically through bulk convert, but couldn't figure out the instructions or the regexes. I've been having a back and forth conversation with a member who doesn't understand what I'm after, so does anyone here know how to set up calibre so that it changes the programmer's name into the correct author name for selected ebooks in the actual html file, not just in the metadata?

[personal profile] boundbooks 2011-10-05 08:38 pm (UTC)(link)
I can't help with the older files, but I have a suggestion which might be useful for future downloads! I just started using this fanfiction downloader, which is super awesome. It's called FLAG.

http://www.flagfic.com/

It supports

fanfiction.net
fictionpress.com
twilighted.net
adifferentforest.com
thewriterscoffeeshop.com
twiwrite.net
ficwad.com
adultfanfiction.net
fictionalley.org
harrypotterfanfiction.com
mediaminer.org

And will soon support

tthfanfic.org
wraithbait.com
hpfandom.net
archive.skyehawke.com
archiveofourown.org

And allows downloads in
EPUB
MobiPocket
PDF
HTML

Edited 2011-10-05 20:39 (UTC)

[personal profile] boundbooks 2011-10-05 08:46 pm (UTC)(link)
I wish I totally had a solution, because I have a ton of HTML files too! I hope someone else knows, because it'd be handy information to have. :)

[personal profile] boundbooks 2011-10-05 08:50 pm (UTC)(link)
Yeah, I actually have no idea why he's building a download for AO3. Maybe he likes building downloads? That's my only guess!

FLAG / AO3 Downloader

(Anonymous) 2011-10-06 02:54 am (UTC)(link)
I'm building a downloader for AO3 because my users keep asking for one - I'm aware that AO3 already has its own, but either people are having trouble finding it, or they don't like it, I'm not sure which.

I don't claim to understand the strange desires of FLAG's users, but I do try to keep them happy :-).

Re: FLAG / AO3 Downloader

[personal profile] boundbooks 2011-10-06 03:15 am (UTC)(link)
Haha, okay! Honestly, I can understand users having trouble finding AO3's downloader. It took me about three months of using the site to realize that it had one. XD

"I don't claim to understand the strange desires of FLAG's users, but I do try to keep them happy :-)."

Sounds like a plan. A psychological approach would probably only yield further unanswerable questions. :)

Re: FLAG / AO3 Downloader

(Anonymous) 2011-10-06 04:56 am (UTC)(link)
"A psychological approach would probably only yield further unanswerable questions."

That was my thinking - happy users that confuse me are infinitely better than annoyed users who feel over-analysed!
musyc: Silver flute resting diagonally across sheet music (Default)

[personal profile] musyc 2011-10-05 08:52 pm (UTC)(link)
Try Squeebook for LJ. http://www.squeebook.net/ I haven't tested it out myself, but I hear excellent things about it for LJ/DW fanfics.

[personal profile] boundbooks 2011-10-05 09:04 pm (UTC)(link)
Ooh, that will work for me! I will try that. :D
aithine: (Garcia - smiling)

[personal profile] aithine 2011-10-05 09:16 pm (UTC)(link)
If you can point me to a place to download the program (or if you can get it to me), I can take a quick look and let you know if it's possible to fix that. Also, links to the MR discussion(s) would help shorten that process.
aithine: (Default)

[personal profile] aithine 2011-10-06 01:17 am (UTC)(link)
Ok, I think I actually misunderstood you originally, since I just skimmed the post. I thought you were going to continue to use Graffer, but wanted to fix it so it didn't put the program author's name in there. I've reread it, and what you're trying to do is fix the files well after the fact, using Calibre (you hope). Some questions for you:

1) Do all of the html files you're trying to convert have the author information in them somewhere? In the text, as part of the filename, somewhere?

2) Is the location of the author's name consistent for all of the files?

3) Is the information actually marked as author, or is it just in the text somewhere, like the example you gave above?
Edited (clarification) 2011-10-06 01:19 (UTC)
aithine: (Default)

[personal profile] aithine 2011-10-06 10:23 pm (UTC)(link)
Just trying to clarify what you're trying to do. It's usually better to use tools that each do just one job than to expect one program to do them all. :) (I don't use calibre, so can't really comment on what it will or won't do well.)

To remove the programmer's name from the Graffer-grabbed files: grab a good text editor (like NoteTab) and use it to do a multi-file search and replace on the meta tag. That'll clear that out so you don't have to deal with it when you find a library program you do like to use and want to import your files.

To batch-rename files without having to figure out the regexes, use AF5 Rename your files. It's a fairly logical interface for doing just one thing: renaming files using patterns. :)
aithine: (Default)

[personal profile] aithine 2011-10-08 06:20 am (UTC)(link)
I ♥ Keen Eddie (Obviously, since Eddie's been my default icon for eight years. *g*)

It's pretty easy to open multiple files at once in NoteTab and do the search and replace. Depending on how old or new your machine is, you can open files in bunches so it doesn't overwhelm your computer's memory, but really, that's the fastest way to do that sort of thing without having to learn or deal with any of the scripting languages that would do what you need. And honestly, even though you have to open all the files, it doesn't take any longer. :) Plus, it sounds like it's much more your speed, and would cause the least amount of headache for you. :)

With the file renaming, sure, I know exactly what you mean. Download the program I gave you the link to (AF5 Rename your files) and read the help files--it easily does that sort of thing without making it complicated. (I've used that program for years--it's a very straightforward tool.)
Edited 2011-10-08 08:09 (UTC)
rebecca2525: Abby Sciuto from NCIS with the word "geek" (Default)

[personal profile] rebecca2525 2011-10-06 07:53 am (UTC)(link)
OK, this is about html files you have already downloaded, right? I don't know if there's a way to do this in Calibre, but it would only be a few lines of Python script or Shell script to go through all your files and remove that line.

What Operating System are you on? Do you already have Python installed, by any chance?
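For reference, a rough sketch of what such a Python script could look like. It assumes the tag is written exactly as quoted in the original post in every file, and that the files use an ASCII-compatible encoding such as UTF-8 or Latin-1 (it works on raw bytes, so it does not need to know which one). The names `strip_author_meta` and `clean_folder` are made up for this example.

```python
import re
from pathlib import Path

# The exact tag Graffer writes, as quoted in the original post.
# re.escape() makes every character literal; the tail also eats
# trailing spaces and the line break so no empty line is left behind.
AUTHOR_TAG = re.compile(
    re.escape(b"<meta name='author' content='Grzegorz Hordynski' />")
    + rb"[ \t]*\r?\n?"
)


def strip_author_meta(data: bytes) -> bytes:
    """Return the file contents with Graffer's author meta tag removed."""
    return AUTHOR_TAG.sub(b"", data)


def clean_folder(folder: str) -> int:
    """Rewrite every .html/.htm file under folder; return how many changed."""
    changed = 0
    for path in Path(folder).rglob("*"):
        # Skip directories and anything that is not an HTML file.
        if not path.is_file() or path.suffix.lower() not in {".html", ".htm"}:
            continue
        original = path.read_bytes()
        cleaned = strip_author_meta(original)
        if cleaned != original:
            path.write_bytes(cleaned)
            changed += 1
    return changed
```

It overwrites files in place, so it is safest to try it on a copy of the fic folder first, for example by calling `clean_folder` with the path to that copy. It only removes the wrong author; it does not put the correct author name in.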
rebecca2525: Abby Sciuto from NCIS with the word "geek" (Default)

[personal profile] rebecca2525 2011-10-06 10:39 am (UTC)(link)
Darn, posting html quote is messing up my formatting. Trying again...

If you don't find another solution to your problem and if you are willing to install Python, I'd write that script for you.

OTOH, I just looked at the bulk conversion thing of Calibre, and the Search & Replace looks exactly like what you need. Your regular expression would most likely just be the offending tag as is, if it stays exactly the same in all files:

<meta name='author' content='Grzegorz Hordynski' />

Leave the replacement string empty. If that doesn't work, try

<meta name='author' content='Grzegorz Hordynski' \/>

(With a backslash before the slash to tell calibre that you mean the character slash. The slash might have a special meaning in regular expressions which you don't want here.)
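A quick way to check both patterns outside calibre, sketched with Python's `re` module. This assumes calibre's search and replace follows Python regex syntax (calibre is written in Python, but treat that as an assumption here); the sample `html` string is made up from the tags quoted in this thread.

```python
import re

# The two patterns suggested above: with and without the escaped slash.
plain = r"<meta name='author' content='Grzegorz Hordynski' />"
escaped = r"<meta name='author' content='Grzegorz Hordynski' \/>"

# A made-up head section containing the tag Graffer writes.
html = (
    "<head>\n"
    "<meta name='author' content='Grzegorz Hordynski' />\n"
    "<title>Darling by agelade</title>\n"
    "</head>"
)

# An empty replacement string deletes whatever the pattern matched.
removed_plain = re.sub(plain, "", html)
removed_escaped = re.sub(escaped, "", html)

# In Python's regex syntax the slash has no special meaning,
# so both patterns match the same text.
print(removed_plain == removed_escaped)
```

If the tag differs at all between files (double quotes, extra spaces), neither pattern will match those files and they will be left unchanged.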
rebecca2525: Abby Sciuto from NCIS with the word "geek" (Default)

[personal profile] rebecca2525 2011-10-06 11:47 am (UTC)(link)
Yes, I meant the bulk import -> search and replace. But I've never bulk imported myself, so I can't really help you if it doesn't work.

Re the "Title by author - Author.epub" -- is the title tag in the meta info of the html maybe set to "Title by author"? You might want to try removing that with the bulk import search and replace, too. It should then fall back on the filename for the title, I think. If you convert one author folder at a time, you can just edit the author back in via bulk editing of meta tags.
rebecca2525: Abby Sciuto from NCIS with the word "geek" (Default)

[personal profile] rebecca2525 2011-10-06 12:00 pm (UTC)(link)
Yes, that's what I meant: the title tag is set to "Darling by agelade" (right below the offending author name.) So Calibre, quite rightfully, assumes that that's the title.
rebecca2525: Abby Sciuto from NCIS with the word "geek" (Default)

[personal profile] rebecca2525 2011-10-06 12:05 pm (UTC)(link)
The regular expression to get rid of the whole title tag would be

<title>.*?</title>
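To show why the question mark matters, here is a small Python sketch, again assuming Python-style regex syntax. The sample string is invented (only the "Darling by agelade" title comes from this thread) and deliberately contains a second title tag further down.

```python
import re

# Made-up sample with two <title>...</title> pairs on one line.
html = (
    "<head><title>Darling by agelade</title></head>"
    "<body><p>See the <title>tag</title> note.</p></body>"
)

# Lazy: .*? stops at the first closing </title> it can reach.
lazy = re.sub(r"<title>.*?</title>", "", html, count=1)

# Greedy: .* runs on to the last </title> and removes everything between.
greedy = re.sub(r"<title>.*</title>", "", html, count=1)

print(lazy)
print(greedy)
```

With the lazy version only the real title tag goes away. Note that `.` does not match a line break by default, so a title tag split across two lines would not be matched by either pattern.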

(Anonymous) 2011-10-06 07:06 pm (UTC)(link)
Everybody sings its praises because it handles many formats and devices, is frequently updated to support newer conventions and formats and is free. I'm sorry you're having trouble with it, but a) your frustrations are clearly not universal, and b) I personally find it much easier to want to help people who are not prone to fits of panic and entitlement.