valiha ([personal profile] valiha) wrote in [community profile] ebooks, 2011-10-05 10:13 pm

Another calibre question

Hi, I have a calibre conversion question that fandom people might understand better than MobileRead forumers.

I had the fanfiction downloader program called Graffer installed on my comp. I was happy with it because it would download fics from multiple sources and produce clean html files. Unfortunately its creator gave up on the program, so it is no longer updated and has stopped working for several major archives.

My work on converting my fics via calibre is slow, and I often want to chuck my comp out the window, that's how frustrated calibre makes me feel. I was expecting import and conversion to be fairly straightforward, but calibre kept messing up the authors. I finally looked at the html coding and discovered that Graffer added a line in the metadata section which calibre would read as the author name, but which was actually the name of the programmer: <meta name='author' content='Grzegorz Hordynski' />

I went to MobileRead to see if I could find a way to change this line in the html file automatically through bulk convert, but couldn't figure out the instructions or the regexes. I've been having a back and forth conversation with a member who doesn't understand what I'm after, so does anyone here know how to set up calibre so that it changes the programmer's name into the correct author name for selected ebooks in the actual html file, not just in the metadata?

[personal profile] rebecca2525 2011-10-06 10:39 am (UTC)(link)
Darn, posting html quote is messing up my formatting. Trying again...

If you don't find another solution to your problem and if you are willing to install Python, I'd write that script for you.

OTOH, I just looked at the bulk conversion thing of Calibre, and the Search & Replace looks exactly like what you need. Your regular expression would most likely just be the offending tag as is, provided it stays exactly the same in all files:

<meta name='author' content='Grzegorz Hordynski' />

Leave the replacement string empty. If that doesn't work, try

<meta name='author' content='Grzegorz Hordynski' \/>

(With a backslash before the slash to tell calibre that you mean the character slash. The slash might have a special meaning in regular expressions which you don't want here.)
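
If you want to try the pattern outside calibre first, here is a minimal Python sketch. calibre is written in Python, so I'd expect its Search & Replace to behave like Python's `re` module, but treat that as an assumption. The sample html and the helper names `strip_tag` and `set_author` are made up for illustration.

```python
import re

# Made-up sample file content, shaped like the header described in the post.
SAMPLE = """<html><head>
<meta name='author' content='Grzegorz Hordynski' />
<title>Some Story by someauthor</title>
</head><body><p>Story text.</p></body></html>"""

# The two patterns suggested above: the tag as is, and with the slash escaped.
PLAIN = r"<meta name='author' content='Grzegorz Hordynski' />"
ESCAPED = r"<meta name='author' content='Grzegorz Hordynski' \/>"


def strip_tag(html, pattern):
    """Delete every match of pattern (empty replacement string)."""
    return re.sub(pattern, "", html)


def set_author(html, author):
    """Swap the programmer's name for the given author name instead of deleting the tag."""
    new_tag = "<meta name='author' content='{}' />".format(author)
    # A function replacement avoids backslash surprises in the author name.
    return re.sub(PLAIN, lambda match: new_tag, html)


cleaned_plain = strip_tag(SAMPLE, PLAIN)
cleaned_escaped = strip_tag(SAMPLE, ESCAPED)
with_author = set_author(SAMPLE, "someauthor")
```

In Python's regex flavour the slash has no special meaning, so both patterns remove the same text here. `set_author` shows what a non-empty replacement string would do if you'd rather fix the tag than delete it.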

[personal profile] rebecca2525 2011-10-06 11:47 am (UTC)(link)
Yes, I meant the bulk import -> search and replace. But I've never bulk imported myself, so I can't really help you if it doesn't work.

Re the "Title by author - Author.epub" -- is the title tag in the meta info of the html maybe set to "Title by author"? You might want to try removing that with the bulk import search and replace, too. It's then going to fall back on the filename for the title, I think. If you convert one author folder at a time, you can just edit the author back in via bulk editing of meta tags.

[personal profile] rebecca2525 2011-10-06 12:00 pm (UTC)(link)
Yes, that's what I meant: the title tag is set to "Darling by agelade" (right below the offending author name). So Calibre, quite rightly, assumes that that's the title.

[personal profile] rebecca2525 2011-10-06 12:05 pm (UTC)(link)
The regular expression to get rid of the whole title tag would be

<title>.*?</title>
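
A quick way to see what that pattern does, sketched in plain Python (again assuming calibre's regex behaves like Python's `re` module; the sample strings are invented). The `.*?` is the non-greedy form, so the match stops at the first closing tag instead of running on to a later one.

```python
import re

TITLE_PATTERN = r"<title>.*?</title>"

# Title on a single line: the whole tag is removed.
one_line = "<head><title>Darling by agelade</title></head><body><p>Text</p></body>"
one_line_result = re.sub(TITLE_PATTERN, "", one_line)

# Title split across two lines: by default "." does not match a newline,
# so the pattern finds nothing and the text comes back unchanged.
two_lines = "<head><title>Darling\nby agelade</title></head>"
two_lines_default = re.sub(TITLE_PATTERN, "", two_lines)

# With the DOTALL flag, "." also matches newlines and the tag is removed.
two_lines_dotall = re.sub(TITLE_PATTERN, "", two_lines, flags=re.DOTALL)
```

So if a title tag ever wraps onto a second line in one of the files, the pattern as written would leave it alone in Python; whether calibre's dialog applies a flag like DOTALL is something I haven't checked.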
