converting html with \xa9 to Markdown and using iconv?

Jeremy C. Reed reed at reedmedia.net
Thu Mar 22 17:28:10 EDT 2007



> > © \xa9 (Copyright symbol)



> As far as I understand you, you are looking for a converter which supports
> UTF-8 / Unicode characters?


Maybe. But now that I think about it more, I'd prefer that some characters
get converted to the HTML entity, like &copy;

I found perl module HTML::Entities but I can't get its encode_entities()
to do what I want.
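
For illustration, the behavior wanted here (named entities where they
exist) can be sketched in Python. This is only a sketch, not the
HTML::Entities API: encode_named_entities is a hypothetical helper name,
and it relies on the standard library's html.entities.codepoint2name
table. It assumes the input is already decoded text, not raw bytes.

```python
from html.entities import codepoint2name


def encode_named_entities(text):
    """Replace each non-ASCII character with a named HTML entity.

    Characters with no named entity fall back to a decimal numeric
    character reference. ASCII characters are left untouched.
    """
    pieces = []
    for ch in text:
        codepoint = ord(ch)
        if codepoint < 128:
            pieces.append(ch)  # plain ASCII, keep as-is
        elif codepoint in codepoint2name:
            pieces.append("&" + codepoint2name[codepoint] + ";")
        else:
            pieces.append("&#%d;" % codepoint)  # no named entity known
    return "".join(pieces)


if __name__ == "__main__":
    print(encode_named_entities("\xa9 2007 caf\xe9"))
```

Because ASCII is skipped, existing markup characters such as < and & pass
through unchanged, so the sketch can be run over a whole HTML file.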

I may give your script a try, but I didn't have PHP on my workstation. (I
did have it on the system I am migrating the data to, so I guess I could
do all my work there instead.)

My original documents are in XML and use entities like &eacute;, but
the generated HTML just has the single character, which is breaking
html2text.py. I thought I could convert the characters back with perl:

  perl -pe "s/([\x80-\xff])/'&#' . ord($1) . ';'/eg;"

But that failed from the command line. It worked in a perl script, though.
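
A likely reason for the command-line failure is quoting: inside double
quotes the shell expands $1 before perl ever sees it, so single-quoting
the perl code (and double-quoting the strings inside it) would probably
avoid that. The same byte-to-numeric-reference substitution can also be
sketched in Python. The function name below is hypothetical, and the
sketch assumes the input bytes are ISO-8859-1 (Latin-1); on UTF-8 input it
would emit one wrong reference per byte, so such input would need
converting first (for example with iconv).

```python
import re


def latin1_bytes_to_numeric_refs(data):
    """Replace every byte in 0x80-0xff with a decimal character reference.

    Assumes `data` is ISO-8859-1, where each high byte's value equals
    its Unicode code point.
    """
    return re.sub(
        rb"[\x80-\xff]",
        lambda match: b"&#%d;" % match.group(0)[0],  # byte value as decimal
        data,
    )


if __name__ == "__main__":
    print(latin1_bytes_to_numeric_refs(b"\xa9 2007 caf\xe9"))
```

For text that is already decoded, the built-in
text.encode("ascii", "xmlcharrefreplace") produces the same kind of
numeric references without a regular expression.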

But then I see that html2text.py converts such characters to literal text
like "(C)" for ©. I don't want that either. I will have to try your
tool next. Thanks.

Jeremy C. Reed


More information about the Markdown-Discuss mailing list