Welcome to Xierpa. This is the stable 1.2 version which was developed by Petr van Blokland + Claudia Mens (buro@petr.com) and is maintained by Michiel Kauw-A-Tjoe. It is subclassed by the Museum Meermanno and American Express applications.

Unicode and UTF-8

Problems with Unicode and UTF-8 in Xierpa

During the development of Xierpa, unicode and UTF-8 strings were mixed inside the application.
In practice this caused a lot of errors and confusion. Python does not allow combining unicode strings with UTF-8 encoded byte strings that contain non-ASCII characters. Instead, all strings to be merged have to be converted first, using unicode(utf8string, 'utf-8') (from UTF-8 to unicode) or unicodestring.encode('utf-8') (from unicode to UTF-8).
The strings must be in the right format, otherwise an error is raised.
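
The two conversions and the error caused by mixing can be illustrated with a small sketch. It is not taken from the Xierpa sources; the variable names are made up for illustration, and .decode('utf-8') is used instead of unicode(..., 'utf-8') only so that the same lines run on both Python 2 and Python 3.

```python
# Illustrative only: none of these names come from Xierpa.
utf8_bytes = b'K\xc3\xb8benhavn'        # UTF-8 encoded byte string
text = utf8_bytes.decode('utf-8')       # Python 2 equivalent: unicode(utf8_bytes, 'utf-8')
round_trip = text.encode('utf-8')       # from unicode back to UTF-8

# Mixing both kinds of string fails: UnicodeDecodeError on Python 2
# (implicit ASCII decoding), TypeError on Python 3.
try:
    mixed = u'City: ' + utf8_bytes
except (TypeError, UnicodeDecodeError):
    mixed = None

# Converting first makes the merge safe.
mixed_ok = u'City: ' + utf8_bytes.decode('utf-8')
```

After the explicit decode, both operands are unicode and the merge succeeds.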

To avoid all this, Xierpa needs to be UTF-8 clean: all internal strings must be unicode. Strings are encoded as UTF-8 only at the peripheral connections to the world outside the webserver, such as XHTML code, database connections, files and form input. Xierpa performs most of these peripheral conversions automatically.
Warning: The original transformer tool unicodify no longer exists.
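
The idea of converting only at the periphery can be sketched as follows. This is a minimal illustration under assumptions, not Xierpa code: the helpers to_unicode, to_utf8 and handle_form_input are hypothetical names, and incoming byte strings are assumed to be UTF-8.

```python
# Hypothetical helpers; they are not part of Xierpa.
def to_unicode(value, encoding='utf-8'):
    """Decode a byte string coming from outside; pass unicode through untouched."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_utf8(value, encoding='utf-8'):
    """Encode unicode for the outside world; pass byte strings through untouched."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


def handle_form_input(raw_name):
    name = to_unicode(raw_name)       # boundary in: bytes become unicode
    greeting = u'Hello, ' + name      # internal work happens on unicode only
    return to_utf8(greeting)          # boundary out: unicode becomes UTF-8
```

Everything between the two boundary calls can assume unicode, so no conversion is needed in the application logic itself.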

Aspects to note

  • Python sources and XML documents should be of type “Unix, UTF-8, No BOM”.
  • In Python source, strings can be written as either "abc" or u"abc" if they contain only plain (7-bit) ASCII characters. As soon as any other Unicode character is used, the Python string must be of type unicode, as in u"København".
  • All conversions to and from a database flow through the tools/transformer functions sql2unicode(value) and s2sql(value). Note that these functions are not symmetric. The reason is that reading a selection from the database generates a set of records, while writing is always based on a (UTF-8) query string.
    function: sql2unicode(v)
      unicode: Convert v to a unicode string if it is an instance of basestring.
      other: Replace all double single quotes ('') by a single quote. This is done to prevent any hacking of queries.
    function: s2sql(v)
      unicode: Don’t convert v to a UTF-8 string; this is done for the total query by self.agent.
      other: If the value is not an instance of basestring, convert it to a string first. Then replace all single quotes by a double single quote (''). This is done to prevent any hacking of queries.
    There is currently still a bug in the Record reading for related field values.
  • The XsltParser.xml() method only takes unicode strings. An error will be raised if the xml attribute has a different type.
  • The XmlParser.parse() method only takes unicode strings. An error will be raised if the xml attribute has a different type.
  • The peripheral communication methods Agent.query(q) and Agent.getquery(q) convert the unicode query attribute q to UTF-8 before calling the database driver.
  • The Mailer.mailto instance (as called from XierpaBuilder.mailto) requires the subject and message attributes to both be of type unicode. Otherwise an error is raised.
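
The rules in this list can be put together in a rough sketch. It only mirrors the behaviour described above and is not the actual tools/transformer code: the names sql2unicode_sketch, s2sql_sketch, require_unicode and build_query are hypothetical, the table name city is made up, and the choice of TypeError is an assumption, since the text only says that an error is raised.

```python
try:
    text_type = unicode   # Python 2
except NameError:
    text_type = str       # Python 3


def sql2unicode_sketch(value):
    """Reading side: decode byte strings, then turn '' back into '."""
    if isinstance(value, bytes):
        value = value.decode('utf-8')   # assumption: the database delivers UTF-8
    if isinstance(value, text_type):
        value = value.replace(u"''", u"'")
    return value


def s2sql_sketch(value):
    """Writing side: make the value a string and double single quotes.
    No UTF-8 encoding here; that is done later on the whole query."""
    if isinstance(value, bytes):
        value = value.decode('utf-8')   # assumption: stray byte strings are UTF-8
    elif not isinstance(value, text_type):
        value = text_type(value)
    return value.replace(u"'", u"''")


def require_unicode(value, name):
    """Unicode-only check, as described for the parser and mailer methods."""
    if not isinstance(value, text_type):
        raise TypeError('%s must be unicode, got %s' % (name, type(value).__name__))
    return value


def build_query(name):
    """Build a unicode query and encode it once, just before the driver call."""
    query = u"SELECT id FROM city WHERE name = '%s'" % s2sql_sketch(name)
    return require_unicode(query, 'query').encode('utf-8')
```

The asymmetry is visible in the sketch: the reading function works on individual record values, while on the writing side only the finished query string is encoded.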