It’s fairly common to have scenarios where you are trying to push text data into a brittle system that can not handle text data beyond very simple ASCII.
You can try all kinds of approaches, but one fairly simple one is to create a function such as the one below and then you can easily reuse that function across various queries and columns. This is a nice application of DRY to MySQL.
CREATE DEFINER=`YOUR_DB_USER`@`%` FUNCTION `YOUR_DB_NAME`.`cleanString`( str longtext ) RETURNS longtext CHARSET latin1 DETERMINISTIC BEGIN DECLARE startTag, endTag INT DEFAULT 1; DECLARE i, len SMALLINT DEFAULT 1; DECLARE ret longtext DEFAULT ''; DECLARE c text; IF str IS NULL THEN RETURN ret; END IF; IF str = '' THEN RETURN ret; END IF; -- strip HTML tags stripTags: LOOP SET startTag = LOCATE("<", str, startTag); IF (!startTag) THEN LEAVE stripTags; END IF; SET endTag = LOCATE(">", str, startTag); IF (!endTag) THEN SET endTag = startTag; END IF; SET str = INSERT(str, startTag, endTag - startTag + 1, ""); END LOOP; SET len = CHAR_LENGTH(str); -- now remove any non ascii characaters REPEAT BEGIN SET c = MID(str, i, 1); IF c REGEXP '[ -~]' THEN SET ret = CONCAT(ret, c); END IF; SET i = i + 1; END; UNTIL i > len END REPEAT; RETURN ret; END
The function will take basically any text you can throw at it thanks to using the longtext type. It will use a little bit of XML magic to remove any HTML tags and then it will break the string into characters and loop over them, skipping out any that don’t match characters between space and tilde. This removes line breaks and all kinds of special characters.
Note this is a hugely destructive approach and not something you should do if you are pushing the data to somewhere valuable and important where the data must be very accurate. It is a good quick fix though where you just need to push some text in and can accept a little bit of mangling as long as it just goes in there without any more fuss.