unsigned long mysql_real_escape_string(MYSQL *mysql,
char *to, const char *from, unsigned long length)
Note that mysql
must be a valid, open
connection. This is needed because the escaping depends on the
character set in use by the server.
Description
This function is used to create a legal SQL string that you can use in an SQL statement. See Section 8.1.1, “Strings”.
The string in from
is encoded to an escaped
SQL string, taking into account the current character set of the
connection. The result is placed in to
and a
terminating null byte is appended. Characters encoded are
NUL
(ASCII 0),
“\n
”,
“\r
”,
“\
”,
“'
”,
“"
”, and Control-Z (see
Section 8.1, “Literal Values”). (Strictly speaking, MySQL requires
only that backslash and the quote character used to quote the
string in the query be escaped. This function quotes the other
characters to make them easier to read in log files.)
The string pointed to by from
must be
length
bytes long. You must allocate the
to
buffer to be at least
length*2+1
bytes long. (In the worst case,
each character may need to be encoded as using two bytes, and
you need room for the terminating null byte.) When
mysql_real_escape_string()
returns, the contents of to
is a
null-terminated string. The return value is the length of the
encoded string, not including the terminating null character.
If you need to change the character set of the connection, you
should use the
mysql_set_character_set()
function rather than executing a SET NAMES
(or SET CHARACTER SET
) statement.
mysql_set_character_set()
works
like SET NAMES
but also affects the character
set used by
mysql_real_escape_string()
,
which SET NAMES
does not.
Example
char query[1000],*end; end = strmov(query,"INSERT INTO test_table values("); *end++ = '\''; end += mysql_real_escape_string(&mysql, end,"What is this",12); *end++ = '\''; *end++ = ','; *end++ = '\''; end += mysql_real_escape_string(&mysql, end,"binary data: \0\r\n",16); *end++ = '\''; *end++ = ')'; if (mysql_real_query(&mysql,query,(unsigned int) (end - query))) { fprintf(stderr, "Failed to insert row, Error: %s\n", mysql_error(&mysql)); }
The strmov()
function used in the example is
included in the mysqlclient
library and works
like strcpy()
but returns a pointer to the
terminating null of the first parameter.
Return Values
The length of the value placed into to
, not
including the terminating null character.
Errors
None.
User Comments
Documentation is unclear (at least to me):
> taking into account the current character
> set of the connection
What does this mean? How is mysql_real_escape_string affected by the character set?
To test the feature, I run the mysql server with default-character-set=usa7
Then I used mysql_real_escape_string on a string containing german special characters (äöüß), which should be illegal for the usa7 charset; nothing happened, i.e. mysql_real_escape_string neither removed nor changed these characters.
Thus, I experienced no change compared to mysql_escape_string.
This doesn't work when using the VC++ APIs for Win32 (and maybe also not in the BCB APIs), you get an unresolved external, even with all libs included.
Here is a solution that does the same. Buffer1 is your binary data, Buffer 2 is the data you put into the query;
char Buffer1[100]
char Buffer2[201]
for(int x = 0; x < 100; x++)
{
switch(Buffer[x])
{
case '\0':
Picture += "\\0";
break;
case '\n':
Picture += "\\n";
break;
case '\r':
Picture += "\\r";
break;
case '\'':
Picture += "\\'";
break;
case '"':
Picture += "\\\"";
break;
case '\\':
Picture += "\\\\";
break;
default:
Picture += Buffer[y];
break;
}
Errata: of course the 'Picture' should be 'Buffer2', sorry for the inconvenience.
Please add a note on how to scape the % character in strings, so that it can be later used in LIKE expressions.
It would be most useful to add a function that provides the programmer with the length returned by mysql_real_escape_string() because the *to buffer usually needs to be allocated a particular size before this function is called. If that size is unknown before the function is called a paradox exists. Simply allocating twice the memory for the *to buffer with respect to the *from buffer is unsatisfactory.
Watch out! If you insert data into a binary char using this, when you try to retrieve it MySQL will strip off the trailing spaces! Very confusing! Be sure to read section 6.2, Column Types!
I use a method like this for escaping strings. Pass in 0 for bufSize and returns size without having allocate double mem before hand. There are optimizations for this. But you can get the idea.
int MySqlConn::escapeString(char *szOrig, char *szEscaped, int bufSize)
{
int newLen = 0;
int origLen = strlen(szOrig);
for(int i = 0; i < origLen; i++)
{
switch(szOrig[i])
{
case '\0':
if (bufSize > newLen + 2)
memcpy(&szEscaped[newLen], "\\0", 2);
newLen += 2;
break;
case '\b':
if (bufSize > newLen + 2)
memcpy(&szEscaped[newLen], "\\b", 2);
newLen += 2;
break;
case '\n':
if (bufSize > newLen + 2)
memcpy(&szEscaped[newLen], "\\n", 2);
newLen += 2;
break;
case '\r':
if (bufSize > newLen + 2)
memcpy(&szEscaped[newLen], "\\r", 2);
newLen += 2;
break;
case '\t':
if (bufSize > newLen + 2)
memcpy(&szEscaped[newLen], "\\t", 2);
newLen += 2;
break;
case '\z':
if (bufSize > newLen + 2)
memcpy(&szEscaped[newLen], "\\z", 2);
newLen += 2;
break;
case '\%':
if (bufSize > newLen + 2)
memcpy(&szEscaped[newLen], "\\%", 2);
newLen += 2;
break;
case '\_':
if (bufSize > newLen + 2)
memcpy(&szEscaped[newLen], "\\_", 2);
newLen += 2;
break;
case '\'':
if (bufSize > newLen + 2)
memcpy(&szEscaped[newLen], "\\'", 2);
newLen += 2;
break;
case '\"':
if (bufSize > newLen + 2)
memcpy(&szEscaped[newLen], "\\\"", 2);
newLen += 2;
break;
case '\\':
if (bufSize > newLen + 2)
memcpy(&szEscaped[newLen], "\\\\", 2);
newLen += 2;
break;
default:
if (bufSize > newLen + 1)
szEscaped[newLen] = szOrig[i];
newLen++;
break;
}
}
return newLen;
}
mysql_real_escape_string generates an unresolved external on the VC++ compiler (tested with VC++ 7.1) even with all libs included.
Use mysql_escape_string instead (or if you use an incompatible charset, roll your own).
Hopefully this will be fixed soon.
Warning, sample code in french translation is wrong!
Attention, l'exemple en français est erroné !
Le 4ème paramètre dans les deux appels à la fonction
mysql_real_escape_string() est incorrecte. La taille indiquée ne correspond pas à la chaîne à coder.
La chaîne de caractères "C'est quoi ça" ne fait pas 11 caractères mais 13, et l'autre chaîne "donnée binaire : \0\r\n" fait 20 caractères et non 16.
Je vous laisse deviner l'origine de ces deux erreurs :-)...
using mysql_real_escape_string() seems pretty clumsy. I thought... why not make it more like snprintf()? so I made a wrapper that's more like snprintf(). So far (after VERY little testing), it seems to work pretty well. (use at your own risk. please consider this GPL'd code.) It would be great if this were part of the mysql client lib.
basics:
int mysql_escapestring_sprintf(MYSQL * mysql, char * buf, size_t size, const char * fmt, ...)
- mysql is a database handle
- buf is preallocated buffer size characters long.
- fmt is a format string similar to those used in printf() functions
- ... is a variable length list of strings, integers, floats and characters
fmt can contain only %s, %d, %f, and %c format specifiers. The %s string format specifier is the only one that can be modifed with a length parameter. example: %8s means the associated string is exactly 8 bytes long. the length specifier is used in mysql_real_escape_string's len parameter, and the associated string is passed as the from parameter. this makes it possible to pass binary data.
The function returns the length of the created string. if the created string is truncated because of insufficient space in the buff parameter then the returned value is the length the string would have been. Test for that condition by testing if the returned value is greater than the size parameter. Under some conditions the function will call malloc(). If malloc fails this function will return -1 and errno should be set appropriately. Invalid format strings are more or less ignored and may result in a badly formed query string in buff.
the function is likely very platform dependent. it was developed on an up-to-date Debian 3.1 system.
example:
<pre>
int newlen = 0;
char escapedbuf[1024];
newlen = mysql_escapestring_sprintf(mysql, escapedbuf, sizeof(escapedbuf),
"select n from non where m = %d and name = '%s'", 4, "Penny" );
if(newlen < 0)
printf("error: %m\n");
else if(newlen > sizeof(escapedbuf))
printf("query too long\n");
else
printf("new query string: %s\n", escapedbuf);
</pre>
<pre>
/* this is licensed under the GPL. Ben Johnson is the author who borrowed heavily from man pages. */
/* because some limit is required, string
* must be less than 9999 characters long. */
#define MAX_STRMODIFIER_LEN 4
static int
snprintf_mysql_escaped_sql_statement(MYSQL * mysql, char * buf, size_t size, const char * fmt, ...)
{
int added_length = 0;
int offset_from_buf = 0;
int oversized = 0;
va_list ap;
int d;
double f;
char c, *s;
char current_char;
int expecting = 0;
char strmodifier[MAX_STRMODIFIER_LEN+1];
int strmodifier_offset = 0;
int keep_expecting = 0;
int isoverflowed = 0;
strmodifier[0] = '\0';
va_start(ap, fmt);
while ((current_char = *fmt++))
{
added_length = 0; /* this might not change */
if(! expecting)
{
switch(current_char)
{
case '%': /* start expecting */
expecting = 1;
break;
default:
if( offset_from_buf < size-2 )
{
buf[offset_from_buf] = current_char;
buf[offset_from_buf+1] = '\0'; /* null terminate every change */
}
else
isoverflowed = 1;
added_length = 1;
break;
}
}
else
{
keep_expecting = 0;
switch(current_char)
{
case '%': /* literal '%' character */
if( offset_from_buf < size-2 )
{
buf[offset_from_buf] = current_char;
buf[offset_from_buf+1] = '\0'; /* null terminate every change */
}
else
isoverflowed = 1;
added_length = 1;
break;
/* string length modifiers */
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':
if( strmodifier_offset < MAX_STRMODIFIER_LEN )
{
strmodifier[strmodifier_offset] = current_char;
strmodifier[strmodifier_offset+1] = '\0'; /* null term every change */
strmodifier_offset++;
}
keep_expecting = 1;
break;
case 's': /* string */
{
int len = 0;
char * p = NULL; /* for quick malloc */
s = va_arg(ap, char *);
/* need string length, one way or another */
if( *strmodifier )
len = atoi(strmodifier);
else
len = strlen(s);
/* if remaining space is sufficient, use buf
* as mysql_real_escape_string()'s "to" arg */
if( size - offset_from_buf > len*2+1 )
added_length = mysql_real_escape_string(mysql, &buf[offset_from_buf], s, len);
else
{
p = malloc(len*2+1);
if( p )
{
added_length = mysql_real_escape_string(mysql, p, s, len);
if( offset_from_buf+added_length < size )
memcpy(&buf[offset_from_buf], p, added_length+1); /* +1 is null */
else
isoverflowed = 1;
free(p);
}
else
{
oversized = -1;
goto error_out;
}
}
}
break;
case 'd': /* int */
d = va_arg(ap, int);
added_length = snprintf(&buf[offset_from_buf], (size-offset_from_buf)+1, "%d", d);
if( added_length > (size-offset_from_buf)+1 )
isoverflowed = 1;
break;
case 'c': /* char */
/* need a cast here since va_arg only
takes fully promoted types */
c = (char) va_arg(ap, int);
added_length = snprintf(&buf[offset_from_buf], (size-offset_from_buf)+1, "%c", c);
if( added_length > (size-offset_from_buf)+1 )
isoverflowed = 1;
break;
case 'f': /* float/double */
f = va_arg(ap, double); /* promote floats */
added_length = snprintf(&buf[offset_from_buf], (size-offset_from_buf)+1, "%f", f);
if( added_length > (size-offset_from_buf)+1 )
isoverflowed = 1;
break;
default:
/* invalid format. maybe do an error here.
printf("%s[%d]: default? not adding: %c\n", __FUNCTION__, offset_from_buf, current_char);
*/
break;
}
if( ! keep_expecting )
{
strmodifier[0] = '\0';
strmodifier_offset = 0;
expecting = 0;
}
}
oversized += added_length;
if( added_length <= size - offset_from_buf )
offset_from_buf += added_length;
}
#if 0
if( isoverflowed )
memset(buf, 0, size);
#endif
error_out:
va_end(ap);
return oversized;
}
</pre>
I couldn't find a method for unescaping a string i.e. converting '\'', '\"' etc in a string back to ''', '"' etc. I've written the following function that can do this. It does the job by performing a single pass over the data to be unescaped.
enum STATE { NORMAL, ESCAPE } state;
// unescape: in-memory unescapes the string passed in as argument
// return: length of unescaped string
int unescape (char *str, int strLength) {
int sIndex = 0, dIndex = 0;
if (str == 0 || strLength <= 0) // validate string
return 0;
state = NORMAL;
while (sIndex < strLength) {
switch (state) {
// we are in the NORMAL state until we find a '\\' character
case NORMAL:
if (str [sIndex] == '\\') {
state = ESCAPE;
}
else {
// simply copy the non '\\' characters from
// the source to the destination
// in the NORMAL state
str [dIndex] = str [sIndex];
dIndex ++;
}
break;
case ESCAPE:
// in the ESCAPE state, copy the sequence
// like '\\\n', '\\\r', etc. as '\n', '\r', etc.
if (
str [sIndex] == '\0' ||
str [sIndex] == '\n' ||
str [sIndex] == '\r' ||
str [sIndex] == '\\' ||
str [sIndex] == '\'' ||
str [sIndex] == '\"' ||
str [sIndex] == (char) 0x1A /* Ctrl-Z */
) {
str [dIndex] = str [sIndex];
dIndex ++;
}
else {
// we have (mistakenly) entered the ESCAPE
// state, so output the suppressed '\\' character
// and the current character
str [dIndex] = '\\';
dIndex ++;
str [dIndex] = str [sIndex];
dIndex ++;
}
state = NORMAL;
break;
default:
// error !
break;
} // end switch
sIndex ++;
} // end while
// in case the last character is a '\\' indicated by the ESCAPE state at loop termination
// we have to copy the '\\' character
if (state == ESCAPE) {
str [dIndex] = '\\';
dIndex ++;
}
return dIndex;
}
// example usage
void ExampleUsage ()
{
char str [] = "\\\'Hello World\\\'";
unescape (str, strlen (str) + 1 /* last NULL byte */);
// OR: unescape (str, sizeof (str));
// after function call, str = 'Hello World'
}
Add your own comment.