Getting All the Email Addresses Out of Your Database, Even From Different Tables
A customer asked me again this week to get them a list of all the email addresses from their web site. This site has about a dozen different forms, all of which store email addresses, and all of which are stored in separate tables in the database. Some of these forms even collect multiple addresses, compounding the issue slightly. Also, this is a growing application that is getting new features weekly and I would like to not have to modify my query every time a new table is added. What to do?
The answer is a metadata query. Microsoft’s SQL Server and MySQL (since 5.0.2) both support accessing your database’s metadata through the INFORMATION_SCHEMA. This database schema is a collection of everything you could want to know about your database’s structure and it can be queried using standard Transact SQL statements. Assuming that all of the columns in our database have the string “email” in their name then we can get a list of them and act on them.
So let’s take a look at what we’ll be doing:
- Create a temporary table to store the results from each query as we go.
- Create three variables, one for the current table name, one for the current column name, and one for the query we’ll be executing to retrieve the data from each column.
- Create a cursor to get each column that has a name that matches our criteria (‘%email%’ in this case) and its associated table from the information schema.
- Fetch the first row into the cursor.
- Iterate the cursor.
- Generate a select query based on the current table and column names.
- Insert the results of that query into our temp table.
- Fetch the next row from the cursor.
- Close and deallocate the cursor.
- Select the information from our temp table.
- Drop the temp table.
Easy enough. Here’s what that query looks like for Microsoft SQL Server:
create table #retVal ( tablename nvarchar(max), columnname nvarchar(max), val nvarchar(max) ) declare @tablename nvarchar(max) declare @columnname nvarchar(max) declare @query nvarchar(max) declare cur cursor for select TABLE_NAME, COLUMN_NAME from information_schema.columns where column_name like '%email%' open cur fetch next from cur into @tablename, @columnname while @@fetch_status = 0 begin set @query = 'select ''' + @tablename set @query = @query + ''', ''' + @columnname set @query = @query + ''', [' + @columnname set @query = @query + '] from [' + @tablename + ']' --print(@query) insert into #retVal exec(@query) fetch next from cur into @tablename, @columnname end close cur deallocate cur select val as 'Email Address' from #retVal drop table #retVal |
Normally you want to be careful with executing concatenated SQL strings. Security is less of an issue here since we aren’t taking untrusted user input.
This query requires using a stored routine, function, or trigger in MySQL due to limitations on where cursors can be used.
The uses of the information schema don’t stop there however. We can also use it for things like creating forms in HTML, validating user input for type and length without having to break the DRY (Don’t Repeat Yourself) principal, and even storing validation regular expressions in the column comments (MySQL). Maybe I’ll get to that later. For now knock yourself out kicking out data from wherever it may be hiding inside your database.