
Find all those columns which have only null values, in a MySQL table

The situation is as follows:

I have a substantial number of tables, each with a substantial number of columns. I need to deal with this old and to-be-deprecated database for a new system, and I'm looking for a way to eliminate all columns that have, apparently, never been in use.

I want to do this by filtering out every column that has a value in any row, leaving me with the set of columns where the value is NULL in all rows. Of course I could manually sort every column in descending order, but that would take too long, as I'm dealing with loads of tables and columns. I estimate it to be 400 tables with up to 50 columns per table.

Is there any way I can get this information from the information_schema?

EDIT:

Here's an example:

column_a    column_b    column_c    column_d
NULL        NULL        NULL        1
NULL        1           NULL        1
NULL        1           NULL        NULL
NULL        NULL        NULL        NULL

The output should be 'column_a' and 'column_c', as they are the only columns without any filled-in values.

You can avoid using a procedure by dynamically building (from the INFORMATION_SCHEMA.COLUMNS table) a string that contains the SQL you wish to execute, then preparing a statement from that string and executing it.

The SQL we wish to build will look like this:

SELECT * FROM (
  SELECT 'tableA' AS `table`,
         IF(COUNT(`column_a`), NULL, 'column_a') AS `column`
  FROM   tableA
UNION ALL
  SELECT 'tableB' AS `table`,
         IF(COUNT(`column_b`), NULL, 'column_b') AS `column`
  FROM   tableB
UNION ALL
  -- etc.
) t WHERE `column` IS NOT NULL

This can be done using the following:

SET group_concat_max_len = 4294967295; -- to overcome default 1KB limitation

SELECT CONCAT(
         'SELECT * FROM ('
       ,  GROUP_CONCAT(
            'SELECT ', QUOTE(TABLE_NAME), ' AS `table`,'
          , 'IF('
          ,   'COUNT(`', REPLACE(COLUMN_NAME, '`', '``'), '`),'
          ,   'NULL,'
          ,    QUOTE(COLUMN_NAME)
          , ') AS `column` '
          , 'FROM `', REPLACE(TABLE_NAME, '`', '``'), '`'
          SEPARATOR ' UNION ALL '
         )
       , ') t WHERE `column` IS NOT NULL'
       )
INTO   @sql
FROM   INFORMATION_SCHEMA.COLUMNS
WHERE  TABLE_SCHEMA = DATABASE();

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

See it on sqlfiddle.
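If you would rather assemble the statement on the client side, the same string can be built outside MySQL. The following is only a minimal sketch in Python: the helper names (quote_ident, quote_literal, build_null_column_query) are made up for illustration, the literal quoting is a simplified stand-in for MySQL's QUOTE(), and the (table, column) pairs are assumed to have been read from INFORMATION_SCHEMA.COLUMNS beforehand.

```python
def quote_ident(name):
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_literal(value):
    """Single-quote a string literal (simplified stand-in for MySQL's QUOTE())."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def build_null_column_query(columns):
    """Build the UNION ALL statement from (table_name, column_name) pairs.

    Each branch yields the column name when COUNT(column) is 0, that is,
    when the column holds no non-NULL value, and NULL otherwise.
    """
    template = (
        "SELECT {t_lit} AS `table`, "
        "IF(COUNT({c}), NULL, {c_lit}) AS `column` FROM {t}"
    )
    parts = [
        template.format(
            t_lit=quote_literal(table),
            c=quote_ident(column),
            c_lit=quote_literal(column),
            t=quote_ident(table),
        )
        for table, column in columns
    ]
    if not parts:
        raise ValueError("at least one (table, column) pair is required")
    return (
        "SELECT * FROM ("
        + " UNION ALL ".join(parts)
        + ") t WHERE `column` IS NOT NULL"
    )


# Example pairs taken from the target SQL shown in this answer.
sql = build_null_column_query([("tableA", "column_a"), ("tableB", "column_b")])
print(sql)
```

The resulting string can then be sent through whichever MySQL client library you use; with 400 tables of up to 50 columns it will be long, so splitting the pairs into batches may be more practical.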

I am not an expert in SQL procedures, so here is the general idea using SQL queries and a PHP/Python script.

  • Use SHOW TABLES, or some other query on the INFORMATION_SCHEMA database, to get all tables in your database MY_DATABASE.

  • Run a query that generates the list of MAX() expressions for all columns of a particular table; this list is used in the next query.

  SELECT GROUP_CONCAT(CONCAT('MAX(', column_name, ')') ORDER BY ordinal_position) FROM information_schema.columns WHERE table_schema = 'MY_DATABASE' AND table_name = 'MY_TABLE'

  • You will get an output like MAX(column_a),MAX(column_b),MAX(column_c),MAX(column_d)

  • Use this output to build the final query:

  SELECT MAX(column_a), MAX(column_b), MAX(column_c), MAX(column_d) FROM MY_DATABASE.MY_TABLE

  The output would be:

  MAX(column_a)    MAX(column_b)    MAX(column_c)    MAX(column_d)
  NULL             1                NULL             1

  • All columns whose MAX value is NULL are the ones in which every value is NULL.
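The steps above can be sketched as a short Python script. This is only an illustration under stated assumptions: it uses the standard-library sqlite3 module with an in-memory table as a stand-in for a MySQL connection (MAX() ignores NULLs in both), the helper name all_null_columns is hypothetical, and the column list is passed in by hand instead of being read from information_schema.columns.

```python
import sqlite3


def all_null_columns(conn, table, columns):
    """Return the columns of `table` whose MAX() is NULL, i.e. that hold no value."""
    select_list = ", ".join(
        "MAX(`{}`)".format(column.replace("`", "``")) for column in columns
    )
    query = "SELECT {} FROM `{}`".format(select_list, table.replace("`", "``"))
    # An aggregate query without GROUP BY always returns exactly one row.
    row = conn.execute(query).fetchone()
    return [column for column, max_value in zip(columns, row) if max_value is None]


# In-memory copy of the example table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (column_a INT, column_b INT, column_c INT, column_d INT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (None, None, None, 1),
        (None, 1, None, 1),
        (None, 1, None, None),
        (None, None, None, None),
    ],
)

result = all_null_columns(
    conn, "my_table", ["column_a", "column_b", "column_c", "column_d"]
)
print(result)  # ['column_a', 'column_c']
```

With a real MySQL driver you would run the same query through a cursor object; conn.execute() is a sqlite3 shortcut. Note that an empty table also reports every column as all-NULL with this check.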

You can take advantage of how the COUNT aggregate function treats NULLs. When a column is passed as the argument, COUNT returns the number of non-NULL values, while COUNT(*) returns the total number of rows. Thus you can calculate the ratio of non-NULL values to all rows; a ratio of 0 means the column contains only NULLs.

I will give an example with the following table structure:

CREATE TABLE `t1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
   `col_1` int(10) unsigned DEFAULT NULL,
   `col_2` int(10) unsigned DEFAULT NULL,
   PRIMARY KEY (`id`)
) ;

-- let's fill the table with random values
INSERT INTO t1(col_1,col_2) VALUES(1,2);
INSERT INTO t1(col_1,col_2)
SELECT
  IF(RAND() > 0.5, NULL, FLOOR(RAND()*1000)),
  IF(RAND() > 0.5, NULL, FLOOR(RAND()*1000))
FROM t1;

-- run the last INSERT-SELECT statement a few times,
-- then compute the fraction of non-NULL values per column
SELECT COUNT(col_1)/COUNT(*) AS col_1_ratio,
       COUNT(col_2)/COUNT(*) AS col_2_ratio FROM t1;

You can write a function that automatically constructs the query from the INFORMATION_SCHEMA database, taking the table name as an input variable. Here's how to obtain the structure data directly from the INFORMATION_SCHEMA tables:

SET @query:=CONCAT("SELECT @column_list:=GROUP_CONCAT(col) FROM (
SELECT CONCAT('COUNT(',c.COLUMN_NAME,')/COUNT(*)') AS col
FROM INFORMATION_SCHEMA.COLUMNS c 
WHERE NOT COLUMN_KEY IN('PRI') AND TABLE_SCHEMA=DATABASE() 
AND TABLE_NAME='t1' ORDER BY ORDINAL_POSITION ) q");
PREPARE COLUMN_SELECT FROM @query;
EXECUTE COLUMN_SELECT;
SET @null_counters_sql := CONCAT('SELECT ',@column_list, ' FROM t1');
PREPARE NULL_COUNTERS FROM @null_counters_sql;
EXECUTE NULL_COUNTERS;

SQL Fiddle Demo Link
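To see the COUNT behaviour in isolation, here is a small Python sketch of the ratio calculation. It is an illustration only: sqlite3 from the standard library stands in for MySQL, the helper name non_null_ratios is hypothetical, and the sample rows are made up. The multiplication by 1.0 is needed only because SQLite divides integers as integers; MySQL's / operator already returns a fractional result.

```python
import sqlite3


def non_null_ratios(conn, table, columns):
    """Map each column to COUNT(column) / COUNT(*), the share of non-NULL values."""
    select_list = ", ".join(
        # "* 1.0" forces floating-point division in SQLite.
        "COUNT(`{0}`) * 1.0 / COUNT(*)".format(column.replace("`", "``"))
        for column in columns
    )
    query = "SELECT {} FROM `{}`".format(select_list, table.replace("`", "``"))
    row = conn.execute(query).fetchone()
    return dict(zip(columns, row))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t1 (id INTEGER PRIMARY KEY AUTOINCREMENT, col_1 INT, col_2 INT)"
)
# Made-up sample data: col_1 has three values in four rows, col_2 has none.
conn.executemany(
    "INSERT INTO t1 (col_1, col_2) VALUES (?, ?)",
    [(1, None), (None, None), (3, None), (4, None)],
)

ratios = non_null_ratios(conn, "t1", ["col_1", "col_2"])
print(ratios)  # {'col_1': 0.75, 'col_2': 0.0}
```

A ratio of exactly 0 marks a column that was never filled in; a very small ratio may be worth a manual look as well.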

I have created four tables: three for the demo, plus nullcolumns, which is a required part of the solution. Of the three demo tables, only salary and dept have columns in which every value is NULL (have a look at their scripts).

The required table and the procedure are given at the end.

You can copy, paste, and run the script (the required part or all of it) as SQL in the desired database on your localhost; you only have to change the delimiter to //. Then run call `get`(DATABASE()); and look at the results.

CREATE TABLE IF NOT EXISTS `dept` (
  `did` int(11) NOT NULL,
  `dname` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`did`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;


INSERT INTO `dept` (`did`, `dname`) VALUES
(1, NULL),
(2, NULL),
(3, NULL),
(4, NULL),
(5, NULL);

CREATE TABLE IF NOT EXISTS `emp` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `ename` varchar(50) NOT NULL,
  `did` int(11) NOT NULL,
  PRIMARY KEY (`ename`),
  KEY `deptid` (`did`),
  KEY `id` (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=6 ;


INSERT INTO `emp` (`id`, `ename`, `did`) VALUES
(1, 'e1', 4),
(2, 'e2', 4),
(3, 'e3', 2),
(4, 'e4', 4),
(5, 'e5', 3);


CREATE TABLE IF NOT EXISTS `salary` (
  `EmpCode` varchar(50) NOT NULL,
  `Amount` int(11) DEFAULT NULL,
  `Date` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO `salary` (`EmpCode`, `Amount`, `Date`) VALUES
('1', 344, NULL),
('2', NULL, NULL);

------------------------------------------------------------------------
------------------------------------------------------------------------

CREATE TABLE IF NOT EXISTS `nullcolumns` (
  `Table_Name` varchar(100) NOT NULL,
  `Column_Name` varchar(100) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

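-- Only one procedure now
-- (`get` is backtick-quoted because GET is a reserved word in MySQL 5.6 and later)
CREATE PROCEDURE `get`(dn varchar(100))
BEGIN
-- table and column names can be up to 64 characters long
declare c1 int; declare b1 int default 0; declare tn varchar(64);
declare c2 int; declare b2 int; declare cn varchar(64);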

select count(*) into c1 from information_schema.tables where table_schema=dn;
delete from nullcolumns;
while b1<c1 do
select table_name into tn from information_schema.tables where
table_schema=dn limit b1,1;        

select count(*) into c2 from information_schema.columns where
table_schema=dn and table_name=tn;
set b2=0;
while b2<c2 do
select column_name into cn from information_schema.columns where
table_schema=dn and table_name=tn limit b2,1;

set @nor := 0;
set @query := concat("select count(*) into @nor from ", dn,".",tn);
prepare s1 from @query;
execute s1;deallocate prepare s1;

if @nor>0 then set @res := 0;
set @query := concat("select ((select max(",cn,") from ", dn,".",tn,")
is NULL) into @res");
prepare s1 from @query;
execute s1;deallocate prepare s1;

if @res=1 then
insert into nullcolumns values(tn,cn);
end if; end if;

set b2=b2+1;
end while;

set b1=b1+1;
end while;
select * from nullcolumns;
END;

You can run the stored procedure script as plain SQL in phpMyAdmin as it is; just change the delimiter (at the bottom of the SQL query box) to //. Then:

call `get`(DATABASE());

And enjoy :)

The nullcolumns table now lists every column whose values are all NULL, together with its table name.

In the procedure code, the check if @nor>0 keeps empty tables out of the results; you can remove that restriction if you want them included.
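For comparison, the procedure's two nested loops can be written on the client side. This is a rough Python sketch, not a port of the procedure: sqlite3 is used as a stand-in for MySQL, the function name find_null_columns is hypothetical, and the table-to-columns mapping is typed in by hand where the procedure reads it from information_schema. The demo data is a trimmed copy of the dept, emp and salary tables above.

```python
import sqlite3


def quote_ident(name):
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def find_null_columns(conn, schema):
    """Return (table, column) pairs where every value in the column is NULL.

    schema maps each table name to its list of column names.
    """
    found = []
    for table, columns in schema.items():
        t = quote_ident(table)
        (row_count,) = conn.execute("SELECT COUNT(*) FROM " + t).fetchone()
        if row_count == 0:
            continue  # same effect as the procedure's `if @nor>0` guard
        for column in columns:
            query = "SELECT MAX({}) IS NULL FROM {}".format(quote_ident(column), t)
            (is_all_null,) = conn.execute(query).fetchone()
            if is_all_null:
                found.append((table, column))
    return found


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (did INT, dname TEXT);
INSERT INTO dept VALUES (1, NULL), (2, NULL), (3, NULL), (4, NULL), (5, NULL);
CREATE TABLE emp (id INT, ename TEXT, did INT);
INSERT INTO emp VALUES (1, 'e1', 4), (2, 'e2', 4), (3, 'e3', 2), (4, 'e4', 4), (5, 'e5', 3);
CREATE TABLE salary (EmpCode TEXT, Amount INT, Date INT);
INSERT INTO salary VALUES ('1', 344, NULL), ('2', NULL, NULL);
""")

schema = {
    "dept": ["did", "dname"],
    "emp": ["id", "ename", "did"],
    "salary": ["EmpCode", "Amount", "Date"],
}
null_columns = find_null_columns(conn, schema)
print(null_columns)  # [('dept', 'dname'), ('salary', 'Date')]
```

Like the procedure, this issues one query per column, so on a schema with hundreds of tables it is slow but simple; quoting the identifiers also keeps it working for names that need escaping.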

I think you can do this with GROUP_CONCAT and GROUP BY:

select length(replace(GROUP_CONCAT(my_col), ',', ''))
from my_table
group by my_col

(untested)

EDIT: the docs don't seem to state that GROUP_CONCAT needs a corresponding GROUP BY, so try this:

select 
    length(replace(GROUP_CONCAT(col_a), ',', '')) as len_a
    , length(replace(GROUP_CONCAT(col_b), ',', '')) as len_b
    , length(replace(GROUP_CONCAT(col_c), ',', '')) as Len_c
from my_table
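One detail worth checking with this approach is what the expression returns for a column that is entirely NULL. The short Python sketch below is an illustration only, using sqlite3 as a stand-in (its group_concat, like MySQL's GROUP_CONCAT, returns NULL when there are no non-NULL values) and made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (col_a INT, col_b INT)")
# Made-up rows: col_a is never filled in, col_b holds two values.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(None, None), (None, 1), (None, 1)],
)

len_a, len_b = conn.execute(
    "SELECT length(replace(group_concat(col_a), ',', '')) AS len_a, "
    "       length(replace(group_concat(col_b), ',', '')) AS len_b "
    "FROM my_table"
).fetchone()

print(len_a, len_b)  # None 2
```

So the all-NULL column shows up as NULL, not as a length of 0, and that is the value to test for. In MySQL the concatenated string is also subject to group_concat_max_len, which does not affect the NULL check but does limit the reported length.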

You can do it with a prepared statement fed by MySQL's information schema:

SET @TABLE_NAME= '...';

SET SESSION group_concat_max_len = 1000000;

SELECT
CONCAT('SELECT * FROM (',
    GROUP_CONCAT(
        CONCAT("SELECT '", COLUMN_NAME, "' AS n, MAX(`", COLUMN_NAME, "`) AS v FROM `", @TABLE_NAME, "`")
        ORDER BY ordinal_position
        SEPARATOR ' UNION ALL ')
    , ") x WHERE v IS NULL"  -- keep only the columns that hold no value at all
)
INTO @q
FROM   information_schema.columns
WHERE  table_schema = DATABASE()
AND    table_name = @TABLE_NAME
;

         
PREPARE ps FROM @q;
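EXECUTE ps;
DEALLOCATE PREPARE ps;

select column_name
from user_tab_columns
where table_name='Table_name' and num_nulls>=1;

With this simple query you will get those two columns. Note that user_tab_columns is an Oracle data dictionary view and is not available in MySQL; its num_nulls value comes from optimizer statistics, and num_nulls>=1 matches any column that contains at least one NULL, not only columns that are entirely NULL.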

