A Bug Involving Object-Size Functions and Subqueries

和通数据库 (htsjk.com), 2019-03-07 22:18


I have the following two SQL statements:
0. NOT IN subquery
select a.schemaname, pg_size_pretty(pg_total_relation_size(a.schemaname||'.'||a.tablename)) from pg_tables a where a.tablename not in (select b.tablename from t b);
1. IN subquery
select a.schemaname, pg_size_pretty(pg_total_relation_size(a.schemaname||'.'||a.tablename)) from pg_tables a where a.tablename in (select b.tablename from t b);

Statement 0 (NOT IN subquery) fails with the following error:
ERROR: query plan with multiple segworker groups is not supported (cdbdisp.c:500)
HINT: likely caused by a function that reads or modifies data in a distributed table
CONTEXT: SQL statement "select sum(pg_total_relation_size('information_schema.sql_languages'))::int8 from gp_dist_random('gp_id');"

Statement 1 (IN subquery) works as expected.

The detailed test follows:

gtlions=# select version();
version
------------------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 8.2.15 (Greenplum Database 4.2.7.3 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on May 7 2014 14:31:08
(1 row)

gtlions=# select a.schemaname, pg_size_pretty(pg_total_relation_size(a.schemaname||'.'||a.tablename)) from pg_tables a where a.tablename not in (select b.tablename from t b);
ERROR: query plan with multiple segworker groups is not supported (cdbdisp.c:500)
HINT: likely caused by a function that reads or modifies data in a distributed table
CONTEXT: SQL statement "select sum(pg_total_relation_size('information_schema.sql_languages'))::int8 from gp_dist_random('gp_id');"
gtlions=# explain select a.schemaname, pg_size_pretty(pg_total_relation_size(a.schemaname||'.'||a.tablename)) from pg_tables a where a.tablename not in (select b.tablename from t b);
QUERY PLAN
-----------------------------------------------------------------------------------------------
Hash Left Anti Semi Join (cost=568.98..235912.69 rows=676396 width=128)
  Hash Cond: c.relname = "NotIn_SUBQUERY".tablename::name
  -> Hash Left Join (cost=395.97..223194.68 rows=676419 width=128)
       Hash Cond: c.relnamespace = n.oid
       -> Hash Left Join (cost=2.62..112777.67 rows=676419 width=68)
            Hash Cond: c.reltablespace = t.oid
            -> Seq Scan on pg_class c (cost=0.00..2751.39 rows=676419 width=72)
                 Filter: relkind = 'r'::"char" AND relname IS NOT NULL
            -> Hash (cost=1.02..1.02 rows=2 width=4)
                 -> Seq Scan on pg_tablespace t (cost=0.00..1.02 rows=128 width=4)
       -> Hash (cost=365.35..365.35 rows=35 width=68)
            -> Seq Scan on pg_namespace n (cost=0.00..365.35 rows=2240 width=68)
  -> Hash (cost=106.61..106.61 rows=83 width=274)
       -> Gather Motion 64:1 (slice1; segments: 64) (cost=0.00..106.61 rows=83 width=274)
            -> Subquery Scan "NotIn_SUBQUERY" (cost=0.00..52.66 rows=2 width=274)
                 -> Seq Scan on t b (cost=0.00..51.83 rows=2 width=24)
(16 rows)

gtlions=# select a.schemaname, pg_size_pretty(pg_total_relation_size(a.schemaname||'.'||a.tablename)) from pg_tables a where a.tablename in (select b.tablename from t b);
schemaname | size-1
-------------+---------
public | 32 kB
public | 32 kB
......
......
public | 96 kB
gtlions=# explain select a.schemaname, pg_size_pretty(pg_total_relation_size(a.schemaname||'.'||a.tablename)) from pg_tables a where a.tablename in (select b.tablename from t b);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Gather Motion 64:1 (slice7; segments: 64) (cost=445.41..10096.03 rows=1 width=128)
  -> Hash Left Join (cost=445.41..10096.03 rows=1 width=128)
       Hash Cond: c.reltablespace = t.oid
       -> Redistribute Motion 64:64 (slice5; segments: 64) (cost=443.06..10092.81 rows=1 width=132)
            Hash Key: c.reltablespace
            -> Hash Left Join (cost=443.06..10092.22 rows=1 width=132)
                 Hash Cond: c.relnamespace = n.oid
                 -> Redistribute Motion 64:64 (slice3; segments: 64) (cost=54.53..9703.24 rows=1 width=72)
                      Hash Key: c.relnamespace
                      -> Hash EXISTS Join (cost=54.53..9702.65 rows=1 width=72)
                           Hash Cond: c.relname = b.tablename::name
                           -> Redistribute Motion 1:64 (slice1) (cost=0.00..9621.26 rows=10570 width=72)
                                Hash Key: c.relname
                                -> Seq Scan on pg_class c (cost=0.00..2751.39 rows=676419 width=72)
                                     Filter: relkind = 'r'::"char"
                           -> Hash (cost=53.49..53.49 rows=2 width=24)
                                -> Redistribute Motion 64:64 (slice2; segments: 64) (cost=0.00..53.49 rows=2 width=24)
                                     Hash Key: b.tablename::name
                                     -> Seq Scan on t b (cost=0.00..51.83 rows=2 width=24)
                 -> Hash (cost=388.10..388.10 rows=1 width=68)
                      -> Redistribute Motion 1:64 (slice4) (cost=0.00..388.10 rows=35 width=68)
                           Hash Key: n.oid
                           -> Seq Scan on pg_namespace n (cost=0.00..365.35 rows=2240 width=68)
       -> Hash (cost=2.32..2.32 rows=1 width=4)
            -> Redistribute Motion 1:64 (slice6) (cost=0.00..2.32 rows=2 width=4)
                 Hash Key: t.oid
                 -> Seq Scan on pg_tablespace t (cost=0.00..1.02 rows=128 width=4)
(27 rows)

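This looks like a bug. We are waiting for TSE to provide a fix; if no fix is available, the only option is to wait for the next version upgrade.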
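
While waiting for a fix, one approach that might sidestep the error is to keep pg_total_relation_size out of the statement that contains the NOT IN subquery. This is an untested assumption, not a confirmed workaround: read the table list and the contents of t with two plain queries, apply the NOT IN filter on the client, and then issue one simple size query per remaining table. The sketch below only builds the SQL text; the helper names (quote_ident, quote_literal, tables_not_in, size_queries) and the sample rows are hypothetical, and running the generated statements against Greenplum is left to whatever client is in use.

```python
def quote_ident(name):
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(text):
    """Single-quote an SQL string literal, escaping embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def tables_not_in(all_tables, excluded_names):
    """Client-side counterpart of: tablename NOT IN (select tablename from t).

    all_tables is a list of (schemaname, tablename) pairs and excluded_names
    is the list of names read from t. As in SQL, a NULL (None) among the
    excluded names means NOT IN matches no row at all.
    """
    if any(name is None for name in excluded_names):
        return []
    excluded = set(excluded_names)
    return [(schema, table) for (schema, table) in all_tables
            if table not in excluded]


def size_queries(tables):
    """Build one standalone size query per (schemaname, tablename) pair."""
    queries = []
    for schema, table in tables:
        qualified = quote_ident(schema) + "." + quote_ident(table)
        queries.append(
            "select pg_size_pretty(pg_total_relation_size("
            + quote_literal(qualified) + "));"
        )
    return queries


# Hypothetical sample rows standing in for the contents of pg_tables and t.
all_tables = [
    ("public", "orders"),
    ("public", "t"),
    ("information_schema", "sql_languages"),
]
excluded_names = ["orders"]

for query in size_queries(tables_not_in(all_tables, excluded_names)):
    print(query)
```

Each generated statement calls the size function on its own, with no join or subquery around it. Whether that actually avoids the "multiple segworker groups" error on 4.2.7.3 would need to be checked on the affected cluster.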
-EOF-

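Which functions are needed for statistics work, and how are they written? This is urgent; if anyone knows, please tell me. Thanks.

Statistical functions, algorithms, and formulas could fill a book as thick as the Xinhua Dictionary.
There is an article titled "Excel函数应用之统计函数详解教程" (a detailed tutorial on the statistical functions in Excel) that is worth a look.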

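Function | Description | Syntax
-----------+-------------+--------
AVEDEV | Returns the average of the absolute deviations of the data points from their mean, a measure of dispersion. | AVEDEV(number1,number2, ...)
AVERAGE | Returns the arithmetic mean of the arguments. | AVERAGE(number1,number2, ...)
AVERAGEA | Returns the arithmetic mean of the values in the argument list. Text and logical values (such as TRUE and FALSE) are included in the calculation, not only numbers. | AVERAGEA(value1,value2,...)
BETADIST | Returns the value of the cumulative beta distribution function, which is commonly used to study how often something occurs and how it varies across a set of samples. | BETADIST(x,alpha,beta,A,B)
BETAINV | Returns the inverse of the cumulative beta distribution function: if probability = BETADIST(x,...), then BETAINV(probability,...) = x. The cumulative beta distribution can be used in project planning to model probable completion times, given an expected completion time and its variability. | BETAINV(probability,alpha,beta,A,B)
BINOMDIST | Returns the individual-term binomial distribution probability. | BINOMDIST(number_s,trials,probability_s,cumulative)
CHIDIST | Returns the one-tailed probability of the χ² distribution. The χ² distribution is associated with the χ² test, which compares observed and expected values. | CHIDIST(x,degrees_freedom)
CHIINV | Returns the inverse of the one-tailed probability of the χ² distribution. | CHIINV(probability,degrees_freedom)
CHITEST | Returns the test for independence: the value from the χ² distribution for the statistic and the corresponding degrees of freedom. | CHITEST(actual_range,expected_range)
CONFIDENCE | Returns the confidence interval for a population mean, expressed as the margin on either side of the sample mean. | CONFIDENCE(alpha,standard_dev,size)
CORREL | Returns the correlation coefficient between the cell ranges array1 and array2. The correlation coefficient indicates how two properties are related. | CORREL(array1,array2)
COUNT | Counts the numeric items among its arguments. Use COUNT to get the number of numeric entries in an array or cell range. | COUNT(value1,value2, ...)
COUNTA | Counts the non-empty values among its arguments. Use COUNTA to get the number of data entries in an array or cell range. | COUNTA(value1,value2, ...)
COVAR | Returns the covariance, the average of the products of deviations for each pair of data points. Covariance indicates how two data sets are related. | COVAR(array1,array2)
CRITBINOM | Returns the smallest value for which the cumulative binomial distribution is greater than or equal to a criterion value. Useful in quality assurance. | CRITBINOM(trials,probability_s,alpha)
DEVSQ | Returns the sum of squared deviations of the data points from their sample mean. | DEVSQ(number1,number2,...)
EXPONDIST | Returns the exponential distribution. Use EXPONDIST to model the time between events. | EXPONDIST(x,lambda,cumulative)
FDIST | Returns the F......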
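
As a rough illustration of three of the definitions listed above, here is a minimal Python sketch of AVEDEV, DEVSQ, and COVAR. The function names simply mirror the Excel ones, the sample numbers are made up, and COVAR is taken to be the population covariance (dividing by n), which matches the "average of the products of deviations" description.

```python
def mean(values):
    """Arithmetic mean, as computed by AVERAGE."""
    return sum(values) / len(values)


def avedev(values):
    """Average of the absolute deviations from the mean (AVEDEV)."""
    m = mean(values)
    return sum(abs(v - m) for v in values) / len(values)


def devsq(values):
    """Sum of squared deviations from the mean (DEVSQ)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def covar(xs, ys):
    """Average of the products of paired deviations (COVAR, dividing by n)."""
    if len(xs) != len(ys):
        raise ValueError("both data sets must have the same length")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Made-up sample data for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(avedev(data))                    # mean is 5, absolute deviations sum to 12
print(devsq(data))                     # squared deviations sum to 32
print(covar([1, 2, 3], [2, 4, 6]))
```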

A question about counting with Excel functions

See the following examples.
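15. Using SUMPRODUCT to count the rows that meet several conditions: the * operator can be read as AND.

The sample data occupies columns A to F. In the data, 一年级 is Grade 1, 一1班 is Class 1 of Grade 1, 男 and 女 are male and female, and 汉, 回, 土 are the Han, Hui, and Tu ethnic groups.

A: Grade (年级) | B: Class (班级) | C: Name (姓名) | D: Gender (性别) | E: Ethnicity (民族) | F: Date of birth (出生年月)
一年级 | 一1班 | 张三1 | 男 | 汉 | 1998-07-01
一年级 | 一1班 | 张三2 | 女 | 回 | 1996-07-02
一年级 | 一1班 | 张三3 | 女 | 汉 | 1994-07-03
一年级 | 一1班 | 张三4 | 女 | 汉 | 1992-07-04
一年级 | 一1班 | 张三5 | 男 | 回 | 1999-07-05
一年级 | 一1班 | 张三6 | 男 | 汉 | 1998-11-06
一年级 | 一2班 | 王五1 | 男 | 汉 | 1998-11-07
一年级 | 一3班 | 王五2 | 女 | 土 | 1998-11-08
一年级 | 一4班 | 王五3 | 女 | 汉 | 1998-11-09
一年级 | 一5班 | 王五4 | 女 | 汉 | 1998-11-10
一年级 | 一6班 | 王五5 | 男 | 汉 | 1998-11-11
一年级 | 一7班 | 王五6 | 男 | 汉 | 1998-11-12

1. Total number of students in a given grade and class:
=SUMPRODUCT((A:A="一年级")*(B:B="一1班"))
Of these, the number of boys:
=SUMPRODUCT((A:A="一年级")*(B:B="一1班")*(D:D="男"))
2. Number of ethnic-minority students. The range is limited to the data rows, assumed here to be rows 2 to 13 with the header in row 1, because a "<>" criterion applied to the whole column E:E would also count the header cell and every blank cell:
=COUNTIF(E2:E13,"<>汉")
Ethnic-minority boys:
=SUMPRODUCT((E:E<>"汉")*(D:D="男"))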
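
The counting logic behind these formulas can be sketched in Python. Each condition yields 1 or 0 for a row, the values are multiplied together (the AND reading of *), and the products are summed. The function name sumproduct_count and the tuple layout are illustrative choices, not part of Excel; the rows are the twelve sample records above.

```python
# Each row: (grade, class, name, gender, ethnicity, date of birth),
# matching columns A to F of the sample data.
rows = [
    ("一年级", "一1班", "张三1", "男", "汉", "1998-07-01"),
    ("一年级", "一1班", "张三2", "女", "回", "1996-07-02"),
    ("一年级", "一1班", "张三3", "女", "汉", "1994-07-03"),
    ("一年级", "一1班", "张三4", "女", "汉", "1992-07-04"),
    ("一年级", "一1班", "张三5", "男", "回", "1999-07-05"),
    ("一年级", "一1班", "张三6", "男", "汉", "1998-11-06"),
    ("一年级", "一2班", "王五1", "男", "汉", "1998-11-07"),
    ("一年级", "一3班", "王五2", "女", "土", "1998-11-08"),
    ("一年级", "一4班", "王五3", "女", "汉", "1998-11-09"),
    ("一年级", "一5班", "王五4", "女", "汉", "1998-11-10"),
    ("一年级", "一6班", "王五5", "男", "汉", "1998-11-11"),
    ("一年级", "一7班", "王五6", "男", "汉", "1998-11-12"),
]


def sumproduct_count(rows, *conditions):
    """Sum, over all rows, the product of the 0/1 results of each condition."""
    total = 0
    for row in rows:
        product = 1
        for condition in conditions:
            product *= int(condition(row))
        total += product
    return total


def in_grade_1(row):
    return row[0] == "一年级"


def in_class_1(row):
    return row[1] == "一1班"


def is_boy(row):
    return row[3] == "男"


def is_minority(row):
    return row[4] != "汉"


print(sumproduct_count(rows, in_grade_1, in_class_1))          # students in the class
print(sumproduct_count(rows, in_grade_1, in_class_1, is_boy))  # boys in the class
print(sumproduct_count(rows, is_minority))                     # ethnic-minority students
print(sumproduct_count(rows, is_minority, is_boy))             # ethnic-minority boys
```

Because the list holds only data rows, the header and blank cells never enter the count, which is the same effect as bounding the range in the worksheet formulas.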
