图数据库nebula实时慢日志收集和展示

目的

因为我们的图数据库从neo4j社区版转到nebula graph方向。最近在项目压测的时候,一开始很平稳,运行一段时间后,NQL会越来越慢,发现性能巨差。nebula经常卡死,表现为:

系统资源

在做了一些参数调优后,发现状态有所缓解,但是一段时间后,还是会慢慢卡死,从现象上推测是有一种或几种慢NQL多次执行后,把系统资源消耗完了,导致大面积的堵塞。所以面临的需求还是要有个慢查询排查工具。

参考上一次做oracle慢日志收集展示的方法, https://www.dboop.com/oracle/oracle%E6%80%8E%E6%A0%B7%E5%AE%9E%E6%97%B6%E6%94%B6%E9%9B%86%E5%B1%95%E7%A4%BA%E6%85%A2%E6%9F%A5%E8%AF%A2/

变化的是:

1.建一张表,每隔1分钟(时间可调,但我们的场景1分钟足够了)

CREATE TABLE `monitor_nebula_slow` (
  `logid` int unsigned NOT NULL AUTO_INCREMENT,
  `SessionID` varchar(50) NOT NULL DEFAULT '',
  `ExecutionPlanID` varchar(50) NOT NULL DEFAULT '',
  `User` varchar(50) NOT NULL DEFAULT '',
  `Host` varchar(50) NOT NULL DEFAULT '',
  `StartTime` datetime DEFAULT NULL,
  `DurationInUSec` int unsigned NOT NULL DEFAULT '0',
  `Status` varchar(50) NOT NULL DEFAULT '',
  `Query` varchar(5000) NOT NULL DEFAULT '',
  `_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `md5id` varchar(64) NOT NULL DEFAULT '',
  PRIMARY KEY (`logid`),
  KEY `ix_monitor_nebula_slow` (`_timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=13231 DEFAULT CHARSET=utf8mb3

2.收集到的信息,做个dashboard页,实时展示出来

源码如下:

<para>
  <name>btime</name>
  <title>开始时间</title>
  <type>datetime</type>
  <defaultvalue>getdate()-4h</defaultvalue>
   <format>%Y-%m-%d %H:%M:00</format>
  </para>

  
  <para>
  <name>etime</name>
  <title>结束时间</title>
  <type>datetime</type>
  <defaultvalue>getdate()-1m</defaultvalue>
   <format>%Y-%m-%d 23:59:59</format>
  </para>





      <para>
  <name>nql</name>
     <title>NQL</title>
  <type>string</type>
  <defaultvalue></defaultvalue>
  </para>
  
    

  
 <para>
   <title>nodeid</title>
  <name>nodeid</name>
    <type>select</type>
   <item_text>全部,10.0.0.77,10.0.0.78,10.0.0.79</item_text>
      <item_value>all,10.0.0.77:9669,10.0.0.78:9669,10.0.0.79:9669</item_value>
    <defaultvalue>all</defaultvalue> 
  </para>

<page>
  <viewtype>mutibar</viewtype>
     <chart_title>title : {text: 'Nebula实时慢日志(节点)',x:'center',y:20},</chart_title>
    <chart_legendstyle2>x: 'right',y: '200',borderWidth: 0,orient:'vertical'</chart_legendstyle2>
  <chart_legendstyle>x: 'right',y:0,borderWidth: 0,orient:'vertical'</chart_legendstyle>
   <chart_addin>grid:{width:880,y:10},</chart_addin>
  <connstr>link:dboop_nebula_slow</connstr>
  <sqlstr><![CDATA[
select 
date_format(aa.ftime, '%%m-%%d %%H:%%i') as ftime
,ifnull(bb.DBOwner,'') as sqlstr
,ifnull(bb.counts,0)  as counts 
from dba_report_datetime1  aa 
left join (
SELECT  date_format(a._timestamp, '%%Y-%%m-%%d %%H:%%i') as ftime,
replace(Host,':9669','')  as 'DBOwner',
count(*)  as counts
from  `monitor_nebula_slow`  a 
where a._timestamp between %s AND %s
and (a.Query like concat('%%',%s,'%%') or %s='')
and (a.Host=%s or %s='all')
group by ftime, DBOwner
)bb on aa.ftime=bb.ftime
where aa.ftime between %s AND %s and aa.ftime<curtime()
order by ftime,counts desc

]]>
  </sqlstr>
  <sqlpara>#1</sqlpara>
    <sqlpara>#2</sqlpara>
  <sqlpara>#3</sqlpara>  <sqlpara>#3</sqlpara>
  <sqlpara>#4</sqlpara>  <sqlpara>#4</sqlpara>
   <sqlpara>#1</sqlpara>
    <sqlpara>#2</sqlpara>
  <width>1080px</width>
  <height>160px</height>
   <tableformat>rowconvert 1 15</tableformat>
  <chart_group>a</chart_group>
  <chart_datazoom>60</chart_datazoom>
</page>


<page>
  <viewtype>mutibar</viewtype>
     <chart_title>title : {text: 'Nebula实时慢日志(NQL)',x:'center',y:20},</chart_title>
    <chart_legendstyle2>x: 'right',y: '200',borderWidth: 0,orient:'vertical'</chart_legendstyle2>
  <chart_legendstyle>x: 'right',y:0,borderWidth: 0,orient:'vertical'</chart_legendstyle>
   <chart_addin>grid:{width:880,y:10},</chart_addin>
  <connstr>link:dboop_nebula_slow</connstr>
  <sqlstr><![CDATA[
select 
date_format(aa.ftime, '%%m-%%d %%H:%%i') as ftime
,ifnull(bb.DBOwner,'') as sqlstr
,ifnull(bb.counts,0)  as counts 
from dba_report_datetime1  aa 
left join (
SELECT  date_format(a._timestamp, '%%Y-%%m-%%d %%H:%%i') as ftime,
left(md5id,15)  as 'DBOwner',
count(*)  as counts
from  `monitor_nebula_slow`  a 
where a._timestamp between %s AND %s
and (a.Query like concat('%%',%s,'%%') or %s='')
and (a.Host=%s or %s='all')
group by ftime, DBOwner
)bb on aa.ftime=bb.ftime
where aa.ftime between %s AND %s and aa.ftime<curtime()
order by ftime,counts desc

]]>
  </sqlstr>
  <sqlpara>#1</sqlpara>
    <sqlpara>#2</sqlpara>
  <sqlpara>#3</sqlpara>  <sqlpara>#3</sqlpara>
  <sqlpara>#4</sqlpara>  <sqlpara>#4</sqlpara>
   <sqlpara>#1</sqlpara>
    <sqlpara>#2</sqlpara>
  <tableformat>rowconvert 1 15</tableformat>
  <width>1080px</width>
  <height>360px</height>
  <chart_group>a</chart_group>
  <chart_datazoom>60</chart_datazoom>
</page>



<page>
  <viewtype>table</viewtype>
  <connstr>link:dboop_nebula_slow</connstr>
  <title2>时间段内最频繁慢NQLTop50</title2>
  <sqlstr><![CDATA[
SELECT 
concat('<a href="view.html?action=v&report_action=v&report_ptname=dvjtlec9pqgza1782rx4wb0hyk&sqlid=',a.md5id,'" target="_blank">',left(a.md5id,10),'</a>' )
 as md5,
concat(max(left(a.Query,120)),'...') as nqlstr,
case when count(*)>10 then concat('<span class="f_red f_bold">',count(*),'</span>') else concat(count(*)) end as  '累计次数',
case when sum(DurationInUSec)>1000000000 then concat('<span class="f_red ">',format(sum(DurationInUSec)/1000000,2),'秒</span>') else concat(format(sum(DurationInUSec)/1000000,2),'秒') end as '累计耗时',
case when avg(DurationInUSec)>100000000 then concat('<span class="f_red ">',format(avg(DurationInUSec)/1000000,2),'秒</span>') else concat(format(avg(DurationInUSec)/1000000,2),'秒') end as '平均耗时',
case when max(DurationInUSec)>100000000 then concat('<span class="f_red ">',format(max(DurationInUSec)/1000000,2),'秒</span>') else concat(format(max(DurationInUSec)/1000000,2),'秒') end as '最慢耗时'
from  `monitor_nebula_slow`  a 
where a._timestamp between %s AND %s
and (a.Query like concat('%%',%s,'%%') or %s='')
and (a.Host=%s or %s='all')
group  by a.md5id 
order by sum(DurationInUSec) desc
limit 50

]]>
  </sqlstr>
  <sqlpara>#1</sqlpara>
    <sqlpara>#2</sqlpara>
  <sqlpara>#3</sqlpara>  <sqlpara>#3</sqlpara>
  <sqlpara>#4</sqlpara>  <sqlpara>#4</sqlpara>
    <width>1140px</width>
</page>


<debug>true</debug>


页面展示效果如下:

nebula慢查询实时展示页

3.对于每个慢查询,做个子页面,看到详细的NQL和每次执行用时等详细信息。

子页面源代码:

<page>
  <viewtype>table</viewtype>
   <tableheader><![CDATA[
<tr style="height:25px;">

<th width="100"><h3>发生时间</h3></th>
<th width="50"><h3>用时</h3></th>
<th><h3>NQL</h3></th>
</tr>



 ]]>
  </tableheader>  
  
  
  <connstr>link:dboop_nebula_slow</connstr>
  <sqlstr><![CDATA[
select date_format(StartTime, '%%m-%%d %%H:%%i:%%s') as ftime
,case when max(DurationInUSec)/1000000>5 then 

concat('<span class="f_red">',format(max(DurationInUSec)/1000000,2),'</span>秒')
 else
concat('<span>',format(max(DurationInUSec)/1000000,2),'</span>秒')
end
as costtime
,
concat('',a.`Query`
,' <br>'
,' <b>SessionID: </b><span class="f_666">',a.SessionID,'</span>'
,' <b>状态:</b><span class="f_red">'
,case 
when Status='RUNNING'   then '<span class="f_666">RUNNING</span>' 
else Status  end,'</span>'
,' <b>User:</b><span class="f_666">',a.User,'@',a.Host,'</span>'
,''
) as 'NSQL' from 
 monitor_nebula_slow a
where _timestamp  between %s and %s
 and md5id=%s 
group by StartTime,NSQL
order by StartTime desc 
    ]]>
  </sqlstr>

  <sqlpara>#1</sqlpara>
    <sqlpara>#2</sqlpara>
    <sqlpara>#3</sqlpara>
   
  <width>100%</width>

</page>


<para>
  <name>开始时间</name>
  <type>datetime</type>
  <defaultvalue>getdate()-4h</defaultvalue>
   <format>%Y-%m-%d %H:%M:00</format>
  </para>

  
  <para>
  <name>结束时间</name>
  <type>datetime</type>
  <defaultvalue>getdate()-1m</defaultvalue>
   <format>%Y-%m-%d %H:%M:00</format>
  </para>


  <para>
  <name>sqlid</name>
  <type>string</type>
  <defaultvalue></defaultvalue>
  </para>


子页面效果如下: nebula慢查询实时展示页

4.总结

通过以上的配置,我们快速的制作了一个辅助定位nebula Graph的慢查询问题的工具,可以实时的定位问题点。再和研发同学一起解决问题。

这个工具基于Python3.9+Django+MySQL完成,开发加调试共用时:3小时。

>> Home

51ak

2022/06/17

Categories: nosql slowlog nebula dashboard Tags: 原创 精品

《数据库工作笔记》公众号
扫描上面的二维码,关注我的《数据库工作笔记》公众号