Logs and Troubleshooting




HP OO 10 Community Content

Logs and troubleshooting

This document provides log-related information that will assist you in troubleshooting your HP OO configuration. It answers the following questions:

• Where can I find the logs?
• What information is logged in which log?
• How do I read log messages?
• How do I customize my log files?
• What other logging and tracking means can I utilize when working with HP OO?

HP Operations Orchestration
HP Operations Orchestration (HP OO) is a next-generation IT Process Automation solution designed from the ground up to increase automation adoption, whether in a traditional data center or a hybrid cloud environment.

OO Community
This content is created for the OO community. To get started with HP OO 10, check out our community onboarding kit, which contains four persona-based learning tracks for administrators, flow authors, end users, and integrators. Each track provides knowledge assets of the following types:
• Presentations
• Videos
• Tutorials
• Guides
• Best Practices

How logging is done in HP OO
HP OO uses an industry-standard logging library named log4j. The logs produced by log4j are configured in a file named log4j.properties. Each HP OO component (Central, RAS, and Studio) has its own logs and its own log4j.properties file with component-specific settings. The HP OO Central log levels are set in the /central/conf/log4j.properties file; the location is similar for HP OO RAS and HP OO Studio.

Log4j configuration
log4j has three main components:
• logger: Responsible for capturing logging information. Loggers are defined in the application and represent a module/component with common functionality that we want to log. Logger objects are structured in a hierarchical namespace, so for the logger "com.hp.oo", "oo" is a child of "com.hp", which is a child of "com". (This architecture allows very granular logging down to the class level; in practice, however, that level of granularity can only be used meaningfully by HP OO R&D.)
• appender: Defines the destination of the log messages. Besides writing to a file, many other appender types are available, capable of writing to a JDBC database or a NoSQL store, sending an email, and so on. (The list of appenders is available at https://logging.apache.org/log4j/2.x/manual/appenders.html.)
• layout: Defines the format of the logged message.
Let's look at each component through examples from the HP OO Central log4j.properties file.
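To make the relationship between these three components concrete, here is a minimal, generic log4j.properties sketch. It is illustrative only and is not HP OO's actual shipped configuration; the logger, appender, and file names are placeholders:

# Illustrative log4j.properties sketch (not HP OO's actual file)
# Logger: the "com.example.app" module logs at INFO level to the appender named "app.appender"
log4j.logger.com.example.app=INFO, app.appender
# Appender: a rolling file appender that writes to app.log
log4j.appender.app.appender=org.apache.log4j.RollingFileAppender
log4j.appender.app.appender.File=logs/app.log
# Layout: the format of each logged line
log4j.appender.app.appender.layout=org.apache.log4j.PatternLayout
log4j.appender.app.appender.layout.ConversionPattern=%d [%t] %-5p - %m%n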

HP OO’s log layout
If we take a look at the top section of the log4j.properties file, where the defaults are defined, we can see the definition of the layout:
msg.layout=%d [%t] (%F:%L) %-5p - %m%n
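The msg.layout variable is only a pattern definition; it takes effect when an appender's layout references it. A sketch of how that wiring typically looks in a log4j.properties file (the appender name and exact property references here are an assumption, not necessarily how OO's shipped file does it):

log4j.appender.server.appender.layout=org.apache.log4j.PatternLayout
log4j.appender.server.appender.layout.ConversionPattern=${msg.layout}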

Here is the legend which enables us to understand this line:
%d - Outputs the date of the logging event.
%t - Outputs the name of the thread that generated the logging event.
%F - Outputs the file name where the logging request was issued.
%L - Outputs the line number from where the logging request was issued.
%p - Outputs the priority of the logging event.
%m - Outputs the application-supplied message associated with the logging event.
%n - Outputs the platform-dependent line separator character or characters.

So if we take a sample message:
2015-06-21 10:47:07,330 [localhost-startStop-1] (ExecutorConfigurationSupport.java:203) INFO - Shutting down ExecutorService 'taskExecutor'
the mapping will be:

Date: 2015-06-21 10:47:07,330
Thread name: [localhost-startStop-1]
File:line number: (ExecutorConfigurationSupport.java:203)
Priority: INFO
Message: Shutting down ExecutorService 'taskExecutor'

You can find the full list of options at https://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/PatternLayout.html

Loggers and appenders
The logger and the appender are combined in the following manner:
log4j.logger.<logger name>=<log level>, <appender name>
For example:
log4j.logger.com.hp.oo=${log.level}, server.appender

In this example, the events of the logger that relates to the com.hp.oo package will be written to the server log. Looking at the appenders section, we can see that it is defined as a file appender and that its file name (.File) property is set accordingly:
#Appenders
log4j.appender.server.appender=org.apache.log4j.RollingFileAppender
log4j.appender.server.appender.File=${log.file.path}/server.log

Additional parameters, such as the maximum log file size, can also be set here. The log definition for each package is inherited from the nearest object in the hierarchy, but can also be set individually.
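For example, a RollingFileAppender can cap the file size and the number of rotated backups. The values below are illustrative only and are not necessarily HP OO's defaults:

# Rotate the server log at 20 MB and keep up to 10 backups (illustrative values)
log4j.appender.server.appender.MaxFileSize=20MB
log4j.appender.server.appender.MaxBackupIndex=10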

Example
The log.level variable value is set to WARN. The root logger, which is defined by OO R&D, captures all messages that are not otherwise captured by other loggers. Its logging level is set to WARN, and it uses general.appender, which writes to the general log. Thus, any log messages that are not defined to be saved in other logs end up in the general log.

com.hp.oo uses the same log level but a different appender; it writes to the server log.

We can specify a different log level for com.hp.oo.security (which inherits from com.hp.oo).
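Put together, a sketch of how this example might look in log4j.properties (illustrative only; the actual shipped file may differ in names and levels):

log.level=WARN
# Root logger: catches everything not captured elsewhere and writes it to the general log
log4j.rootLogger=${log.level}, general.appender
# com.hp.oo: same level, but writes to the server log
log4j.logger.com.hp.oo=${log.level}, server.appender
# com.hp.oo.security: inherits from com.hp.oo, but can be given its own level
log4j.logger.com.hp.oo.security=INFO, server.appender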

Since loggers are hierarchical, we can add granularity to the example above.


Let’s assume we want to debug only authentication attempts, and not the entire com.hp.oo.security package. In this case we can specify a log level deeper in the namespace by adding the following lines:
log4j.logger.com.hp.oo.security.authn.services.AuthenticationListener=DEBUG, server.appender
log4j.additivity.com.hp.oo.security.authn.services.AuthenticationListener=false

In the same manner, we can add loggers for other classes whose names are known. However, since such a long list is very hard to maintain, this level of granularity requires knowledge of OO's code.

The role of each log
HP OO Central has 4 appenders defined by default, which are configured to log messages to files. As a result, HP OO Central log messages are written to 4 separate log files:
• Wrapper – the wrapper log belongs to the service wrapper of the OO Tomcat process. It logs the startup process and errors related to the OO application itself. If the entire application cannot start, or crashes, look here.
• Execution – contains messages related to the execution of content operations. If a step throws an unexpected exception, it will be written here. The entire run should also be logged here, but because of a bug in score it currently appears in the general log.
• General – everything that does not belong to another log is written here, so this is the first place to look.
• Server – everything not directly related to the execution of content is written here, for example UI (REST) operations such as deploying content. Authentication and authorization errors appear here too.
* OO also logs audit data, which provides a record of security events such as authentication and authorization. The audit data is saved in OO's database and can be retrieved using OO Central's REST API.
Other logs, such as upgrade logs (which are created under the upgrade folder) and Content Upgrade Utility logs, are not covered in this guide.
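As a rough sketch, each of the log4j-managed files above corresponds to its own file appender in log4j.properties. The appender names for the execution and general logs below are assumptions based on the descriptions above, not verified property names; the wrapper log is written by the service wrapper itself rather than by log4j, so it is configured outside this file:

# Illustrative mapping of appenders to log files (names for execution/general are assumed)
log4j.appender.general.appender.File=${log.file.path}/general.log
log4j.appender.execution.appender.File=${log.file.path}/execution.log
log4j.appender.server.appender.File=${log.file.path}/server.log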

Troubleshooting
General
• After reading the above section, you should have a good idea of which log to start looking in for the relevant trail. (Of course, other logs may be relevant too, since at this stage you may not know what the root cause is.)
• In case the problem happens when communicating with a target system, have a look at the logs of that system too. In such cases the target system's logs will usually be much more specific and may point you to the root cause. (For example, in case of an authentication or authorization problem, a system may return a general error which will be written in the execution log, while the logs of the target system may indicate that the operation failed due to insufficient privileges.)
• After an initial look at the logs, you can try to get a better picture by correlating errors from the different logs by timestamp.
• If no clues are found at this stage, try turning on DEBUG for some loggers.


How to read intimidatingly long log stacks?
The right way to read a stack is to look for the root cause by reading the last "Caused by" section of the relevant part. Looking at the bottom of such a stack gives us more details; in the case described here, it indicated that the JDBC connection to our PostgreSQL database had failed due to a networking issue (the PostgreSQL server was not listening or could not be reached).
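The stack the example refers to is not reproduced here. As an illustration only (not taken from an actual OO log), the bottom of such a stack might look roughly like this, with the last "Caused by" pointing at the underlying network failure:

Caused by: org.postgresql.util.PSQLException: Connection to dbserver:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
    ...
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    ...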

Thread Dumps
A thread dump can sometimes indicate the reason for a slow or stuck system when there are no out-of-memory errors in the logs. You can produce a thread dump by using the following URL against your Central server:
https://<central host>:8443/oo/monitoring?part=threadsDump
You can also use jstack, which is available in the JDK and is documented here: http://docs.oracle.com/javase/7/docs/technotes/tools/share/jstack.html
The following dump points us to the root cause of a stuck flow problem:

"1_WorkerExecutionThread-78_4983623822" daemon prio=6 tid=0x000000001c2b9800 nid=0x5fcc runnable [0x0000000042d4c000]
java.lang.Thread.State: RUNNABLE
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:272)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    - locked <0x000000070013eeb0> (a java.io.BufferedInputStream)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
    - locked <0x000000070badff50> (a java.io.InputStreamReader)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.fill(BufferedReader.java:154)
    at java.io.BufferedReader.readLine(BufferedReader.java:317)
    - locked <0x000000070badff50> (a java.io.InputStreamReader)
    at java.io.BufferedReader.readLine(BufferedReader.java:382)
    at com.sun.security.auth.callback.TextCallbackHandler.readLine(TextCallbackHandler.java:151)
    at com.sun.security.auth.callback.TextCallbackHandler.handle(TextCallbackHandler.java:119)
    at com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:817)
    …
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
    at com.jcraft.jsch.jgss.GSSContextKrb5.init(GSSContextKrb5.java:129)
    at com.jcraft.jsch.UserAuthGSSAPIWithMIC.start(UserAuthGSSAPIWithMIC.java:135)
    at com.jcraft.jsch.Session.connect(Session.java:463)
    at com.jcraft.jsch.Session.connect(Session.java:183)
    at com.iconclude.dharma.commons.security.ssh.DefaultSSHSessionCreator.createSession(DefaultSSHSessionCreator.java:57)

Looking at the relevant lines of the stack (the Krb5LoginModule and JSch frames), we can see that the execution thread is blocked while trying to authenticate against an SSH server using Kerberos, and that the authentication is done using the JSch Java library. A quick Google search turns up a known issue regarding Kerberos authentication over SSH using this library.

Memory Dumps
In case of out-of-memory errors, memory dump files are created automatically (memory dump files have the *.hprof extension). The following example helped the engineer understand the reason for a Central crash due to out of memory:

local-scheduler-persisted_Worker-2
    at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
    at java.util.Arrays.copyOf([BI)[B (Arrays.java:2271)
    at java.io.ByteArrayOutputStream.grow(I)V (ByteArrayOutputStream.java:113)
    at java.io.ByteArrayOutputStream.ensureCapacity(I)V (ByteArrayOutputStream.java:93)
    .....
    at com.sun.proxy.$Proxy165.joinFinishedSplits(I)I (Unknown Source)
    at com.hp.score.job.ScoreEngineJobsImpl.joinFinishedSplitsJob()V (ScoreEngineJobsImpl.java:108)
    at com.hp.oo.orchestrator.services.SplitJoinJob.execute(Lorg/quartz/JobExecutionContext;)V (SplitJoinJob.java:20)
    at org.quartz.core.JobRunShell.run()V (JobRunShell.java:223)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run()V (SimpleThreadPool.java:549)

In order to focus on the relevant part of the stack, we need to find the first entry in the thread stack containing "com.hp", which means that it is HP OO code. From the stack above we can see that the out-of-memory exception is caused by com.hp.score.job.ScoreEngineJobsImpl.joinFinishedSplitsJob(). Therefore, the immediate conclusion is that this happens when trying to join the branches of a multi-instance or a parallel step. In this case, there was a problem at the content level, causing the context of the multi-instance branches to be very large and thus causing a memory problem when these branch contexts were merged into the main flow context. The problem was solved by clearing specific flow variables which were created in the branches, using the “Clear Flow Variable” operation.

How to help support help you?
Provide a detailed description of the issue and the environment, and describe any recent configuration changes.


Send full logs, not selected lines that you think indicate what the problem is. As you learned from the exception example, sometimes the context of the error you are running into provides valuable information to the person trying to help you.

Memory Dumps
In case of out-of-memory errors, please attach the memory dump files (*.hprof) to the ticket.

Target Systems Logs
In case of content-related issues, provide the relevant logs from the target system.

We are on the community forum, so please feel free to post any questions and comments related to this content via the forum: http://www.hp.com/go/OOPractitionerForum
Check the HP OO community for additional resources such as guides, tutorials, videos, content, and more: https://hpln.hp.com/group/operations-orchestration

HP Operations Orchestration, September 2015