When it comes to streaming, writing your own data generator is often the easiest way forward. I’ve written a simple Python script that produces realistic-looking web proxy logs. Among other features (volumes, frequency, DoS attack simulation, etc.), it can stream traffic to a remote HTTP endpoint.

This post shows how to plug the generated output into NiFi, which makes a handy setup for clickstream analytics, web server monitoring and other stream processing scenarios.

The proxy log generator takes a range of arguments to test different scenarios. The use case I want to focus on today, however, is streaming inbound traffic to NiFi.

Source code and full documentation are available on GitHub. To generate logs and stream them to the assumed endpoint via HTTP, I start the proxy as follows:

python src/log_generator.py \
--stream 100 \
--url http://localhost:8081/contentListener

The generated logs immediately show up in the console, wrapped as JSON objects. Mind you, all data is made up, including links, user names and response codes.

{"auth": "kourtney.buckley", "url": "http://www.alysonhunt.com/51/865/711.png", "timestamp": "2016-12-04 20:43:21", "user_agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0", "ip": "112.170.1.143", "res_status": "200", "res_size": 1850015}
{"auth": "-", "url": "http://www.sergio-ewing.com/283/123.xml", "timestamp": "2016-12-04 20:43:21", "user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36", "ip": "148.143.230.40", "res_status": "302", "res_size": 2973508}
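Since each log line is a standalone JSON object, it is easy to load back into Python for a quick sanity check. The snippet below is a minimal sketch, not part of the generator: `parse_record` is a hypothetical helper, the timestamp format is inferred from the sample output above, and treating `"-"` as "no authenticated user" is my assumption.

```python
import json
from datetime import datetime

# One of the sample lines shown above, copied verbatim.
line = '{"auth": "-", "url": "http://www.sergio-ewing.com/283/123.xml", "timestamp": "2016-12-04 20:43:21", "user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36", "ip": "148.143.230.40", "res_status": "302", "res_size": 2973508}'


def parse_record(line):
    """Turn one JSON log line into a dict with typed fields (hypothetical helper)."""
    record = json.loads(line)
    # Timestamp format inferred from the sample output.
    record["timestamp"] = datetime.strptime(record["timestamp"], "%Y-%m-%d %H:%M:%S")
    # The status code arrives as a string, so convert it to an integer.
    record["res_status"] = int(record["res_status"])
    # Assumption: "-" stands for a request without an authenticated user.
    if record["auth"] == "-":
        record["auth"] = None
    return record


parsed = parse_record(line)
print(parsed["timestamp"], parsed["res_status"], parsed["auth"])
```

Converting the fields up front keeps later filtering simple, for example when picking out redirects by comparing `res_status` against numeric ranges.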

However, NiFi isn’t running yet, so after three unsuccessful attempts to connect to the remote service, the proxy gives up.

The remote service is inaccessible, terminating ..
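The give-up-after-three-attempts behaviour can be sketched in a few lines. This is only an illustration of the idea under my own assumptions, not the generator's actual code: `send_with_retries`, its parameters and the pause between attempts are all hypothetical, and `send` stands for any callable that performs the HTTP request.

```python
import time


def send_with_retries(send, attempts=3, delay=1.0):
    """Call send() until it succeeds or the attempts run out (hypothetical helper)."""
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except OSError as error:
            # urllib.error.URLError and ConnectionError are both OSError subclasses.
            print(f"Attempt {attempt} of {attempts} failed: {error}")
            if attempt < attempts:
                time.sleep(delay)  # assumed pause between attempts
    # Same message the proxy prints when it gives up.
    raise SystemExit("The remote service is inaccessible, terminating ..")
```

Passing the sender in as a callable keeps the retry logic independent of how the request is actually made, which also makes it easy to try out without a running endpoint.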

Time to start NiFi. Once it’s up and running, I go ahead and drop a ListenHTTP processor onto the canvas.

ListenHTTP as a Web Proxy Endpoint


Listener configuration: URL, port and a base path


As soon as the processor is running, I restart the proxy and let it stream logs to the endpoint, using the same command as before:

python src/log_generator.py \
--stream 100 \
--url http://localhost:8081/contentListener

This time everything runs fine and the proxy logs are captured in NiFi.

Captured logs in NiFi's ListenHTTP processor
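To check the listener independently of the generator, a single record can be posted by hand. Below is a minimal sketch that uses only Python's standard library. The helper names `build_request` and `send_record` are mine, chosen for illustration, and the JSON content type is an assumption; the URL is the one passed to the generator above.

```python
import json
import urllib.request

ENDPOINT = "http://localhost:8081/contentListener"  # URL used in this post


def build_request(record, url=ENDPOINT):
    """Wrap one log record as an HTTP POST request with a JSON body."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def send_record(record, url=ENDPOINT, timeout=5):
    """Post the record and return the HTTP status code of the response."""
    with urllib.request.urlopen(build_request(record, url), timeout=timeout) as response:
        return response.status


# A made-up record with a few of the fields from the sample output.
record = {
    "auth": "kourtney.buckley",
    "url": "http://www.alysonhunt.com/51/865/711.png",
    "timestamp": "2016-12-04 20:43:21",
    "res_status": "200",
}

request = build_request(record)
print(request.get_method(), request.full_url)
# With the ListenHTTP processor running, send_record(record) posts the record.
```

If the processor is stopped, `send_record` raises a `urllib.error.URLError`, which mirrors the connection failure the proxy reported earlier.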

That’s it for now. I hope you enjoyed this brief demo of how I set up streaming into NiFi. In the next post, I will focus on data transformation and data flow performance tuning. Thanks for reading, and don’t forget to check out the source code.

Categories: Python

Tomas Zezula
