Runner API

Local runner API

To inspect the runner's local API, first check that visual automation is enabled and which port it uses.

Then open this address in your browser: http://localhost:7777/api/v1/openapi.json.

If visual automation is enabled on a port other than the default 7777, change the port in the link accordingly.
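
As a minimal sketch, the same check can be done from a script instead of a browser. The helper names `openapi_url` and `fetch_openapi` are illustrative and not part of the runner; the code assumes the endpoint returns a JSON document, as the `.json` suffix suggests.

```python
import json
import urllib.request


def openapi_url(port: int = 7777, host: str = "localhost") -> str:
    """Build the address of the OpenAPI description for a given port."""
    return f"http://{host}:{port}/api/v1/openapi.json"


def fetch_openapi(port: int = 7777, timeout: float = 5.0) -> dict:
    """Download and decode the OpenAPI description (needs a running runner)."""
    with urllib.request.urlopen(openapi_url(port), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_openapi(8080)` would query a runner whose visual automation listens on port 8080.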

Debug

Enables or disables the debug overlay. Example visualization of visual automation looking for the Chrome icon.

# Request

PUT http://localhost:7777/api/v1/debug

Parameters:
  "enable": true, #enable/disable overlay
  "time": 5 #number of seconds for which the overlay is to be visible

# Response
200 OK
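
A hedged sketch of preparing this request in Python follows. The source lists the parameters but not how they are encoded; sending them as a JSON body is an assumption, and `debug_request` is a hypothetical helper name.

```python
import json
import urllib.request


def debug_request(enable: bool, time: int = 5, port: int = 7777) -> urllib.request.Request:
    """Prepare (but do not send) the PUT request that toggles the overlay."""
    # Assumption: parameters travel as a JSON body.
    body = json.dumps({"enable": enable, "time": time}).encode("utf-8")
    return urllib.request.Request(
        f"http://localhost:{port}/api/v1/debug",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Sending requires a running runner with visual automation enabled:
# with urllib.request.urlopen(debug_request(True, time=5)) as resp:
#     print(resp.status)
```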

Click

Clicks on the screen; it comes in several flavors. To avoid any doubt: coordinates are counted from the lower left corner of the given screen.

By coordinates

The application clicks in the given coordinates on the screen.

# Request

POST /api/v1/click/coordinates

Parameters:
  "screen": 0, # screen index, counted from 0; defaults to 0
  "button": 0, # mouse button code: "left", "right" or "middle"; defaults to 0 (left)
  "double": false, # true for a double click; defaults to false
  "x": 123,
  "y": 1233

# Response

201 Created - When the click went through without a problem
422 Unprocessable Entity - When the click failed or the data is incorrect
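
The request and its two documented responses could be handled roughly as follows. This is a sketch under assumptions: the parameters are sent as a JSON body, and `click_request` and `send_click` are illustrative names, not runner functions.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:7777"  # change the port if yours differs


def click_request(x: int, y: int, screen: int = 0, double: bool = False) -> urllib.request.Request:
    """Prepare a left click at (x, y), counted from the lower left corner."""
    payload = {"screen": screen, "button": 0, "double": double, "x": x, "y": y}
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/click/coordinates",
        data=json.dumps(payload).encode("utf-8"),  # assumption: JSON body
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send_click(req: urllib.request.Request) -> bool:
    """Return True on 201 Created, False on 422 Unprocessable Entity."""
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status == 201
    except urllib.error.HTTPError as err:
        if err.code == 422:
            return False
        raise  # any other status is unexpected for this endpoint
```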

By image

The application finds an area of the screen that looks like the supplied image and clicks the center of the match.

# Request

POST /api/v1/click/image

Parameters:
  "screen": 0, # screen index, counted from 0; defaults to 0
  "button": 0, # mouse button code: "left", "right" or "middle"
  "double": false, # true for a double click; defaults to false
  "index": 0, # which match to click when the image occurs several times: 0 is the first, 1 the next, and so on
  "offset_x": 123, # may be positive or negative, for example -100, 0 or 100; defaults to 0
  "offset_y": 1233, # may be positive or negative, for example -100, 0 or 100; defaults to 0
  "threshold": 998 # optional; accuracy in the range 0..1000, converted internally to a float as x/1000.0, e.g. 999 => 0.999

The reference image to look for (PNG or JPG) is uploaded with the request, for example as multipart data.

# Response

200 ok
404 not found
422 invalid request data
503 lack of synchronization
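
The threshold conversion described above (an integer in 0..1000 divided by 1000.0) can be sketched with two small helpers. The function names are illustrative only.

```python
def threshold_to_certainty(threshold: int) -> float:
    """Convert an integer threshold (0..1000) to the float used for matching."""
    if not 0 <= threshold <= 1000:
        raise ValueError("threshold must be in the range 0..1000")
    return threshold / 1000.0


def certainty_to_threshold(certainty: float) -> int:
    """Convert a certainty such as 0.998 back to the integer request value."""
    if not 0.0 <= certainty <= 1.0:
        raise ValueError("certainty must be in the range 0.0..1.0")
    return round(certainty * 1000)


print(threshold_to_certainty(999))    # 0.999
print(certainty_to_threshold(0.998))  # 998
```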

By text

The application finds text on the screen and then clicks it.

# Request

POST /api/v1/click/text

Parameters:
  "lang": "pl", # language of the text
  "text": "lorem ipsum", # text to find
  "screen": 0, # screen index, counted from 0; defaults to 0
  "button": 0, # mouse button code: "left", "right" or "middle"
  "double": false, # true for a double click; defaults to false
  "index": 0, # which match to click when the text occurs several times: 0 is the first, 1 the next, and so on
  "offset_x": 123, # may be positive or negative, for example -100, 0 or 100; defaults to 0
  "offset_y": 1233, # may be positive or negative, for example -100, 0 or 100; defaults to 0
  "roi_x": 0, # region of interest (ROI) in which to search for text: X coordinate. By default the ROI covers the entire selected screen
  "roi_y": 0, # ROI Y coordinate
  "roi_w": 1920, # ROI width. NOTE: width and height extend the region to the right and down from the given coordinates
  "roi_h": 1080, # ROI height
  "black_text": true, # Optional; controls how the screenshot is preprocessed before OCR.
                      # false optimizes detection of light text on a dark background at the expense of
                      # dark text on a light background. Useful if light text on a dark background
                      # is not detected. Defaults to true
  "do_not_preprocess": false # Optional. Disables image preprocessing
                             # for OCR (except scaling to ~300dpi). It can help in special
                             # cases where neither 'black_text' value gives good results

# Response

200 ok
404 not found
422 invalid request data
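
A sketch of assembling these parameters in Python, leaving out the optional ones when the documented defaults apply. `click_text_payload` is a hypothetical helper, and the assumption that the runner accepts a request without the optional keys follows from the defaults described above rather than from a confirmed contract.

```python
from typing import Optional, Tuple


def click_text_payload(
    text: str,
    lang: str,
    screen: int = 0,
    index: int = 0,
    roi: Optional[Tuple[int, int, int, int]] = None,
    black_text: bool = True,
) -> dict:
    """Build the parameters for POST /api/v1/click/text.

    roi is (x, y, width, height); when omitted, no roi_* keys are sent,
    so the default (the entire selected screen) applies.
    """
    payload = {"lang": lang, "text": text, "screen": screen, "index": index}
    if roi is not None:
        x, y, w, h = roi
        if w <= 0 or h <= 0:
            raise ValueError("ROI width and height must be positive")
        payload.update({"roi_x": x, "roi_y": y, "roi_w": w, "roi_h": h})
    if not black_text:
        # Only send the flag when deviating from the documented default (true).
        payload["black_text"] = False
    return payload
```

For example, `click_text_payload("lorem ipsum", "pl", roi=(0, 0, 1920, 1080))` limits the search to a 1920 by 1080 region.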

Hover

Moves the cursor to the specified coordinates.

# Request

POST /api/v1/hover

Parameters:
  "screen": 0, # defaults to 0, but counts from 0+
  "x": 123,
  "y": 1233

# Response
201 Created - When the cursor was moved without a problem
422 Unprocessable Entity - When the move failed or the data is incorrect

Type

Simulates pressing keys on a keyboard.

# Request

POST http://localhost:7777/api/v1/type

keys: "[control][space]" # presses together
keys: "safari[enter]" # writes and presses enter
keys: "jasiu[control][c]" # writes, presses together
keys: "[control][v]" # presses together

The "keys" syntax consists of regular characters and sticky characters.
Sticky characters that are directly next to each other are executed together.
When a group of sticky characters contains a modifier, it is executed
as mod_down + keys_press + mod_up.
Supported characters: 0-9, A-Z.
Supported keys: shift, space, escape, enter, control, win, alt, altgr.
The catalog of supported keys is expandable.
Default character input delay: ~100ms

# Response
201 Created - When the keys were typed without a problem
422 Unprocessable Entity - When typing failed or the data is incorrect

The keys parameter contains the sequence of keys pressed. Type simulates pressing real keys, which, unlike pasting text, allows you to use keyboard shortcuts native to your system such as Alt+F4 or CTRL+V.

Keys are pressed at 50ms intervals to faithfully replicate typing on a keyboard without risking unpredictable behavior in a text field. Many editors react differently to a pasted value (a single event) than to dozens of key press events (a series of changes to the contents of the text field).

Modifier keys are so-called sticky keys (alt, win, control, altgr, shift). When executing a sequence, put the key codes in [].

Examples:

Pressing ctrl+c combination => keys="[control][c]"
Pressing alt+f4 combination => keys="[alt][f4]"
Press ctrl+l (go to browser bar), then type the address and press Enter => [control][l]www.cloudflare.com[enter]
Pressing ctrl+l and typing http://www.w3schools.com/HTML/tryit.asp?filename=tryhtml5_draganddrop in the browser bar and pressing Enter => [CONTROL][L]https[shift][;]//www.w3schools.com/[shift][h][shift][t][shift][M][shift][L]/tryit.asp[shift][?]filename=tryhtml5[shift][-]draganddrop[enter]
Maximize window with win key and up arrow => keys="[win][up]"
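
To illustrate how a keys string breaks down into typed text and key groups, here is a small tokenizer. It is a sketch of the syntax as described above, not the runner's actual parser; `parse_keys` is a hypothetical name, and lowercasing key names is an assumption based on the mixed-case examples.

```python
import re
from typing import List, Tuple, Union

Token = Tuple[str, Union[str, List[str]]]

# Either a bracketed key name such as [enter], or a run of plain characters.
_TOKEN = re.compile(r"\[([^\[\]]+)\]|([^\[\]]+)")


def parse_keys(keys: str) -> List[Token]:
    """Split a keys string into ("text", str) and ("combo", [names]) tokens.

    Bracketed keys that are directly next to each other form one combo.
    """
    tokens: List[Token] = []
    for match in _TOKEN.finditer(keys):
        special, text = match.group(1), match.group(2)
        if text is not None:
            tokens.append(("text", text))
        elif tokens and tokens[-1][0] == "combo":
            tokens[-1][1].append(special.lower())  # extend the adjacent group
        else:
            tokens.append(("combo", [special.lower()]))
    return tokens


print(parse_keys("jasiu[control][c]"))
# [('text', 'jasiu'), ('combo', ['control', 'c'])]
```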

The key support catalog is expandable. The currently available keys can be seen in the screenshot below. In addition to the special keys, the characters 0-9 and A-Z are supported as standard.

OCR

OCR has two operations: Find, which returns the coordinates of the text you are looking for, and Get, which retrieves all the text from the screen.

Find

Finds the coordinates of text on the screen.

# Request

POST http://localhost:7777/api/v1/ocr/find

Parameters:
  "screen": 0, # screen index, counted from 0; defaults to 0
  "language": "eng", # language of the text
  "index": 0, # offset of the text we are looking for, 0 is the first found
  "roi_x": 0, # region of interest (ROI) in which to search for text: X coordinate. By default the ROI covers the entire selected screen
  "roi_y": 0, # ROI Y coordinate
  "roi_w": 1920, # ROI width. NOTE: width and height extend the region to the right and down from the given coordinates
  "roi_h": 1080, # ROI height
  "black_text": true, # Optional; controls how the screenshot is preprocessed before OCR.
                      # false optimizes detection of light text on a dark background at the expense of
                      # dark text on a light background. Useful if light text on a dark background
                      # is not detected. Defaults to true
  "do_not_preprocess": false # Optional. Disables image preprocessing
                             # for OCR (except scaling to ~300dpi). It can help in special
                             # cases where neither 'black_text' value gives good results

# Response

200 ok
404 not found
422 invalid request data

Get

Retrieves all the text from the screen.

# Request

POST http://localhost:7777/api/v1/ocr/get

Parameters:
  "screen": 0, # screen index, counted from 0; defaults to 0
  "language": "eng", # language of the text
  "index": 0, # offset of the text we are looking for, 0 is the first found
  "roi_x": 0, # region of interest (ROI) in which to search for text: X coordinate. By default the ROI covers the entire selected screen
  "roi_y": 0, # ROI Y coordinate
  "roi_w": 1920, # ROI width. NOTE: width and height extend the region to the right and down from the given coordinates
  "roi_h": 1080, # ROI height
  "black_text": true, # Optional; controls how the screenshot is preprocessed before OCR.
                      # false optimizes detection of light text on a dark background at the expense of
                      # dark text on a light background. Useful if light text on a dark background
                      # is not detected. Defaults to true
  "do_not_preprocess": false # Optional. Disables image preprocessing
                             # for OCR (except scaling to ~300dpi). It can help in special
                             # cases where neither 'black_text' value gives good results

# Response

200 ok
404 not found
422 invalid request data

{
  "plain_text": "Lorem ipsum, Lorem ipsumn", # string with text dumped from the entire screen
  "boxed_lines": [        # list of AltoTextLine objects
      {
          "x": 18,
          "y": 8,
          "w": 2299,
          "h": 30,
          "text_line": [
              {
                  "x": 18,
                  "y": 8,
                  "w": 256,   # width
                  "h": 30,    # height
                  "certainty": 0.9,
                  "word": "Lorem" 
              },
              {
                  "x": 18,
                  "y": 8,
                  "w": 256,
                  "h": 30,
                  "certainty": 0.3,
                  "word": "ipsum"
              }
          ]
      },
      {
          "x": 18,
          "y": 8,
          "w": 2299,
          "h": 30,
          "text_line": [
              {
                  "x": 18,
                  "y": 8,
                  "w": 256,
                  "h": 30,
                  "certainty": 0.9,
                  "word": "Lorem"
              },
              {
                  "x": 18,
                  "y": 8,
                  "w": 256,
                  "h": 30,
                  "certainty": 0.3,
                  "word": "ipsum"
              }
          ]
      }
  ]
}
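
Because each recognized word carries a certainty, a client may want to discard low-confidence words. The sketch below walks the structure shown above; `confident_words` and the 0.5 cut-off are illustrative choices, not part of the API.

```python
from typing import List


def confident_words(ocr_response: dict, min_certainty: float = 0.5) -> List[str]:
    """Collect words whose certainty is at least min_certainty, in order."""
    words = []
    for line in ocr_response.get("boxed_lines", []):
        for word in line.get("text_line", []):
            if word["certainty"] >= min_certainty:
                words.append(word["word"])
    return words


sample = {
    "plain_text": "Lorem ipsum",
    "boxed_lines": [
        {"x": 18, "y": 8, "w": 2299, "h": 30, "text_line": [
            {"x": 18, "y": 8, "w": 256, "h": 30, "certainty": 0.9, "word": "Lorem"},
            {"x": 18, "y": 8, "w": 256, "h": 30, "certainty": 0.3, "word": "ipsum"},
        ]},
    ],
}
print(confident_words(sample))  # ['Lorem']
```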

Visual automation

Finds an image on the screen.

# Request

POST http://localhost:7777/api/v1/visual/find

"screen": 0,
"index": 0, # optional; if omitted, a list of matches is returned
"threshold": 998 # optional

The reference image to look for (PNG or JPG) is uploaded with the request, for example as multipart data.

# Response

200 ok
404 not found
422 invalid request data

If an index is specified, the response contains only one entry.
The x and y coordinates define the upper left corner of the detected area, in
the coordinate system with (0,0) in the lower left corner of the screen.
[
    {
        "x": 765,
        "y": 649,
        "w": 99,
        "h": 99,
        "certainty": 1.000000
    },
    {
        "x": 765,
        "y": 1439,
        "w": 99,
        "h": 99,
        "certainty": 1.000000
    },
    {
        "x": 765,
        "y": 1643,
        "w": 99,
        "h": 99,
        "certainty": 1.000000
    }
]
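
Since x and y mark the upper left corner while the y axis grows upwards from the bottom of the screen, the center of a match lies to the right of x and below y. A minimal sketch, assuming exactly that convention (`match_center` is an illustrative name):

```python
from typing import Tuple


def match_center(match: dict) -> Tuple[int, int]:
    """Return the center of a match in the lower-left-origin coordinate system.

    x, y is the upper left corner, so the area extends right (+x) and
    down (-y, because y grows upwards from the bottom of the screen).
    """
    return match["x"] + match["w"] // 2, match["y"] - match["h"] // 2


matches = [
    {"x": 765, "y": 649, "w": 99, "h": 99, "certainty": 1.0},
    {"x": 765, "y": 1439, "w": 99, "h": 99, "certainty": 1.0},
]
print([match_center(m) for m in matches])  # [(814, 600), (814, 1390)]
```

The resulting point could then be passed as x and y to the click by coordinates request.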

Drag and drop

# Request

POST http://localhost:7777/api/v1/drag_and_drop

"screen": 0,
"button": 1,
"start_x": 123,
"start_y": 213,
"end_x": 123,
"end_y": 213,
"speed": 1 # time of action in seconds, defaults to 1

# Response

201 Created - done
422 invalid request data

Screenshot

Downloads a PNG with a screenshot of the selected monitor.

# Request

GET http://localhost:7777/api/v1/screenshot

"screen": 0 

# Response

200 OK
422 invalid request data
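
A sketch of downloading the screenshot and checking that the result really is a PNG. The source does not say how "screen" is passed on a GET request; sending it in the query string is an assumption, and all helper names are illustrative.

```python
import urllib.parse
import urllib.request

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def screenshot_url(screen: int = 0, port: int = 7777) -> str:
    """Build the screenshot URL (assumption: screen goes in the query string)."""
    query = urllib.parse.urlencode({"screen": screen})
    return f"http://localhost:{port}/api/v1/screenshot?{query}"


def is_png(data: bytes) -> bool:
    """Check that the downloaded bytes start with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)


def save_screenshot(path: str, screen: int = 0) -> None:
    """Download the screenshot and write it to path (needs a running runner)."""
    with urllib.request.urlopen(screenshot_url(screen), timeout=10) as resp:
        data = resp.read()
    if not is_png(data):
        raise ValueError("response is not a PNG image")
    with open(path, "wb") as fh:
        fh.write(data)
```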

Developer guides

Finding by image

When finding an image (visual find) or clicking on an image (click image), it is crucial to prepare the reference (master) image correctly.

The quality of the results and the ability to match are closely linked to the size of the reference image. Pay attention to the DPI of the monitor on which the image will be searched. If the monitor operates at a high DPI (high pixel density), a search using a low DPI image will fail. Using an unscaled high DPI image on a low DPI monitor will not work either.

Searching with a higher DPI image is possible if the image is reduced accordingly based on the DPI of the monitor before searching. Information about the pixel density is available in the status.
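
The reduction can be worked out as a simple ratio of the two DPI values. The sketch below only computes the target size in pixels; resizing itself is left to an image tool of your choice. `scaled_size` is an illustrative helper, and linear scaling by monitor DPI divided by image DPI is an assumption about what "reduced accordingly" means.

```python
from typing import Tuple


def scaled_size(width: int, height: int, image_dpi: float, monitor_dpi: float) -> Tuple[int, int]:
    """Return the pixel size a reference image should be reduced to.

    Only downscaling is supported here, matching the guidance above.
    """
    if image_dpi <= 0 or monitor_dpi <= 0:
        raise ValueError("DPI values must be positive")
    if monitor_dpi > image_dpi:
        raise ValueError("a low DPI image cannot be used on a higher DPI monitor")
    factor = monitor_dpi / image_dpi
    return max(1, round(width * factor)), max(1, round(height * factor))


print(scaled_size(200, 100, image_dpi=192, monitor_dpi=96))  # (100, 50)
```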

For example, let's take a search or click on the ruby icon on the Just Join IT website.

This requires the preparation of a master image of the selected icon.

A well-prepared icon

Poorly prepared icon

If the reference icon contains margins that are too large, the search will start returning a match on almost every round icon in the icon bar on the page. This happens because of the matching area: percentage-wise, the white margin and the "roundness" of the symbol become more important than the actual icon inside. A properly prepared icon can even give a 100% match (certainty 1.0).


Last updated 2 years ago