Блог пользователя jamjury

Автор jamjury, история, 7 месяцев назад, По-английски

In my last blog post I wrote about mapping stdin into memory. This time we're going to be mapping stdout.

Firstly, we need to get a handle to stdout. That's easy enough with Win32 API GetStdHandle:

HANDLE Stdout = GetStdHandle(STD_OUTPUT_HANDLE);

Alternatively we could directly use NT API NtCurrentTeb, which GetStdHandle uses under the hood, but it's a little more lengthy:

HANDLE Stdout = NtCurrentTeb()->ProcessEnvironmentBlock->ProcessParameters->StdOutputHandle;

Next we need to map our file into memory. Windows (or any other platform) doesn't support write-only pages, so we have to map into read-write pages. Let's try to follow what we did for stdin (CreateFileMapping):

HANDLE FileMapping = CreateFileMapping(
    Stdout,          // [in] File
    NULL,            // [in] FileMappingAttributes [optional]
    PAGE_READWRITE,  // [in] Protect
    0,    /* one  */ // [in] MaximumSizeHigh
    4096, /* page */ // [in] MaximumSizeLow
    NULL             // [in] Name [optional]
);

Unfortunately, we get Access is denied, because stdout is not readable. What can we do about it?.. We can reopen the file with read access!

First, let's get the name of the output file. We can do it with Win32 API GetFinalPathNameByHandleW. We can't get VOLUME_NAME_DOS or VOLUME_NAME_GUID, so let's use VOLUME_NAME_NT:

WCHAR FileName[MAX_PATH];
DWORD ret = GetFinalPathNameByHandleW(Stdout, FileName, MAX_PATH, VOLUME_NAME_NT);

Again we could instead directly call NT API that GetFinalPathNameByHandleW uses under the hood — NtQueryObject:

struct { UNICODE_STRING Name; WCHAR Buffer[MAX_PATH]; } FileNameBuffer;
NtQueryObject(Stdout, ObjectNameInformation, &FileNameBuffer, sizeof(FileNameBuffer), NULL);

As a result we get something like this: \Device\ImDisk1\54345f194c63c344b90cbfcd5944cfd3\run-a6c9a90c30e45e273a5ca250be6b2efb\run\output.fd0138e687.txt

This means that CodeForces uses ImDisk and creates with it ramdisk to store output files to speed up writing. Nice!

Now we need to open this file. Win32 API function for this is CreateFile, which, despite the name, also opens files:

HANDLE File = CreateFileW(
    FileName,                           // [in] lpFileName
    GENERIC_READ | GENERIC_WRITE,       // [in] dwDesiredAccess
    FILE_SHARE_READ | FILE_SHARE_WRITE, // [in] dwShareMode
    NULL,                               // [in] lpSecurityAttribute [optional]
    OPEN_EXISTING,                      // [in] dwCreationDisposition
    0,                                  // [in] dwFlagsAndAttributes
    NULL                                // [in] hTemplateFile [optional]
);

But if we do it just like that, we get: The system cannot find the path specified. What we have to do instead is replace "\Device" in the beginning of FileName with "\\?" (in C string you'd use escapes: "\\\\?"). Alternatively we can use the underlying NT API NtOpenFile, which accepts NT object names directly:

HANDLE File;
OBJECT_ATTRIBUTES ObjectAttributes;
IO_STATUS_BLOCK IoStatusBlock;

InitializeObjectAttributes(&ObjectAttributes, &FileNameBuffer.Name, 0, NULL, NULL);
NtOpenFile(
   &File,                              // [out] FileHandle
   FILE_READ_DATA | FILE_WRITE_DATA,   // [in]  DesiredAccess
   &ObjectAttributes,                  // [in]  ObjectAttributes
   &IoStatusBlock,                     // [out] IoStatusBlock
   FILE_SHARE_READ | FILE_SHARE_WRITE, // [in]  ShareAccess
   0                                   // [in]  OpenOptions
);

If we open file without FILE_SHARE_READ | FILE_SHARE_WRITE we get STATUS_SHARING_VIOLATION.

Now that we've opened the file for both reading and writing we can map it and write to it without any issues.

const char* msg = "Hello from memory-mapped stdout\n";
size_t len = strlen(msg);
HANDLE map = CreateFileMapping(File, NULL, PAGE_READWRITE, 0, len, NULL);
void *view = MapViewOfFile(map, FILE_MAP_WRITE, 0, 0, 0);
memcpy(view, msg, len);

Except we get a funny bug from CodeForces runner.

Stack trace

After some digging around, we find that CodeForces runner really doesn't like string "PAGE_READWRITE" present anywhere in the code. So we just substitute this macro by hand with value 4.

A-a-and we finally get...

Hello from memory-mapped stdout

That's it!

Next time we'll use some other NT APIs to not just map statically-sized output file, but also extend it while it's mapped — for this we'll need MEM_RESERVE, which is only present in Windows. We'll also pack it all into a neat class, like we did last time with input. See you there!

P.S.

Полный текст и комментарии »

  • Проголосовать: нравится
  • +4
  • Проголосовать: не нравится

Автор jamjury, история, 23 месяца назад, По-английски

Memory-mapped input file is the fastest possible input method I know of. I used it for last ICPC Challenge.

Unfortunately, it is not possible — without exploiting Windows kernel bugs — to also map output file into memory on CF, because the handle lacks GENERIC_READ, which is required for FILE_MAP_WRITE mapping. So the best we can do for output is to write the whole buffer at once (but it doesn't really speed things up).

#include <Windows.h> // GetStdHandle, GetFileSize, CreateFileMapping, MapViewOfFile, UnmapViewOfFile, WriteFile, CloseHandle
#include <charconv> // from_chars, to_chars
#include <cstring> // memcpy
#include <cstdlib> // malloc, free
#include <cstddef> // size_t

// #define FREE_IO_ON_EXIT
 
struct input {
    LPVOID view;
    const char *first, *last;
 
    input() noexcept {
        HANDLE input_handle = GetStdHandle(STD_INPUT_HANDLE);
        DWORD input_size = GetFileSize(
            input_handle, // [in]            HANDLE  hFile,
            NULL          // [out, optional] LPDWORD lpFileSizeHigh
        );
        HANDLE mapping_object = CreateFileMapping(
            input_handle,  // [in]           HANDLE                hFile
            NULL,          // [in, optional] LPSECURITY_ATTRIBUTES lpFileMappingAttributes
            PAGE_READONLY, // [in]           DWORD                 flProtect
            0, /* whole */ // [in]           DWORD                 dwMaximumSizeHigh
            0, /*  file */ // [in]           DWORD                 dwMaximumSizeLow
            NULL           // [in, optional] LPCSTR                lpName
        );
        view = MapViewOfFile(
            mapping_object,  // [in] HANDLE hFileMappingObject
            FILE_MAP_READ,   // [in] DWORD  dwDesiredAccess
            0,               // [in] DWORD  dwFileOffsetHigh
            0,               // [in] DWORD  dwFileOffsetLow
            0 /*whole file*/ // [in] SIZE_T dwNumberOfBytesToMap
        );

        first = (char*) view;
        last = first + input_size;
        #ifdef FREE_IO_ON_EXIT
            CloseHandle(input_handle);
            CloseHandle(mapping_object);
        #endif
    }
 
    int take_int() noexcept {
        int result;
        first = std::from_chars(first, last, result).ptr + 1;
        return result;
    }
 
    double take_double() noexcept {
        double result;
        first = std::from_chars(first, last, result).ptr + 1;
        return result;
    }
 
    #ifdef FREE_IO_ON_EXIT
        ~input() noexcept {
            UnmapViewOfFile(view);
        }
    #endif
} in;

struct output {
    constexpr static std::size_t buf_size = 32*1024*1024; // 32MB
    char *buf, *first, *last;
 
    output() noexcept
        : buf((char*) std::malloc(buf_size))
        , first(buf)
        , last(buf + buf_size)
    {}
 
    void put_int(int value) noexcept {
        first = std::to_chars(first, last, value).ptr;
    }
 
    template <std::size_t size>
    void put_str(const char (&str)[size]) noexcept {
        std::memcpy(first, str, size - 1);
        first += size - 1;
    }
 
    void put_char(char c) noexcept {
        *first++ = c;
    }
 
    ~output() noexcept {
        HANDLE output_handle = GetStdHandle(STD_OUTPUT_HANDLE);
        WriteFile(output_handle, buf, first - buf, NULL, NULL);
        #ifdef FREE_IO_ON_EXIT
            CloseHandle(output_handle);
            std::free(buf);
        #endif
    }
} out;

int main() {
    out.put_int(in.take_int());
}

The benchmark (webarchive) on my machine gcc 13.2.0 (MinGW-W64), compiled with -std=c++20 -O2

int, scanf         4.01   3.93   4.05
int, cin           9.31   0.64   9.21
int, mmap in       0.06   0.06   0.06

int, printf        2.25   2.27   2.22
int, cout          0.39   0.40   0.40
int, mmap out      0.39   0.38   0.39

*mmap in/out is this implementation

Полный текст и комментарии »

  • Проголосовать: нравится
  • +9
  • Проголосовать: не нравится

Автор jamjury, история, 23 месяца назад, По-английски

In the problem from currently running Huawei contest (Accuracy-Preserving Summation Algorithm) a part of the task is to choose between IEEE-754 binary64, binary32 and binary16 floating point formats to use for number summation.

Apparently, people in charge of the contest don't know that both Intel and AMD support fp16 since 2011-2012 (AMD Bulldozer and Intel Ivy Bridge), it's supported by GCC 12+ and Clang 15+ as _Float16 and since C++23 as std::float16_t.

So for the checker they wrote their own implementation, which differs from IEEE in two places:

  1. the exponent range is [-16; +16] instead of [-15; +15];

  2. during conversion from fp64 least significant bits are just thrown away instead of being rounded.

Considering it's importance in the problem, I decided to check what's the range of fp64 that doesn't overflow when converted to fp16 and the difference that aforementioned differences make. Therefore I wrote a little program which prints the ranges and decided to share results here with anyone interested. It shows ranges for both "Huawei FP16", IEEE and "Corrected Huawei" — without rounding, but with correct exponent range.

Here are the results:

Huawei FP16
  MAX
    fp64: 131071.99999999999 (0x1.fffffffffffffp+16)
    fp16: 131008 (0x1.ffcp+16)
  OVERFLOW
    fp64: 131072 (0x1p+17)
    fp16: inf
  RANGE
    fp64: (-131072; 131072)
    fp16: [-131008; 131008]

IEEE-754 FP16
  MAX
    fp64: 65519.999999999993 (0x1.ffdffffffffffp+15)
    fp16: 65504 (0x1.ffcp+15)
  OVERFLOW
    fp64: 65520 (0x1.ffep+15)
    fp16: inf
  RANGE
    fp64: (-65520; 65520)
    fp16: [-65504; 65504]

Huawei FP16 with correct range
  MAX
    fp64: 65535.999999999993 (0x1.fffffffffffffp+15)
    fp16: 65504 (0x1.ffcp+15)
  OVERFLOW
    fp64: 65536 (0x1p+16)
    fp16: inf
  RANGE
    fp64: (-65536; 65536)
    fp16: [-65504; 65504]

https://godbolt.org/z/3s8GoKe8j

Source

I think it's kinda sloppy that the problem statement is inconsistent with the checker, but at least they've shared the checker's code.

Полный текст и комментарии »

  • Проголосовать: нравится
  • +19
  • Проголосовать: не нравится