Miller 3.4.0 released: a CSV and JSON processing tool
Miller 3.4.0 has been released. Example invocations:
% mlr --csv cut -f hostname,uptime mydata.csv
% mlr --csv --rs lf filter '$status != "down" && $upsec >= 10000' *.csv
% mlr --nidx put '$sum = $7 < 0.0 ? 3.5 : $7 + 2.1*$8' *.dat
% grep -v '^#' /etc/group | mlr --ifs : --nidx --opprint label group,pass,gid,member then sort -f group
% mlr join -j account_id -f accounts.dat then group-by account_name balances.dat
% mlr put '$attr = sub($attr, "([0-9]+)_([0-9]+)_.*", "\1:\2")' data/*.json
% mlr stats1 -a min,mean,max,p10,p50,p90 -f flag,u,v data/*
% mlr stats2 -a linreg-pca -f u,v -g shape data/
Improvements in the new version:
Primary features:
- JSON is now a supported format for input and output. Miller handles tabular data, whereas JSON supports arbitrarily deeply nested data structures, so if you want general JSON processing you should use jq. But if you have tabular data represented in JSON, then Miller can now handle that for you. Please see the reference page and the FAQ.
- Reshape is a standard data-processing idiom, now available in Miller: http://johnkerl.org/miller/doc/reference.html#reshape
- Incidentally (not part of this release, but new since the last release), Miller is now available in FreeBSD's package manager: https://www.freshports.org/textproc/miller/. A full list of distributions containing Miller may be found here.
- Miller is not yet available from within Fedora/CentOS, but as a step toward this goal, an SRPM is included in this release (see file-list below).
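For readers unfamiliar with the reshape idiom mentioned above, the following is a minimal Python sketch of what long-to-wide and wide-to-long conversions mean for tabular records. It is not Miller's implementation; the function names, the sample records, and the field names id, key and value are all made up for illustration.

```python
def long_to_wide(records, key_field, value_field):
    """Collapse key/value rows into one row per distinct set of other fields."""
    wide = {}
    for rec in records:
        # Fields other than the key/value pair identify the output row.
        others = tuple((k, v) for k, v in rec.items()
                       if k not in (key_field, value_field))
        row = wide.setdefault(others, dict(others))
        # The key field's value becomes a new column name.
        row[rec[key_field]] = rec[value_field]
    return list(wide.values())


def wide_to_long(records, input_fields, key_field, value_field):
    """Split the named columns of each row into separate key/value rows."""
    out = []
    for rec in records:
        others = {k: v for k, v in rec.items() if k not in input_fields}
        for name in input_fields:
            if name in rec:
                out.append({**others, key_field: name, value_field: rec[name]})
    return out


# Hypothetical sample data, kept as strings like Miller's string-to-string records.
long_rows = [
    {"id": "1", "key": "x", "value": "3"},
    {"id": "1", "key": "y", "value": "4"},
    {"id": "2", "key": "x", "value": "5"},
    {"id": "2", "key": "y", "value": "6"},
]
wide_rows = long_to_wide(long_rows, "key", "value")
print(wide_rows)
print(wide_to_long(wide_rows, ["x", "y"], "key", "value"))
```

In this sketch the two functions are inverses on the sample data: converting to wide and back to long reproduces the original rows.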
DSL enhancements for mlr put and mlr filter:
- Regex captures \0 through \9: http://johnkerl.org/miller/doc/reference.html#Regex_captures
- Ternary operator in expression right-hand sides, e.g. mlr put '$y = $x < 0.5 ? 0 : 1'
- Boolean literals true and false
- Final semicolon is now allowed, e.g. mlr put '$x=1;$y=2;'
- Environment variables are now accessible, where environment-variable names may be string literals or arbitrary expressions: mlr put '$home = ENV["HOME"]' or mlr put '$value = ENV[$name]'.
- While records are still string-to-string maps for input and output, and between then-chained statements, types are preserved between multiple statements within a put. Example: mlr put '$y = string($x); $z = $y . $y' works as expected, without requiring mlr put '$y = string($x); $z = string($y) . string($y)' as before.
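As a rough illustration of what capture references do in a substitution, here is a Python analogue of the sub example listed at the top of this post. It uses Python's re module rather than Miller, so it only mirrors the idea; the helper name rewrite_attr and the sample values are hypothetical.

```python
import re

# Same pattern as in the mlr sub(...) example: two digit runs separated by
# underscores, followed by anything.
PATTERN = r"([0-9]+)_([0-9]+)_.*"


def rewrite_attr(value):
    """Replace a match with 'first:second', using capture groups 1 and 2."""
    return re.sub(PATTERN, r"\1:\2", value)


print(rewrite_attr("123_456_abc"))   # captures "123" and "456"
print(rewrite_attr("no digits"))     # no match, so the value is left unchanged

# Group 0 is the whole matched text; presumably this is the role of \0 in
# Miller, but that correspondence is an assumption here.
print(re.match(PATTERN, "123_456_abc").group(0))
```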
Bug fixes:
- Mixed-format join, e.g. a CSV file joined with a DKVP file, was incorrectly computing default separators (IRS, IFS, IPS). This resulted in records not being joined together.
- Segmentation violation on non-standard-input read of files with size an exact multiple of page size and not ending in IRS, e.g. newline. (This is less of a corner case than it sounds: for example, leave a long-running program running with output redirected to a file, then in a sleep-and-process loop, have Miller process that file. The former program's stdio library will likely be doing block-sized buffered I/O, where block sizes will often be multiples of system page size and the block will almost surely not end in a newline.)
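To make the second bug's trigger condition concrete, the sketch below builds an input file whose size is an exact multiple of the system page size and whose last record has no trailing newline. It only constructs such a file (for instance, as a regression-test input) and does not invoke Miller; the function name, the record contents and the file name are made up for illustration.

```python
import mmap
import os
import tempfile

PAGE_SIZE = mmap.PAGESIZE  # system page size in bytes


def make_page_multiple_file(path, pages=1):
    """Write DKVP-style records totalling exactly pages * PAGE_SIZE bytes,
    with no newline after the final record."""
    target = pages * PAGE_SIZE
    line = b"a=1,b=2\n"
    prefix = b"pad="
    # Use as many whole lines as fit while leaving room for a final,
    # unterminated record of at least one padding byte.
    n_lines = (target - len(prefix) - 1) // len(line)
    body = line * n_lines
    remaining = target - len(body) - len(prefix)
    data = body + prefix + b"x" * remaining
    with open(path, "wb") as f:
        f.write(data)
    return len(data)


# Demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "page_multiple.dkvp")
    size = make_page_multiple_file(demo_path, pages=2)
    print(size, size % PAGE_SIZE)
```

A file produced this way matches the description in the bug report: its length is a whole number of pages and its final byte is not a record separator.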