Work more efficiently with log files using Linux csvkit

2 years ago 322

All IT professionals and incident handlers have to deal with log files from various sources almost daily. Learn to work more quickly and efficiently and get the best out of CSV files with csvkit on Linux.

Image: BEST-BACKGROUNDS/Shutterstock

CSV files are often imported into Excel or the LibreOffice suite before being used and analyzed. This is convenient and comfortable, as long as the files are not too big. But some log files might contain billions of lines, which makes them impossible to import into spreadsheets. Or you might need to analyze files remotely on headless servers, without any way to use a graphical user interface.

Luckily, an easy solution is available on Linux: the csvkit software.

How to install csvkit

The tool is available in the standard repositories, so it is extremely easy to install. In this article, we'll use an Ubuntu-based operating system.

Let's run the installation in a command-line shell by executing:

sudo apt install csvkit

That's it. The system now installs the tool and all the necessary dependencies.

How to work with a CSV file

To illustrate our point, we'll work with a CSV file from SimpleMaps.com that contains a list of cities and information about them: country, longitude, latitude, population and more.

As is often the case with CSV files, the first line of the file holds the column names. We can see it with the "head" command, which by default shows the first 10 lines of a file (Figure A).
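To follow along without downloading the SimpleMaps file, you can use a tiny stand-in for most commands in this article. The file name sample.csv, its reduced layout (city, lat, lng, country, population) and all values are hypothetical and purely illustrative; the real worldcities.csv has more columns. The -n option of head limits how many lines are shown.

```shell
set -eu

# Hypothetical stand-in for worldcities.csv: reduced columns, illustrative values.
cat > sample.csv <<'EOF'
city,lat,lng,country,population
Tokyo,35.69,139.69,Japan,37000000
Lima,-12.05,-77.04,Peru,9000000
Osaka,34.69,135.50,Japan,19000000
Paris,48.86,2.35,France,11000000
EOF

# Show the header plus the first two data rows instead of the default 10 lines.
head -n 3 sample.csv
```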

Figure A

The header of the CSV file.

How to figure out the columns of the file

Now let's start using csvcut, one of the tools included in csvkit, from the command line. The following command shows the column names along with their indices (Figure B):

csvcut -n worldcities.csv

Figure B

Using csvcut to list the columns of the file.

We can then refer to the columns by either their indices or their names.

How to output selected columns

One of the most common operations on CSV files is selecting just a few columns, or reorganizing them.

To output just a few columns, let's use the csvcut command again, this time with the -c option. Both of the following command lines work: the first selects the columns by index, the second by name. In our example, we pipe the result into the head command to show only the first lines (Figure C).

csvcut -c 1,5,10 worldcities.csv | head
csvcut -c city,country,population worldcities.csv | head

Figure C

An output with a few selected columns.

If we want line numbers added to the output, the -l option comes to the rescue: it adds a new column named line_number to the output (Figure D).

Figure D

Adding line numbers to the output.

The output can of course be redirected to a new file with the > character. Building on our previous example:

csvcut -l -c city,country,population worldcities.csv > newfile.csv

How to change the column order

With csvcut, we can also reorder the columns in the output. We just list the columns in the desired order, and the tool outputs them accordingly (Figure E).

Figure E

The reordered output.

How to sort the data with csvsort

You can sort data with the csvsort command. Like csvcut, csvsort accepts the -n option to list columns and the -c option to select columns by index or by name.

By default, csvsort sorts in ascending order; the -r option sorts in descending order instead.

Let's sort our file by country name, in descending order (Figure F):

csvsort -r -c country worldcities.csv

Figure F

Results sorted by country name in descending order.

You can also sort by multiple columns: just list them all with the -c option (Figure G). The following line sorts our data in descending order by country, then by population:

csvsort -r -c country,population worldcities.csv

Figure G

Results sorted by multiple columns.

How to combine csvcut and csvsort

Csvsort is powerful, but it always outputs all the columns. By combining csvcut and csvsort, we control both which columns are output and how they are sorted.

As an example, let's extract only the city name, country name and latitude, and sort the result by latitude (Figure H).

csvcut -c city,country,lat worldcities.csv | csvsort -c lat

Figure H

Combining csvcut and csvsort.

How to get a nicer output

For a nicer output, the csvlook command renders CSV data as a Markdown-compatible, fixed-width table.

Building on our previous example, we simply pipe the output into csvlook at the end of the command line (Figure I):

csvcut -c city,country,lat worldcities.csv | csvsort -c lat | csvlook

Figure I

Results of the csvlook command.

How to get statistics with csvstat

The csvstat command computes various statistics on a CSV file.

When run with just a filename, it provides detailed statistics for every column. The -c option restricts the output to selected columns (Figure J).

csvstat -c country worldcities.csv

Figure J

Statistics on the "country" column.

You can tune the command's output with various options.

For example, to count the unique values in the country column, we can use the --unique option (Figure K).

Figure K

The number of unique countries, as reported by csvstat.

For a list of all csvstat options, type the following command:

csvstat -h

Csvkit contains several command-line tools that let IT specialists, and anyone else who needs to work with large CSV files, do so easily from the command line. Its tools can be combined, especially csvcut and csvsort, which makes csvkit very powerful and should suit the needs of most professionals.

Additionally, csvkit's in2csv tool can convert XLS and JSON files to CSV, so you can analyze them with the other command-line tools.
