Java dataframe and visualization library

Overview

Tablesaw

Tablesaw is Java for data science. It includes a dataframe and a visualization library, as well as utilities for loading, transforming, filtering, and summarizing data. It's fast and careful with memory. If you work with data in Java, it may save you time and effort. Tablesaw also supports descriptive statistics and integrates well with the Smile machine learning library.

Tablesaw features

Data processing & transformation

  • Import data from RDBMS, Excel, CSV, JSON, HTML, or Fixed Width text files, whether they are local or remote (http, S3, etc.)
  • Export data to CSV, JSON, HTML or Fixed Width files.
  • Combine tables by appending or joining
  • Add and remove columns or rows
  • Sort, Group, Query
  • Map/Reduce operations
  • Handle missing values

Visualization

Tablesaw supports data visualization by providing a wrapper for the Plot.ly JavaScript plotting library. Here are a few examples of the new library in action.

[Image gallery: twelve example plots, each captioned "Tornadoes"]

Statistics

  • Descriptive stats: mean, min, max, median, sum, product, standard deviation, variance, percentiles, geometric mean, skewness, kurtosis, etc.

Getting started

Add tablesaw-core to your project. You can find the version number for the latest release in the release notes:

<dependency>
    <groupId>tech.tablesaw</groupId>
    <artifactId>tablesaw-core</artifactId>
    <version>VERSION_NUMBER_GOES_HERE</version>
</dependency>

You may also add supporting projects:

  • tablesaw-beakerx - for using Tablesaw inside BeakerX
  • tablesaw-excel - for using Excel workbooks
  • tablesaw-html - for using HTML
  • tablesaw-json - for using JSON
  • tablesaw-jsplot - for creating charts

Documentation and support

Always feel free to ask questions or make suggestions here on the issues tab.

Integrations

Issues
  • Column-wise DataFrame-like operations

    Hi, I am new to Tablesaw. I am exploring options for recreating some Python DataFrame operations in Tablesaw. For example, I have a DataFrame object called data1 and I use existing columns of this data frame to create new ones (and update existing ones). Here are a couple of lines of Python code:

    data1['days'] = data1['buyDate'].apply(lambda x: (today - x).days)
    ...
    data1['CAGR'] = ((data1['curValue'] / data1['bookValue']) ** (1.0 / data1['nYears']) - 1.0) * 100

    Is there a way to implement something similar using Tablesaw?

    Thanks a lot in advance.

    enhancement core 
    opened by imfaisalmb 31
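
The arithmetic in the question above can be sketched in plain Java, one array per column. This is only an illustration of the two row-by-row calculations, not Tablesaw's API; the class name ColumnMathSketch and its method names are hypothetical, and the sample values in main are made up.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;

final class ColumnMathSketch {

    // Days elapsed between each buy date and "today", one result per row.
    static long[] daysHeld(LocalDate[] buyDate, LocalDate today) {
        long[] days = new long[buyDate.length];
        for (int i = 0; i < buyDate.length; i++) {
            days[i] = ChronoUnit.DAYS.between(buyDate[i], today);
        }
        return days;
    }

    // CAGR in percent, row by row: ((curValue / bookValue) ^ (1 / nYears) - 1) * 100.
    static double[] cagrPercent(double[] curValue, double[] bookValue, double[] nYears) {
        double[] cagr = new double[curValue.length];
        for (int i = 0; i < curValue.length; i++) {
            cagr[i] = (Math.pow(curValue[i] / bookValue[i], 1.0 / nYears[i]) - 1.0) * 100.0;
        }
        return cagr;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        LocalDate today = LocalDate.of(2021, 1, 10);
        LocalDate[] buyDate = {LocalDate.of(2021, 1, 1), LocalDate.of(2020, 1, 10)};
        System.out.println(Arrays.toString(daysHeld(buyDate, today)));
        System.out.println(Arrays.toString(cagrPercent(
                new double[] {121.0, 200.0},
                new double[] {100.0, 100.0},
                new double[] {2.0, 1.0})));
    }
}
```

Keeping each derived column as its own array mirrors the column-wise style of the Python snippet; the same loops could feed new columns in whatever dataframe type is used.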
  • Implement transpose #696

    Thanks for contributing.

    Description

    Added an implementation of Transpose. It has the restriction that columns must be of the same type. This is an initial version for feedback

    Testing

    Added a unit test for the feature

    opened by jackie-h 31
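
As a rough picture of what a transpose with the same-type restriction involves, here is a plain-Java sketch over a column-major array of doubles. It is not the code from the pull request; TransposeSketch and its layout (columns[col][row]) are assumptions made for illustration.

```java
import java.util.Arrays;

final class TransposeSketch {

    // Transposes a rectangular table stored as columns[col][row].
    // Every value is a double, which mirrors the "same type" restriction.
    static double[][] transpose(double[][] columns) {
        int nCols = columns.length;
        int nRows = nCols == 0 ? 0 : columns[0].length;
        double[][] result = new double[nRows][nCols];
        for (int c = 0; c < nCols; c++) {
            if (columns[c].length != nRows) {
                throw new IllegalArgumentException("all columns must have the same length");
            }
            for (int r = 0; r < nRows; r++) {
                result[r][c] = columns[c][r];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] columns = {{1, 2, 3}, {4, 5, 6}};
        System.out.println(Arrays.deepToString(transpose(columns)));
    }
}
```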
  • Readonly data: Any equivalent in tablesaw to pandas' view vs copy?

    I've finally got my use case using tablesaw to an initial build and tried a few runs, and it is rather slow compared to a python version I wrote using pandas previously. I migrated to Java to get better concurrency in the hope of making it go faster.

    Profiling it I see why -- it is spending 75% of its time in tech.tablesaw.table.Rows.copy(). This is because of some Table.where() calls that are designed to filter down the input data.

    Briefly, the background is my input data is price history for certain financial assets. Having established a price at a certain time (a simulated trade entry point) I then want to see if the market went up or down by a certain amount from that point. I am currently doing so using a filter like:

    useData.where(useData.numberColumn("High").isGreaterThanOrEqualTo(target).or(useData.numberColumn("Low").isLessThanOrEqualTo(stop)))

    (method names from memory so may have slight inaccuracies)

    That gives me a filtered table, and then selecting the first row from this gives me the first instance that fulfilled the criteria. I'm only ever interested in the first row matching the criteria but can't think of a way to get tablesaw to stop once that row is found, so instead get the entire table and take the first row (or rather first value of each column) as needed. This kind of approach is applied repeatedly to further filter price data depending on what is found at each stage. The result is a lot of calls to Table.where().

    As I say the code ends up spending a lot of time copying rows, presumably from the original data to the return value of the where() method. Is there a way to get tablesaw to take a "view" approach as pandas would in this situation, that is not actually copy the data but simply copy references to the data, which would be faster? This obviously comes along with issues if one later tries to modify the filtered result represented by the view, because it would be impossible to do so without also modifying the original; pandas handles that by making attempts to alter a view an error, forcing the user to use an operation that would force a copy when that is what they want to do. For my particular use case, since the data is only read (in this particular case I don't even summarise it) I don't mind not being able to modify.

    Does tablesaw have anything analogous to this view concept from pandas? Again, the goal is to avoid needlessly copying a lot of data.

    opened by mark27q1 29
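
The "stop at the first matching row" idea from the issue above can be sketched without any copying by scanning the price arrays directly. This is plain Java under the assumption that the High and Low columns are available as double arrays; FirstMatchSketch, firstHit and the fromRow parameter are hypothetical names, not Tablesaw API.

```java
import java.util.OptionalInt;

final class FirstMatchSketch {

    // Returns the index of the first row at or after fromRow where
    // high >= target or low <= stop. The scan ends at that row, and
    // no filtered copy of the data is built.
    static OptionalInt firstHit(double[] high, double[] low, double target, double stop, int fromRow) {
        for (int i = Math.max(fromRow, 0); i < high.length; i++) {
            if (high[i] >= target || low[i] <= stop) {
                return OptionalInt.of(i);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Made-up price history, for illustration only.
        double[] high = {101.0, 102.5, 104.0, 103.0};
        double[] low = {99.5, 100.0, 101.5, 98.0};
        System.out.println(firstHit(high, low, 104.0, 98.5, 0));
    }
}
```

Because the method only reads the arrays, repeated calls with a different fromRow can share the same underlying data, which is the read-only behaviour the issue asks about.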
  • Add support to join tables on multiple columns.

    Thanks for contributing.

    Description

    Enhanced API to accept multiple names of columns to join on. Supported for all joins: inner, leftOuter, rightOuter, and fullOuter. Added javadoc where missing for public methods.

    Testing

    Added dozens of JUnit tests to cover all changed API. Ran the coverage tool in Eclipse to ensure that all changes were exercised. Reached 100% coverage for all but one method. Aiming for 100% coverage exposed a failure to parse LONG types from CSV input, so added the missing support for that.

    opened by gregorco 22
  • Circular dependencies in Columns/ColumnTypes can lead to unpredictable behavior

    A null value is sometimes returned by a ColumnType constant. Whether or not the value is null depends on the order in which other code is executed.

    To recreate, I built an array of ColumnTypes and printed the array. Sometimes the values are all initialized correctly, sometimes not:

    Correct:

    [STRING, LOCAL_DATE_TIME, INTEGER]
    

    Incorrect:

    [STRING, null, INTEGER]
    

    The different results can be had by adding or removing a line of code just before printing. The correct result is printed when the line is removed.

    Here is the main class with the offending line included:

    class DummyClass {
    
        public static void main(String[] args) {
            long missing = DateTimeColumn.MISSING_VALUE;
            new DummyPrinter().printColumnTypes();
        }
    } 
    
    

    When I comment out line 1 in main, the code works as expected. Also, the code that implements printing must be in a separate class for the error to occur. Here's that class:

    class DummyPrinter {
    
        void printColumnTypes() {
            System.out.println(Arrays.toString(columnTypes));
        }
    
        private static final ColumnType[] columnTypes = {
            STRING,
            LOCAL_DATE_TIME,
            INTEGER,
        };
    }
    
    

    The array itself is a literal constant. The values inserted into the array are also constants. They are declared in the ColumnType interface in the line shown below:

    DateTimeColumnType LOCAL_DATE_TIME = DateTimeColumnType.INSTANCE;
    

    In that line, the values are provided by a constant (INSTANCE) declared in DateTimeColumnType.

    Here is the code from that class where INSTANCE is created:

    public static final DateTimeColumnType INSTANCE =
            new DateTimeColumnType(BYTE_SIZE, "LOCAL_DATE_TIME", "DateTime");
    
    

    The line of code that turns the error on and off causes the class DateTimeColumnType to load in a different order when it's present than it does when absent.

    opened by lwhite1 21
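
The general mechanism behind an order-dependent null constant can be reproduced in a few lines of plain Java. The sketch below is a stand-in, not Tablesaw code: Registry plays the role of the interface holding the constants, Holder plays the role of the type class that owns INSTANCE, and all names are hypothetical. When Holder starts initializing first and touches Registry before INSTANCE is assigned, Registry reads the field while it is still null.

```java
final class InitOrderSketch {

    // Stand-in for the interface that exposes the constants.
    static class Registry {
        // Reads Holder.INSTANCE at the moment Registry is initialized.
        static final Holder LOCAL_DATE_TIME = Holder.INSTANCE;

        static int size() {
            return 1;
        }
    }

    // Stand-in for the type class that owns INSTANCE.
    static class Holder {
        // Evaluated before INSTANCE is assigned; forces Registry to initialize.
        static final int REGISTRY_SIZE = Registry.size();
        static final Holder INSTANCE = new Holder();

        static void touch() {
            // Calling a static method triggers initialization of Holder.
        }
    }

    // Initializes Holder first, then reports whether Registry captured a null.
    static boolean registryConstantIsNull() {
        Holder.touch();
        return Registry.LOCAL_DATE_TIME == null;
    }

    public static void main(String[] args) {
        System.out.println("Registry.LOCAL_DATE_TIME is null: " + registryConstantIsNull());
    }
}
```

If Registry were touched first instead, Holder would finish assigning INSTANCE before Registry read it, and the constant would be non-null. That is the same "depends on what ran earlier" behaviour described in the issue.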
  • toString() should be more fault tolerant

    I'm debugging some code where columns in a table can be of different sizes. To help diagnose the problem I'm printing the table. However, Relation's toString() method, which uses a DataFramePrinter, throws an IndexOutOfBoundsException because frame.size() is based on the first column's size, while others are shorter. A toString method should be very careful not to throw exceptions, since it's typically used during debugging. I suggest making DataFramePrinter more fault tolerant when fetching column values, e.g. in line 192: data[i][j] = frame.getString(i, j);

    opened by hallvard 21
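
One way to picture the suggested fault tolerance is a bounds-checked cell lookup that substitutes a placeholder for cells a shorter column does not have. The sketch below is plain Java over lists of strings, not the actual DataFramePrinter; TolerantPrintSketch, safeCell and the "<missing>" placeholder are hypothetical.

```java
import java.util.List;

final class TolerantPrintSketch {

    // Returns the cell text, or a placeholder when the column is shorter than the row index.
    static String safeCell(List<List<String>> columns, int row, int col) {
        List<String> column = columns.get(col);
        return row < column.size() ? String.valueOf(column.get(row)) : "<missing>";
    }

    // Renders every row up to the longest column without throwing on ragged input.
    static String render(List<List<String>> columns) {
        int rows = columns.stream().mapToInt(List::size).max().orElse(0);
        StringBuilder out = new StringBuilder();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < columns.size(); c++) {
                if (c > 0) {
                    out.append(" | ");
                }
                out.append(safeCell(columns, r, c));
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<List<String>> ragged = List.of(List.of("a", "b", "c"), List.of("1"));
        System.out.print(render(ragged));
    }
}
```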
  • Implement a pie chart

    Using xChart or JavaFX Charts create an interface that enables easy rendering of the plot from a tablesaw table. See the implementation of Bar Plots (which use JavaFX) for a very similar example.

    https://github.com/jtablesaw/tablesaw/blob/master/plot/src/main/java/tech/tablesaw/api/plot/Bar.java

    enhancement help wanted 
    opened by lwhite1 19
  • Major problems trying out plots; everything breaks.

    Maybe I'm looking in the wrong place, but it doesn't seem like there is sufficient information for a newcomer to get plotting working—and this is the main reason I'm looking at this library.

    1. First of all, a suggestion: the page https://jtablesaw.github.io/tablesaw/userguide/introduction says nothing at all about the dependency needed for plotting. One has to know to look back at https://github.com/jtablesaw/tablesaw to figure this out. But this is a minor issue.

    2. Much bigger is that the first code at https://jtablesaw.github.io/tablesaw/userguide/Introduction_to_Plotting does not work. Not even close. It won't even compile. It doesn't even have balanced parentheses. Fixing the parentheses still won't get it to compile, and certainly won't show the plot that was promised.

    Figure fig = BubblePlot.create("Average retail price for champagnes by year and rating",
                    champagne,					// table name
                    "highest pro score",		// x variable column name
                    "year",						// y variable column name
                    "Mean Retail"				// bubble size
                   ));
    
    3. So I skip the "introduction" page and go straight to the "real" code at https://jtablesaw.github.io/tablesaw/userguide/BarsAndPies .

    First we load the Tornado dataset:

    Table tornadoes = Table.read().csv("Tornadoes.csv");
    

    What is "the Tornado dataset" and where can I find it? An earlier paragraph mentioned, "we’ll use a Tornado dataset from NOAA", so I went to the NOAA site and downloaded the most promising looking data file named 1950-2017_all_tornadoes.csv. I tried loading that in Tablesaw:

    …
    Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 10000 out of bounds for length 10000
    	at com.univocity.parsers.common.ParserOutput.valueParsed(ParserOutput.java:327)
    	at com.univocity.parsers.csv.CsvParser.parseRecord(CsvParser.java:176)
    	at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:560)
    	... 7 more
    

    (So you are using the uniVocity parsers. I'm familiar with them. But they aren't so "plug-and-play" as they make them out to be, as you can see here.)

    I searched the web for "Tornadoes.csv". One unrelated site implied that 2018_torn_prelim.csv might be more promising, but that didn't work. https://gist.github.com/darrenjaworski/5874227 mentioned a tornadoes.csv, and it was some Google spreadsheet, which I downloaded to a CSV file, but that gave me:

    Exception in thread "main" java.lang.IllegalArgumentException: Cannot add column with duplicate name Short column: state number to table tornadoes.csv
    	at tech.tablesaw.api.Table.validateColumn(Table.java:161)
    	at tech.tablesaw.api.Table.addColumns(Table.java:144)
    	at tech.tablesaw.io.csv.CsvReader.read(CsvReader.java:147)
    	at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:62)
    	at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:58)
    	at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:34)
    	…
    

    I finally realized that this GitHub repository has some "tornado" CSV files. I downloaded and renamed tornadoes 1950-2014.csv, but no go:

    Exception in thread "main" tech.tablesaw.io.csv.AddCellToColumnException: Error while adding cell from row 41658 and column Crop Loss(position:10): Error adding value to column Crop Loss: For input string: "0.4"
    	at tech.tablesaw.io.csv.CsvReader.addRows(CsvReader.java:244)
    	at tech.tablesaw.io.csv.CsvReader.read(CsvReader.java:156)
    	at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:62)
    	at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:58)
    	at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:34)
    	…
    Caused by: java.lang.NumberFormatException: Error adding value to column Crop Loss: For input string: "0.4"
    	at tech.tablesaw.api.ShortColumn.appendCell(ShortColumn.java:353)
    	at tech.tablesaw.api.ShortColumn.appendCell(ShortColumn.java:27)
    	at tech.tablesaw.io.csv.CsvReader.addRows(CsvReader.java:242)
    	... 5 more
    

    This is frustrating. This is what a new user is faced with.

    If this turns out to be a good library, believe me I'll pitch in and help with the documentation and probably even the code. But I'm stuck just in the first lines! Could someone help me?

    opened by garretwilson 17
  • Join prevented on some column types

    @benmccann My app used to be able to join on a DoubleColumn, but now as I'm integrating recent changes into it, I'm seeing a failure that the DoubleColumn I'm trying to join on can't be cast to a CategoricalColumn. I know I had the join working in the past. At what point was that ability changed/restricted? Is there any workaround to allow a join on a DoubleColumn?

    opened by gregorco 17
  • Added support for reading and writing Apache ORC file format

    Thanks for contributing.

    Description

    1. Added support for reading and writing Apache ORC file format Fixes #620

    Testing

    Unit Test cases added

    opened by murtuza-ranapur 17
  • Columns of arrays?

    I have a need for columns of vectors.

    It looks like it should be possible to implement my own by extending AbstractColumn, but, before I get too far into that, is this functionality already implemented elsewhere?

    opened by SeanU 4
  • Add method to parse string columns into list of string columns

    One of the most needed functions in data analysis is operating across multiple columns to generate new columns (and rows). The need can be abstracted to a unified method like List<Column> columnOperate(List<Column>) to accomplish inter-column operation tasks. But now, I have encountered an inter-column operation problem which cannot be solved efficiently and elegantly using only a few methods. In fact, I found I couldn't solve this basic need using the methods provided in Tablesaw. Details are provided below.

    Let's say I have a Table named "df" with two columns "multi_ratio" and "amount".

              df          
     amount  |  multi_ratio  |
    --------------------------
        100  |      0.8,0.2  |
        200  |      0.5,0.5  |
    

    Now I need to

    1. split every value in col "multi_ratio" into multiple values (e.g., convert "0.8,0.2" to List of 0.8, 0.2) ,
    2. amount * multi_ratio (split) (e.g., 100 * List of 0.8, 0.2 -> List of 80, 20) ,
    3. result of 2. expanded to multiple rows.

    So the final result I need would be.

                          df2                       
     amount  |  multi_ratio_single  |  multiply_result  |
    -----------------------------------------------------
        100  |                 0.8  |               80  |
        100  |                 0.2  |               20  |
        200  |                 0.5  |              100  |
        200  |                 0.5  |              100  |
    

    To achieve this goal, I first make an empty copy of df and add empty columns.

    Table df2 = df.emptyCopy();
    
    df2.addColumns(
            StringColumn.create("multi_ratio_single"),
            DoubleColumn.create("result")
    );
    

    Then, I tried to operate on each row of df to generate new rows and add them to df2, but it just doesn't seem to work.

    Does anybody have suggestions for making this happen efficiently?

    enhancement core 
    opened by eric-liuyd 1
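
The three steps listed in the issue above (split, multiply, expand to rows) can be sketched in plain Java as a nested loop over arrays. This only illustrates the transformation itself and is not a Tablesaw method; ExplodeRatiosSketch and explode are hypothetical names, and each output row is represented as a double[] of {amount, single ratio, product}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ExplodeRatiosSketch {

    // Splits each comma-separated ratio string and emits one output row per ratio:
    // {amount, single ratio, amount * ratio}.
    static List<double[]> explode(double[] amount, String[] multiRatio) {
        List<double[]> rows = new ArrayList<>();
        for (int i = 0; i < amount.length; i++) {
            for (String part : multiRatio[i].split(",")) {
                double ratio = Double.parseDouble(part.trim());
                rows.add(new double[] {amount[i], ratio, amount[i] * ratio});
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Same sample data as the "df" table in the issue.
        List<double[]> rows = explode(new double[] {100, 200}, new String[] {"0.8,0.2", "0.5,0.5"});
        for (double[] row : rows) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The resulting rows could then be appended to the three columns of df2 one value at a time.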
  • Shallow cloning on travis

    opened by YunLemon 0
  • Be tolerant to cell formatter problems

    Description

    After the changes introduced in 74a54aa573a0a881add7a637a4d5f80941520d2e, where CellFormatters were used to convert types, some Excel files had format options that make POI throw errors, so the entire sheet could not be parsed. In those cases, it's safer to log the error and discard the value that can't be correctly interpreted and converted.

    opened by lujop 6
  • Can not get Integer values from sqlite database

    Hi I have a database in SQLite which stores very large numeric values in Integer type columns (below screenshot)

    image

    The problem with Jtablesaw is that it can't read the correct values from the database.

    Here are the results. As you can see, the large numeric values are converted into negative numbers.

    image

    how can I solve this problem?

    opened by dddeveloperrr 3
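
One plausible explanation for large values turning negative, which the issue itself does not confirm, is that 64-bit values are being narrowed to 32-bit ints somewhere along the way. The plain-Java sketch below shows only that narrowing effect; NarrowingSketch is a hypothetical name and the sample value is made up.

```java
final class NarrowingSketch {

    // Narrowing a long to an int keeps only the low 32 bits,
    // so values above Integer.MAX_VALUE can come out negative.
    static int narrow(long value) {
        return (int) value;
    }

    public static void main(String[] args) {
        long stored = 3_000_000_000L; // made-up value larger than Integer.MAX_VALUE
        System.out.println(stored + " narrowed to int: " + narrow(stored));
    }
}
```

If this is what is happening, reading the column into a 64-bit type would preserve the values.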
  • module bug

    I use jdk 14.

    here is my code and the error message:

    package jtablesaw;
    
    import java.io.IOException;
    import tech.tablesaw.api.DoubleColumn;
    import tech.tablesaw.api.StringColumn;
    import tech.tablesaw.api.Table;
    import tech.tablesaw.selection.Selection;
    
    public class Main {

        public static void main(String[] args) throws IOException {
            String[] animals = {"bear", "cat", "giraffe"};
            double[] cuteness = {90.1, 84.3, 99.7};
            Table cuteAnimals =
                    Table.create("Cute Animals")
                            .addColumns(
                                    StringColumn.create("Animal types", animals),
                                    DoubleColumn.create("rating", cuteness));
        }
    }
    
    
    

    Output:

    Exception in thread "main" java.lang.ExceptionInInitializerError
    	at io.github.classgraph.ScanResult.init(ScanResult.java:214)
    	at io.github.classgraph.ClassGraph.<init>(ClassGraph.java:87)
    	at [email protected]/tech.tablesaw.api.Table.autoRegisterReadersAndWriters(Table.java:122)
    	at [email protected]/tech.tablesaw.api.Table.<clinit>(Table.java:75)
    	at jtablesaw/jtablesaw.Main.main(Main.java:23)
    Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make public void jdk.internal.misc.Unsafe.invokeCleaner(java.nio.ByteBuffer) accessible: module java.base does not "exports jdk.internal.misc" to unnamed module @41975e01
    	at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:349)
    	at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:289)
    	at java.base/java.lang.reflect.Method.checkCanSetAccessible(Method.java:196)
    	at java.base/java.lang.reflect.Method.setAccessible(Method.java:190)
    	at nonapi.io.github.classgraph.utils.FileUtils.lookupCleanMethodPrivileged(FileUtils.java:587)
    	at nonapi.io.github.classgraph.utils.FileUtils.access$000(FileUtils.java:55)
    	at nonapi.io.github.classgraph.utils.FileUtils$2.run(FileUtils.java:607)
    	at java.base/java.security.AccessController.doPrivileged(AccessController.java:312)
    	at nonapi.io.github.classgraph.utils.FileUtils.<clinit>(FileUtils.java:604)
    
    opened by dddeveloperrr 0
  • Could you support .xls file?

    opened by zo-zero-one 1
  • large integral numbers in Oracle Number column are imported into Tablesaw DoubleColumns

    In theory, the code below should ensure that it is mapped appropriately, but this appears not to always work. It may be the case that the column was defined with Scale == Null. More investigation is needed.

    if (scale == 0) {
            /* Mapping to java integer types based on integer precision defined:
    
            Java type           TypeMinVal              TypeMaxVal          p               MaxIntVal
            -----------------------------------------------------------------------------------------
            byte, Byte:         -128                    127                 NUMBER(2)       99
            short, Short:       -32768                  32767               NUMBER(4)       9_999
            int, Integer:       -2147483648             2147483647          NUMBER(9)       999_999_999
            long, Long:         -9223372036854775808    9223372036854775807 NUMBER(18)      999_999_999_999_999_999
    
            */
            if (precision > 0) {
              if (precision <= 4) {
                // Start with SHORT (since ColumnType.BYTE isn't supported yet)
                // and find the smallest java integer type that fits
                type = ColumnType.SHORT;
              } else if (precision <= 9) {
                type = ColumnType.INTEGER;
              } else if (precision <= 18) {
                type = ColumnType.LONG;
              }
            }
      }
    
    
    opened by lwhite1 0
  • InstantColumn imported from Oracle Date column can change the date (and timezone)

    For example: In DB as displayed in DBeaver 2021-01-09 20:38:58

    As displayed in Tablesaw instant column 2021-01-10T01:38:58.000Z

    There doesn't appear to be any way to define the DB timezone, or to import as a LocalDateTime column

    opened by lwhite1 0
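
The five-hour shift in the example above is what java.time produces when a zone-less timestamp is interpreted at a UTC-5 offset and converted to an Instant. The sketch below reproduces that with the standard library only; the -5 offset is an assumption inferred from the gap between the two displayed values, and OracleDateShiftSketch is a hypothetical name.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

final class OracleDateShiftSketch {

    // Interprets a zone-less database timestamp at the given offset and converts it to an Instant.
    static Instant toInstant(LocalDateTime dbValue, ZoneOffset assumedOffset) {
        return dbValue.toInstant(assumedOffset);
    }

    // Recovers the wall-clock value from an Instant when the database offset is known.
    static LocalDateTime toLocal(Instant instant, ZoneOffset dbOffset) {
        return LocalDateTime.ofInstant(instant, dbOffset);
    }

    public static void main(String[] args) {
        LocalDateTime dbValue = LocalDateTime.of(2021, 1, 9, 20, 38, 58);
        ZoneOffset assumed = ZoneOffset.ofHours(-5); // assumption: inferred from the 5-hour gap
        Instant instant = toInstant(dbValue, assumed);
        System.out.println(instant);                   // UTC view, date rolls over to the 10th
        System.out.println(toLocal(instant, assumed)); // original wall-clock value
    }
}
```

Converting back with the same offset restores the value shown in the database client, which is why being able to state the database time zone, or to import as a local date-time, would avoid the apparent date change.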
  • Excel reader stops parsing table on an empty row

    If a table in Excel has a row without any data, the current reader implementation interprets that as the end of the table and doesn't read the following rows.

    I understand that this can be the desired behavior in some cases, where the blank row is followed by other information that is not part of the table. But I don't think that this is the expected behavior.

    It would be nice to have at least a parameter in the reader options like 'endTableParsingOnFirstBlankRow'.

    opened by lujop 6
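
The proposed option can be sketched as a flag that decides whether a blank row ends the table or is simply skipped. This is plain Java over rows of strings rather than the actual Excel reader; BlankRowSketch and readRows are hypothetical names, and only the flag name comes from the suggestion above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BlankRowSketch {

    // A row counts as blank when every cell is null or whitespace.
    static boolean isBlank(String[] row) {
        for (String cell : row) {
            if (cell != null && !cell.trim().isEmpty()) {
                return false;
            }
        }
        return true;
    }

    // Collects data rows; the flag decides whether a blank row ends the table or is skipped.
    static List<String[]> readRows(List<String[]> sheet, boolean endTableParsingOnFirstBlankRow) {
        List<String[]> rows = new ArrayList<>();
        for (String[] row : sheet) {
            if (isBlank(row)) {
                if (endTableParsingOnFirstBlankRow) {
                    break;
                }
                continue;
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> sheet = Arrays.asList(
                new String[] {"a", "1"}, new String[] {"", " "}, new String[] {"b", "2"});
        System.out.println(readRows(sheet, true).size());
        System.out.println(readRows(sheet, false).size());
    }
}
```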
Releases(v0.38.5)
  • v0.38.5(Sep 6, 2021)

    Small release with one important bug fix. There is also a documentation enhancement.

    Bug fixes

    @lwhite1 SliceGroup TextColumn handling revision (#990). Fixes issue where splitting a large file on a TextColumn (as when using groups in aggregations) could cause a major increase in memory.

    Enhancements

    @dependabot Bump jsoup from 1.12.1 to 1.14.2 in /html (#977)

    @lwhite1 Allow TextColumn to append StringColumns, and vice-versa (#983)

    @lwhite1 made all data fields protected (#991)

    Documentation

    @lwhite1 Update gettingstarted.md

  • v0.38.4(Aug 21, 2021)

    This is a relatively small release with a few nice enhancements and several bug fixes. There is also a documentation enhancement publicizing @ccleva's Parquet integration project.

    Bug fixes

    Fix bug where missing values in numeric columns could not be formatted. This enables arbitrary missing value indicators (e.g. "N/A") to be used in printing and saving to files. @lwhite1

    Replace parallelQuickSort with mergeSort (#968), to avoid incorrect sorting caused by race conditions when a custom sort object is used. @lwhite1

    fix issue #963 (#967) Relation.structure() fails for TableSlice with ClassCastException @lwhite1

    Enhancements

    Aggregate by non-primitive column type that extends Number (#973), making it possible to add a column type for BigDecimal @daugasauron @kallur

    plotly - added range slider to Axis (#953) … @smpawlowski

    To support annotation in plot.ly javascript. (#944) … @xcjusuih

    Documentation

    Added link to the tablesaw-parquet project in README (#966) @ccleva

  • v0.38.3(Jul 22, 2021)

    Features

    • Improved Printformatting (#914)
    • Improve innerJoin performance (#903) - Thanks, @DanielMao1
    • Allow reading of malformed CSV (#901) - Thanks, @ChangzhenZhang
    • Fixes #822 and #815 providing more extensive columntype options - Thanks, @lujop
    • Support multiple custom missing value options in Readers
    • Allow default parsing to be overridden per column. (#928) - Thanks, @jglogan
    • Add contour plot (#938) - Thanks, @ArslanaWu
    • Add support for violin plot (#936) - Thanks, @LUUUAN
    • Add option to keep join keys in both tables when applying outer join - Thanks, @DanielMao1
    • Assign FixedWidthWriterSettings.fieldLengths (#943) - Thanks, @Kerwinooooo
    • Support percentage by parsing percentage to double (#906) - Thanks, @Kerwinooooo
    • Open default local browser on an arbitrary HTML page #860 (#949)

    Bug Fixes

    • Fix bug of leftOuter join when using multi-tables (#905) Thanks @Carl-Rabbit
    • fix bug in appendCell that caused custom parser to be ignored (#912)
    • Corrected surefire plugin argLine (#915) Thanks @ccleva
    • Fix CI on Windows (#904) @lujop
    • Fix implementation of append(String) in TextColumn (#921)
    • Fix rightOuter join on multiple tables (#922) Thanks (again) @Carl-Rabbit
    • Fix XlsxReader doesn't respect calculated tableArea for header column names #887 (#926) - Thanks @lujop
    • Remove print statements in tests writing to system.out (#933)
    • Fix Column Type detection #751 and Integer handling in XlsxReader #882 (#931) - Thanks, @lujop
    • Fix(table): method 'where' apply 2 times selection function (#962) - Thanks, @zhiwen95
    • Support for not closing the output stream opened by user (#941) Thanks, @Kerwinooooo, @ChangzhenZhang

    Documentation

    • Update README.md (#917)

    Misc

    • Bump guava from 28.2-jre to 29.0-jre in /core (#895)
    • Bump guava version again for security improvements (#932)
  • v0.38.1(May 9, 2020)

    Features

    • More options for creating a bubble plot (https://github.com/jtablesaw/tablesaw/pull/781) - thanks @rayeaster

    Bug Fixes

    • Fix support for java.sql.Time (https://github.com/jtablesaw/tablesaw/pull/791) - thanks @brainbytes42
    • Allow empty slices when aggregating (https://github.com/jtablesaw/tablesaw/pull/795) - thanks @emillynge
    • Fix NPE in ColumnType.compare (https://github.com/jtablesaw/tablesaw/pull/799)
    • Fix NPE in set (https://github.com/jtablesaw/tablesaw/pull/800)
  • v0.38.0(Apr 13, 2020)

    Features

    • ignoreZeroDecimal option when reading data (https://github.com/jtablesaw/tablesaw/pull/748) - Thanks @larshelge
    • indexOf method (https://github.com/jtablesaw/tablesaw/pull/787) - Thanks @islaterm
    • Ability to add quotes to CSV even if not strictly required (https://github.com/jtablesaw/tablesaw/pull/767)
    • Ability to set layout and config for plots (https://github.com/jtablesaw/tablesaw/pull/690)
    • Pie chart subplots (https://github.com/jtablesaw/tablesaw/pull/777)
    • Plotting of Instant data (https://github.com/jtablesaw/tablesaw/pull/765)
    • Include sheet name when reading from Excel (https://github.com/jtablesaw/tablesaw/pull/758) - Thanks @R1j1t

    Bug Fixes

    • Joining an empty table (https://github.com/jtablesaw/tablesaw/pull/783) - Thanks @vanderzee-anl-gov
    • Use same options for reading and writing a CSV by default (https://github.com/jtablesaw/tablesaw/pull/772)
    • Reading of binary data from database
    • Make DoubleColumn.create work on wider range of input
    • Fix column sorting (https://github.com/jtablesaw/tablesaw/pull/778)
    • Fixed equals method on BooleanColumn (https://github.com/jtablesaw/tablesaw/pull/766)
    • Fixed 3D scatter plot (https://github.com/jtablesaw/tablesaw/pull/764)
    • Fixed BoxBuilder (https://github.com/jtablesaw/tablesaw/pull/763)
    • Make Component.engine non-static (https://github.com/jtablesaw/tablesaw/pull/762)
    • Fixed shaded jar
    • Improved handling of missing values when calling get on a column

    Documentation

    • Fix broken link to data import docs (https://github.com/jtablesaw/tablesaw/pull/773) - Thanks @bantu
    • Add docs for reading from Excel (https://github.com/jtablesaw/tablesaw/pull/759) - Thanks @R1j1t
    • Fixed CSV reading docs (https://github.com/jtablesaw/tablesaw/commit/6fc6a4d013e4cb92b5a8dc13d2d5e2fc62ec1460) - Thanks @salticus
  • v0.37.2(Jan 24, 2020)

  • v0.37.1(Jan 24, 2020)

    Breaking Changes

    • Table.summary now returns a Table instead of a String - Thanks @jackie-h

    Features

    • Table transpose https://github.com/jtablesaw/tablesaw/commit/1b01eaf5c94c8a51d09be7fe2c080a78dc9a03e1 - Thanks @jackie-h
    • Added ability to sample rows while reading a CSV - Thanks @aecio
    • Additional Column and Table create methods

    Cleanup

    • Fixed a bunch of SonarCloud warnings
    • Improved exception message for duplicate Table columns
    • Validation for Table joins
  • v0.37.0(Jan 8, 2020)

    Features

    • Upgraded to Smile 2.0 (https://github.com/jtablesaw/tablesaw/pull/735)
    • Autocorrelation (https://github.com/jtablesaw/tablesaw/pull/726)
    • InstantColumn min and max (https://github.com/jtablesaw/tablesaw/pull/719)
    • Enhancements to histogram (https://github.com/jtablesaw/tablesaw/pull/700)
    • New Column.map method (https://github.com/jtablesaw/tablesaw/pull/705)
    • Expose two FileReader methods (https://github.com/jtablesaw/tablesaw/pull/701)
    • New Plotly config argument (https://github.com/jtablesaw/tablesaw/pull/691)
    • Read specific Excel sheet (https://github.com/jtablesaw/tablesaw/pull/683)
    • Read JSON subtree (https://github.com/jtablesaw/tablesaw/pull/684)
    • Read specific HTML table (https://github.com/jtablesaw/tablesaw/pull/682)

    Bug Fixes

    • Only set LayoutBuilder.autosize if necessary (https://github.com/jtablesaw/tablesaw/pull/713)
  • v0.36.0(Sep 29, 2019)

    Breaking changes

    • Table.numberColumn now returns NumericColumn instead of NumberColumn (https://github.com/jtablesaw/tablesaw/pull/669)

    Features

    • Interpolation of missing cells (https://github.com/jtablesaw/tablesaw/pull/664)
    • File encoding detection (https://github.com/jtablesaw/tablesaw/pull/654)
    • stdDev for rolling columns (https://github.com/jtablesaw/tablesaw/pull/666)
    • Column UI widget in BeakerX (https://github.com/jtablesaw/tablesaw/pull/668)
    • Additional replaceColumn method (https://github.com/jtablesaw/tablesaw/pull/673)

    Bug Fixes

    • Fix reading CSV files with space at edge of column name (https://github.com/jtablesaw/tablesaw/pull/659)
    • Fix ignoreLeadingWhitespace (https://github.com/jtablesaw/tablesaw/commit/fb207104725eb20a5038b29e7c8828b754d4f36d)
    • Fix handling of boolean columns in SawWriter (https://github.com/jtablesaw/tablesaw/pull/661)
  • v0.35.0(Sep 3, 2019)

    Deprecations and breaking changes

    • Deprecated data() methods (https://github.com/jtablesaw/tablesaw/pull/649)
    • Renamed isMissingValue to valueIsMissing (https://github.com/jtablesaw/tablesaw/pull/643)
    • Removed mapToType added in last release (https://github.com/jtablesaw/tablesaw/pull/583)

    Features

    • Analytic Query functions (https://github.com/jtablesaw/tablesaw/pull/606 and https://github.com/jtablesaw/tablesaw/pull/621)
    • Deferred execution queries (https://github.com/jtablesaw/tablesaw/pull/574)
    • Saw file format persistence (https://github.com/jtablesaw/tablesaw/pull/642)
    • Column creation from streams (https://github.com/jtablesaw/tablesaw/pull/634)
    • Improved reading from URL (https://github.com/jtablesaw/tablesaw/pull/650)
    • remainder, capitalize, repeat, and concatenate functions (https://github.com/jtablesaw/tablesaw/pull/635)
    • Figure.builder (https://github.com/jtablesaw/tablesaw/pull/608)
    • Option to ignore whitespace in csv writer (https://github.com/jtablesaw/tablesaw/pull/605) - Thanks @sd1998

    Performance

    • Speed up joins (https://github.com/jtablesaw/tablesaw/pull/562)
    • Speed up TextColumn's isIn method (https://github.com/jtablesaw/tablesaw/pull/613)

    Bug fixes

    • Fix NPE when reading incomplete JSON rows (https://github.com/jtablesaw/tablesaw/pull/591)
    • Make empty columns be of type string (https://github.com/jtablesaw/tablesaw/pull/626)
    • Include missing values in unique (https://github.com/jtablesaw/tablesaw/pull/595)
    • Fix conversion of missing values in IntColumn.toDoubleColumn (https://github.com/jtablesaw/tablesaw/issues/577)
    • Fixed splitOn for TextColumn (https://github.com/jtablesaw/tablesaw/issues/554)
    • Handling of null values in SqlResultSetReader (https://github.com/jtablesaw/tablesaw/pull/563)

    Documentation

    • Began compiling code samples in docs (https://github.com/jtablesaw/tablesaw/pull/637, https://github.com/jtablesaw/tablesaw/pull/639, and https://github.com/jtablesaw/tablesaw/pull/641)

    Development

    • Automatically format code (https://github.com/jtablesaw/tablesaw/pull/570 and https://github.com/jtablesaw/tablesaw/pull/568)
  • v0.34.2(Aug 1, 2019)

    Features

    • Add table.stream (https://github.com/jtablesaw/tablesaw/pull/540)
    • Add fillWith(double) (https://github.com/jtablesaw/tablesaw/pull/539)
    • Add mapToType (https://github.com/jtablesaw/tablesaw/pull/545) - Thanks @ryancerf
    • Add appendRow (https://github.com/jtablesaw/tablesaw/commit/6f98623d81d0e57d0cc5e9ab622b518165f7a74d)
    • Subplots (https://github.com/jtablesaw/tablesaw/pull/548) - Thanks @kiamesdavies
    • QQ plots and related improvements
    • plotly events (https://github.com/jtablesaw/tablesaw/pull/512) - Thanks @tmrn411

    Bug Fixes

    • Data export to Smile (https://github.com/jtablesaw/tablesaw/pull/528) - Thanks @kiamesdavies
    • Unit tests on Windows (https://github.com/jtablesaw/tablesaw/pull/546) - Thanks @paulk-asert
    • Calculation of unique values in string columns (https://github.com/jtablesaw/tablesaw/pull/544) - Thanks @ccleva
    • Ensure tests are run (https://github.com/jtablesaw/tablesaw/pull/551) - Thanks @ccleva
    • asObjectArray in numeric columns (https://github.com/jtablesaw/tablesaw/commit/6f9086897b6e85c482f5f4de3bcb71c4ae53295a)
    • Possible exception in toString (https://github.com/jtablesaw/tablesaw/pull/497) - Thanks @hallvard
  • v0.34.1(Jun 16, 2019)

    Features

    • Improved RollingColumn support
    • Option for CSV quote character (https://github.com/jtablesaw/tablesaw/pull/536)
    • New dropRange and inRange methods (#534)
    • Improved NumberPredicates (#532)

    Bug Fixes

    • Fix DoubleColumn.map (https://github.com/jtablesaw/tablesaw/pull/533)
  • v0.34.0(Jun 5, 2019)

    Breaking changes

    • Renamed join to joinOn so that it will work with Groovy (https://github.com/jtablesaw/tablesaw/pull/531)

    Features

    • Added set with predicate method (https://github.com/jtablesaw/tablesaw/pull/530)
  • v0.33.5(Jun 4, 2019)

  • v0.33.4(Jun 4, 2019)

  • v0.33.3(Jun 4, 2019)

  • v0.33.2(Jun 4, 2019)

  • v0.33.1(Jun 4, 2019)

  • v0.33.0(Jun 4, 2019)

    Features

    • Add InstantColumn (https://github.com/jtablesaw/tablesaw/pull/518)
    • More configurable column type detection (https://github.com/jtablesaw/tablesaw/pull/521)
    • Added option for turning off html escaping in html table output
    • Additional date parsing capabilities (https://github.com/jtablesaw/tablesaw/issues/506)

    Fixes

    • Fix for precision of 0 in JdbcResultSet (https://github.com/jtablesaw/tablesaw/pull/523)

    Cleanup

    • Remove circular dependency between reader packages and core package
    • Remove unused epoch conversion methods (https://github.com/jtablesaw/tablesaw/pull/513)
  • v0.32.7(Mar 31, 2019)

    • Implemented maxCharsPerColumn CSV parser setting
    • Fix DateTimeParser issue
    • Switch from reflections to classgraph
    • Updated pebble version in jsplot
  • v0.32.6(Mar 24, 2019)

  • v0.32.5(Mar 24, 2019)

  • v0.32.4(Mar 23, 2019)

  • v0.32.3(Mar 13, 2019)

  • v0.32.2(Mar 13, 2019)

  • v0.32.1(Mar 13, 2019)

  • v0.32.0(Mar 8, 2019)

    Major features

    • XLSX support (https://github.com/jtablesaw/tablesaw/pull/470). Thanks @hallvard
    • New optional module structure, which was unfortunately broken (https://github.com/jtablesaw/tablesaw/pull/475)

    Enhancements

    • Implemented plot.ly's categoryOrder
    • Support comment character in CSV files (https://github.com/jtablesaw/tablesaw/pull/483). Thanks @jmcgonegal
    • Plot.ly template customization (#482). Thanks @hallvard
    • Configure shade plugin to create OSGi bundles (https://github.com/jtablesaw/tablesaw/pull/481). Thanks @hallvard

    Bug fixes

    • Fixed stepWithRows

    Development

    • Upgrade to junit 5 (https://github.com/jtablesaw/tablesaw/pull/476)
  • v0.31.0(Feb 19, 2019)

    • Fat columns returned from file readers by default
    • Fixed-width file support (Thanks @Sparow199)
    • Improved marker support
    • Support for custom color bars on scatter charts with scaled colors
    • Additional table creation validation
    • Fixed numeric to string conversion
  • v0.30.4(Feb 9, 2019)

  • v0.30.3(Feb 1, 2019)

    • Fix for NullPointerException in Table.read().csv()
    • Fix for marker color
    • Minor join performance improvements
    • Fix for Windows line ending issues
    • Fix for reading SQL result set
    • Allow pie charts to take floating points
    • Fixed sorting in TimeSeriesPlot