Optimizing inserts into a MySQL table – Java


Question:

I have a loop that sequentially inserts millions of rows into a single MySQL table.

I wonder whether it is possible to parallelize the inserts, or to use some feature that improves insert performance.

Code:

public static java.sql.Connection getConexaoMySQL() {
    // Returns a new Connection, or null if the driver or the database is unavailable
    try {
        // Connector/J 8+ names this class com.mysql.cj.jdbc.Driver (the old name is deprecated);
        // with a JDBC 4 driver on the classpath this explicit load is optional
        String driverName = "com.mysql.jdbc.Driver";
        Class.forName(driverName);
        String serverName = "localhost";
        String mydatabase = "tweets";
        String url = "jdbc:mysql://" + serverName + "/" + mydatabase;
        String username = "root";
        String password = "admin";
        // Local variable (not a shared static field), so concurrent callers each get their own connection.
        // getConnection never returns null: it throws SQLException on failure
        java.sql.Connection connection = DriverManager.getConnection(url, username, password);
        status = "Database--->Connected successfully!"; // status: static String field (not shown)
        return connection;
    }
    catch (ClassNotFoundException e)
    {
        System.out.println("The specified driver was not found.");
        return null;
    } catch (SQLException e) {
        System.out.println("Could not connect to the database.");
        return null;
    }
}

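public static void insert(List<TweetsDB> list) throws SQLException {
    // preparedStmt: a PreparedStatement field, prepared elsewhere with a
    // nine-parameter INSERT (not shown in the question)
    for (TweetsDB x : list) {

        preparedStmt.setString(1, x.getCandidate());
        preparedStmt.setString(2, x.getIDTweet());
        preparedStmt.setString(3, x.getIDUser());
        preparedStmt.setString(4, x.getUserScreenName());
        preparedStmt.setString(5, x.getUserName());
        preparedStmt.setString(6, x.getRetweets());
        preparedStmt.setTimestamp(7, x.getDate()); // getDate() must return a java.sql.Timestamp
        preparedStmt.setString(8, x.getText());
        preparedStmt.setString(9, x.getHashtags());

        // Execute the prepared statement: one round trip to the server
        // (and, with auto-commit on, one commit) per row
        preparedStmt.execute();
    }
}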

    

Answer:

Since you have a loop and are using Java, you can split the task across threads, sized according to the number of connections (the connection pool) you can open to the database.

If you need to record the outcome of each individual insert, this thread-based approach is the best option.
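Splitting the rows into one list per thread can be sketched as below, assuming the rows are already loaded into a List in memory. The answer's code further down calls a divideInserts helper that it does not show; ListSplitter.split is a hypothetical stand-in, not that helper. It cuts the list into at most n contiguous chunks whose sizes differ by at most one, so each thread/connection gets a similar share of the work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the original answer): splits a list into
// at most n contiguous chunks whose sizes differ by at most one element.
final class ListSplitter {

    private ListSplitter() {}

    static <T> List<List<T>> split(List<T> items, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        int total = items.size();
        int parts = Math.min(n, total);   // never create empty chunks
        int start = 0;
        for (int i = 0; i < parts; i++) {
            // The first (total % parts) chunks receive one extra element
            int size = total / parts + (i < total % parts ? 1 : 0);
            chunks.add(new ArrayList<>(items.subList(start, start + size)));
            start += size;
        }
        return chunks;
    }
}
```

For instance, splitting 10 rows with n = 3 gives chunks of 4, 3 and 3 rows; asking for more chunks than there are rows simply yields one row per chunk.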

But if you can guarantee that the inserts will not fail, then in addition to using threads you can build a kind of buffer that sends 10k inserts at a time…
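The "buffer" described above maps onto JDBC batching: rows are queued on the client with addBatch() and sent together with executeBatch() every N rows, instead of one execute() per row. A minimal sketch follows; BatchInserter, insertInBatches and RowBinder are hypothetical names, and committing once per batch with auto-commit switched off is an added assumption rather than part of the answer.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper: batched inserts over plain JDBC.
final class BatchInserter {

    /** Copies one row's values into the PreparedStatement parameters. */
    @FunctionalInterface
    interface RowBinder<T> {
        void bind(PreparedStatement ps, T row) throws SQLException;
    }

    private BatchInserter() {}

    /**
     * Inserts all rows, sending them to the server every batchSize rows.
     * Assumption: auto-commit is turned off and each batch is committed once.
     * Returns the number of batches sent.
     */
    static <T> int insertInBatches(Connection conn, String sql, List<T> rows,
                                   int batchSize, RowBinder<T> binder) throws SQLException {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);              // commit per batch, not per row
        int batches = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (T row : rows) {
                binder.bind(ps, row);
                ps.addBatch();                  // queue the row on the client
                if (++pending == batchSize) {
                    ps.executeBatch();          // send the queued rows
                    conn.commit();
                    batches++;
                    pending = 0;
                }
            }
            if (pending > 0) {                  // flush the last, partial batch
                ps.executeBatch();
                conn.commit();
                batches++;
            }
        } catch (SQLException e) {
            conn.rollback();                    // discard the uncommitted batch
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);  // restore the caller's setting
        }
        return batches;
    }
}
```

For the question's data, the binder lambda would contain the same nine setString/setTimestamp calls as the original loop, sql would be the INSERT the question already prepares, and batchSize would be the 10k suggested above. With MySQL Connector/J it is also worth adding rewriteBatchedStatements=true to the JDBC URL: by default the driver executes the batched statements one by one, while with this property it rewrites a batch of INSERTs into multi-row INSERT statements.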

First, run the query SHOW VARIABLES; (or SHOW VARIABLES LIKE 'max_connections'; to filter the output).

Look for the connection-limit variable, max_connections.

This is the maximum number of connections the database server can handle at the same time.
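The same check can be made from Java before deciding how many threads to start. A minimal sketch: PoolSizing, readMaxConnections and poolSize are hypothetical names, and keeping a few connections in reserve (for other clients, monitoring, and so on) is an assumption, not something the answer specifies. SHOW VARIABLES returns two columns, Variable_name and Value.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper: reads max_connections and derives a thread count.
final class PoolSizing {

    private PoolSizing() {}

    /** Reads the server's max_connections setting from SHOW VARIABLES. */
    static int readMaxConnections(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SHOW VARIABLES LIKE 'max_connections'")) {
            if (!rs.next()) {
                throw new SQLException("max_connections was not reported by the server");
            }
            return Integer.parseInt(rs.getString("Value"));
        }
    }

    /**
     * Caps the requested number of worker threads so that `reserved`
     * connections stay free; never returns less than 1.
     */
    static int poolSize(int requested, int maxConnections, int reserved) {
        return Math.max(1, Math.min(requested, maxConnections - reserved));
    }
}
```

For example, poolSize(150, readMaxConnections(conn), 10) keeps the answer's 150 threads unless the server allows fewer than 160 connections.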

Then implement the threads in your Java code:

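// Keep this below max_connections so other clients can still connect
final int maxPool = 150;


public void executa() {
    // divideInserts (not shown in the answer) reads the CSV file and splits
    // its INSERT statements into maxPool lists, one per thread
    final List<List<String>> listaTodosInserts = divideInserts(maxPool, "arquivoCSV");

    for (List<String> listaInserts : listaTodosInserts) {
        insert(listaInserts);
    }
}


public void insert(final List<String> listaInserts) {
    new Thread() {
        @Override
        public void run() {
            // Each thread needs its own Connection: JDBC connections and
            // statements should not be shared between threads
            java.sql.Connection conn = getConexaoMySQL();
            if (conn == null) {
                return; // connection failed; getConexaoMySQL already printed the error
            }
            try (java.sql.Connection c = conn;
                 java.sql.Statement stmt = c.createStatement()) {
                for (String insert : listaInserts) {
                    stmt.execute(insert);
                }
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }.start();
}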
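The code above starts one raw Thread per list and never waits for them, so executa() returns before the inserts finish and failures only show up as stack traces. A common alternative, swapped in here rather than taken from the answer, is a fixed-size ExecutorService: the pool size caps the number of simultaneous connections, and Future.get() waits for every chunk and rethrows the first failure. A minimal sketch; ParallelRunner, runAll and ChunkTask are hypothetical names, and each task is expected to open its own Connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: runs one task per chunk on a bounded thread pool.
final class ParallelRunner {

    /** Work done for one chunk, e.g. opening a connection and inserting its rows. */
    @FunctionalInterface
    interface ChunkTask<T> {
        void run(List<T> chunk) throws Exception;
    }

    private ParallelRunner() {}

    /** threads must be at least 1 (Executors.newFixedThreadPool rejects 0). */
    static <T> void runAll(List<List<T>> chunks, int threads, ChunkTask<T> task)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (List<T> chunk : chunks) {
                // Submitted as a Callable so checked exceptions surface through Future.get()
                futures.add(pool.submit(() -> {
                    task.run(chunk);
                    return null;
                }));
            }
            for (Future<?> f : futures) {
                f.get();   // blocks until this chunk is done; rethrows its failure
            }
        } finally {
            pool.shutdown();   // accept no new tasks; running ones may still finish
        }
    }
}
```

With the answer's names, executa() could pass listaTodosInserts and maxPool to runAll, with a task that opens a connection through getConexaoMySQL() and executes that chunk's inserts; executa() would then return only after every chunk has been processed, and the pool also bounds concurrency if divideInserts ever returns more lists than maxPool.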

    
