Question :
I'm trying to find a way to delete the files that Windows creates when a file is copied more than once. I got something working with the code below:
import java.io.File;

public class FileCreator {
    public static void main(String[] args) throws Exception {
        File f = new File(".");
        // List the entries of the current directory
        File[] files = f.listFiles();
        for (File fl : files) {
            String fileName = fl.getName();
            // Print the names that contain the "copy of a copy" marker
            if (fileName.contains("Copia - Copia")) {
                System.out.println(fileName);
            }
        }
    }
}
I created some test files, and the result was:
C:\Users\diego\Desktop\Nova pasta>java FileCreator
File 0 - Copia - Copia.txt
File 10 - Copia - Copia.txt
File 12 - Copia - Copia.txt
File 14 - Copia - Copia.txt
File 16 - Copia - Copia.txt
File 18 - Copia - Copia.txt
File 2 - Copia - Copia.txt
File 4 - Copia - Copia.txt
File 6 - Copia - Copia.txt
File 8 - Copia - Copia.txt
This approach already works for me, since I can just replace the print statement inside the loop with a simple fl.delete();, but I would like more control over what gets deleted by using a regex.
I started with the code below, but I could not write a regex that detects "Copia - Copia" exactly at the end of the file name, so that the file can then be deleted.
Pattern p = Pattern.compile("");
Matcher m;
f = new File(".");
File[] files = f.listFiles();
for (File fl : files) {
    String fileName = fl.getName();
    m = p.matcher(fileName);
    if (m.find()) {
        //fl.delete();
        System.out.println(fileName + " deletado");
    }
}
How do I write a regex that filters these files?
Note: detecting the extension is irrelevant. I only need to detect the "Copia - Copia" that Windows appends to the end of the file name when it renames duplicates of duplicates.
Answer :
The regex can be Copia - Copia\.[^.]+$
Explanation:
Copia - Copia\.[^.]+$
^              ^    ^
1              2    3
1. Copia - Copia\. is the part you want to find; the dot is escaped so that it only matches a literal dot.
2. [^.]+ : a ^ inside [...] indicates negation, i.e. any character listed inside [^...] is excluded from the match. I used it after the dot so that the file extension can be anything except another dot.
3. $ requires the file name (the String) to end exactly with what comes before it, in this case with Copia - Copia.[any extension]
As an alternative, you can use C[oó]pia - C[oó]pia\.[^.]+$ if names may appear both with and without the accent. Note that whether ó matches depends on how the name is encoded in Unicode (a single precomposed ó versus an o followed by a combining accent).
The usage would look something like:
final Pattern regex = Pattern.compile("C[oó]pia - C[oó]pia\\.[^.]+$");
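The Unicode caveat can be illustrated with a small sketch. It assumes that some file names may arrive with the accent stored as a separate combining character, and normalizes them to NFC with java.text.Normalizer before matching. The class name AccentInsensitiveMatch and the method isCopyOfCopy are hypothetical names used only for this illustration.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

class AccentInsensitiveMatch {
    // Same idea as the pattern above; the accented letter is written as a
    // Unicode escape (U+00F3) so the source does not depend on file encoding.
    static final Pattern COPY_SUFFIX =
            Pattern.compile("C[o\u00F3]pia - C[o\u00F3]pia\\.[^.]+$");

    // Normalize to NFC so that "o" followed by a combining acute accent
    // becomes the single precomposed character before the regex is applied.
    static boolean isCopyOfCopy(String fileName) {
        String nfc = Normalizer.normalize(fileName, Normalizer.Form.NFC);
        return COPY_SUFFIX.matcher(nfc).find();
    }

    public static void main(String[] args) {
        // Decomposed form: plain "o" plus combining acute accent (U+0301)
        String decomposed = "File 1 - Co\u0301pia - Co\u0301pia.txt";
        System.out.println(isCopyOfCopy("File 1 - Copia - Copia.txt")); // true
        System.out.println(isCopyOfCopy(decomposed));                   // true
        System.out.println(isCopyOfCopy("File 1 - Copia.txt"));         // false
    }
}
```

Without the normalization step, the decomposed name would not match, because the combining accent sits between the o and the p.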
An example with a List<String> to test:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class Exemplo
{
    public static void main(String[] args)
    {
        // The dot before the extension is escaped so it matches only a literal "."
        final Pattern regex = Pattern.compile("Copia - Copia\\.[^.]+$");
        List<String> files = new ArrayList<String>();
        files.add("File 123 - Copia.txt");
        files.add("File 10 - Copia - Copia.java");
        files.add("File 12 - Copia.java");
        files.add("File 14 - Copia - Copia.txt");
        files.add("File 16 - Copia.txt");
        files.add("File 18 - Copia - Copia.log");
        files.add("File 2 - Copia.txt");
        files.add("File 4 - Copia.log");
        files.add("File 6 - Copia - Copia.txt");
        files.add("File 8 - Copia.txt");
        for (String file : files)
        {
            // find() looks for the pattern anywhere; $ anchors it at the end
            if (regex.matcher(file).find())
            {
                System.out.println("Encontrado: " + file);
            }
        }
    }
}
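To connect the pattern back to the original goal of deleting the files, here is a minimal sketch that assumes the files live in the current directory and that a dry run (listing only) is wanted before anything is removed. The class DuplicateCleaner, the methods shouldDelete and clean, and the dryRun flag are hypothetical names, not part of the question's code.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class DuplicateCleaner {
    static final Pattern COPY_SUFFIX = Pattern.compile("Copia - Copia\\.[^.]+$");

    // True when the name ends in "Copia - Copia.<extension>".
    static boolean shouldDelete(String fileName) {
        return COPY_SUFFIX.matcher(fileName).find();
    }

    // Collects the matching regular files in dir. When dryRun is false,
    // the collected files are deleted after the directory listing is closed.
    static List<Path> clean(Path dir, boolean dryRun) throws IOException {
        List<Path> matched = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p) && shouldDelete(p.getFileName().toString())) {
                    matched.add(p);
                }
            }
        }
        if (!dryRun) {
            for (Path p : matched) {
                Files.delete(p);
            }
        }
        return matched;
    }

    public static void main(String[] args) throws IOException {
        // Dry run: only report what would be deleted.
        for (Path p : clean(Paths.get("."), true)) {
            System.out.println("Would delete: " + p.getFileName());
        }
    }
}
```

Running it with dryRun set to true first makes it possible to review the list before switching to an actual delete.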
It is also possible to use String.matches, but then .* has to be added at the front, because matches only succeeds when the whole string matches the pattern, thus .*Copia - Copia\.[^.]+$. However, as @VictorStafusa said, this might compromise performance a bit, depending on how many times it runs (I have not been able to confirm this yet).
Explanation:
.*Copia - Copia\.[^.]+$
^ ^              ^    ^
1 2              3    4
It would look something like:
for (String file : files)
{
    // matches() must cover the whole name, hence the leading .*
    if (file.matches(".*Copia - Copia\\.[^.]+$"))
    {
        System.out.println("Encontrado: " + file);
    }
}
1. .* matches any (group of) characters that come before the desired text.
2. Copia - Copia\. is the part you want to find.
3. [^.]+ : a ^ inside [...] indicates negation, i.e. any character listed inside [^...] is excluded from the match. I used it after the dot so that the file extension can be anything except another dot.
4. $ requires the file name (the String) to end exactly with what comes before it, in this case with [any characters]Copia - Copia.[any extension]
As an alternative, you can use .*C[oó]pia - C[oó]pia\.[^.]+$ if names may appear both with and without the accent; note that the result depends on how the accented character is encoded in Unicode.
The usage would look something like:
if (file.matches(".*C[oó]pia - C[oó]pia\\.[^.]+$")) {
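The difference between Matcher.find and String.matches can be checked with a short sketch. It also reuses one precompiled Pattern, since String.matches compiles its regex on every call; whether that cost matters in practice is not measured here. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class FindVersusMatches {
    // Compiled once and reused for every file name.
    static final Pattern SUFFIX = Pattern.compile("Copia - Copia\\.[^.]+$");

    // find() succeeds if the pattern occurs anywhere; $ pins it to the end.
    static boolean viaFind(String name) {
        return SUFFIX.matcher(name).find();
    }

    // matches() requires the pattern to cover the entire string,
    // so without a leading .* the "File 10 - " prefix makes it fail.
    static boolean viaMatchesWithoutPrefix(String name) {
        return name.matches("Copia - Copia\\.[^.]+$");
    }

    static boolean viaMatchesWithPrefix(String name) {
        return name.matches(".*Copia - Copia\\.[^.]+$");
    }

    public static void main(String[] args) {
        String name = "File 10 - Copia - Copia.java";
        System.out.println(viaFind(name));                 // true
        System.out.println(viaMatchesWithoutPrefix(name)); // false
        System.out.println(viaMatchesWithPrefix(name));    // true
    }
}
```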
You can use the following regular expression:
Pattern.compile("Copia - Copia\\.[a-zA-Z]{3,4}$");
Where:
- Copia - Copia is the text you are looking for;
- \\. is the literal . character. Normally it would be just \. but, as the expression is inside a string, the backslash has to be escaped once;
- [a-zA-Z] requires the character to be between a and z or between A and Z;
- {3,4} is the number of characters, which must be 3 or 4;
- $ means that it is at the end of the string.
That is: search for the text Copia - Copia followed by a ., then 3 or 4 letters from a to z or A to Z, at the end of a string.
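A quick sketch of how this stricter pattern behaves, assuming a few made-up file names: extensions that are not exactly 3 or 4 ASCII letters are rejected, which is the trade-off compared with [^.]+. The class name ExtensionLengthCheck and the method matchesStrict are hypothetical.

```java
import java.util.regex.Pattern;

class ExtensionLengthCheck {
    static final Pattern STRICT =
            Pattern.compile("Copia - Copia\\.[a-zA-Z]{3,4}$");

    static boolean matchesStrict(String fileName) {
        return STRICT.matcher(fileName).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesStrict("File 10 - Copia - Copia.txt"));  // true (3 letters)
        System.out.println(matchesStrict("File 10 - Copia - Copia.java")); // true (4 letters)
        System.out.println(matchesStrict("File 10 - Copia - Copia.gz"));   // false (2 letters)
        System.out.println(matchesStrict("File 10 - Copia - Copia.mp3"));  // false (contains a digit)
    }
}
```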
Use this expression:
(Copia - Copia)
The parentheses define a group of characters to be "captured" from the string.
// String.replace treats its argument as literal text, not as a regex
String trechoParaRemover = "Copia - Copia";
fileName = fileName.replace(trechoParaRemover, "");
That would solve your case; I do not think you need a regular expression or delete().