Edit
After a long process of roaming the web, re-running and troubleshooting the script with this wonderful community, the script is functional and does what it’s intended to do. It could probably be improved further in terms of efficiency/logic, but I lack the necessary skills/knowledge to do so. Feel free to copy or edit it, or even propose a more efficient way of doing the same thing.
I’m very thankful to @AernaLingus@hexbear.net, @GenderNeutralBro@lemmy.sdf.org, @hydroptic@sopuli.xyz and Phil Harvey (exiftool) for their help, their time and all the great ideas (and for spoon-feeding me simple, easy-to-understand examples!)
How to use
Prerequisites:
- the parallel package installed on your distribution
Copy/paste the script below into a file and make it executable. Adjust start_range/end_range to your needs, install the parallel package for your OS, and run the following command:
time find /path/to/your/image/directory/ -type f | parallel ./script-name.sh
This sorts only the pictures within your specified time range into a YEAR/MONTH directory structure (e.g. 2018/September) in your current directory. The date is taken from 5 different date tags (DateTimeOriginal, CreateDate, FileModifyDate, ModifyDate, DateAcquired).
You may want to swap ModifyDate and FileModifyDate in the script. ModifyDate is more reliable because FileModifyDate changes easily: as soon as you modify a picture, it is set to the current date. I needed the original order for my specific use case.
From:
'-directory<$DateAcquired/' '-directory<$ModifyDate/' '-directory<$FileModifyDate/' '-directory<$CreateDate/' '-directory<$DateTimeOriginal/'
To:
'-directory<$DateAcquired/' '-directory<$FileModifyDate/' '-directory<$ModifyDate/' '-directory<$CreateDate/' '-directory<$DateTimeOriginal/'
As per exiftool’s documentation:
ExifTool evaluates the command-line arguments left to right, and latter assignments to the same tag override earlier ones.
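The two orders are easy to get out of sync. The FIRST_DATE query reads the tags highest-priority first, while the move needs them highest-priority last. If you apply the swap above, the -T query in FIRST_DATE should probably be reordered the same way, so that the date used for filtering is the date used for sorting. One option is to keep the priority list in a single bash array and derive both argument lists from it. This is a minimal sketch assuming the "To:" order above; the names `tags`, `query_args` and `dir_args` are hypothetical:

```shell
#!/bin/bash
# Hypothetical: date tags listed ONCE, highest priority first
# (this matches the "To:" order above).
tags=(DateTimeOriginal CreateDate ModifyDate FileModifyDate DateAcquired)

# Read query: highest priority first, so the first valid date printed
# by exiftool -T is also the one used for sorting.
query_args=()
for tag in "${tags[@]}"; do
  query_args+=("-$tag")
done

# Move: later '-directory<...' assignments override earlier ones,
# so walk the array backwards to put the highest priority LAST.
dir_args=()
for (( i = ${#tags[@]} - 1; i >= 0; i-- )); do
  dir_args+=("-directory<\$${tags[i]}/")
done

# Show the generated arguments, one per line.
printf '%s\n' "${query_args[@]}" "${dir_args[@]}"
```

The arrays would then be expanded in place of the hard-coded tag lists, e.g. `exiftool -m -d '%Y%m%d' -T "${query_args[@]}" "$filename"` for the query and `exiftool -api QuickTimeUTC -d %Y/%B "${dir_args[@]}" '-FileName=%f%-c.%e' "$filename"` for the move.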
#!/bin/bash
if [ $# -eq 0 ]; then
echo "Usage: $0 <filename>"
exit 1
fi
# Concatenate all arguments into one string for the filename, so calling "./script.sh /path/with spaces.jpg" should work without quoting
filename="$*"
# A file is kept only if its date is strictly between these two values (-gt/-lt below exclude both bounds)
start_range=20170101
end_range=20201230
# -T prints the tags as tab-separated YYYYMMDD columns ("-" for a missing tag); tr drops the "-" placeholders and awk keeps the first remaining date
FIRST_DATE=$(exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate "$filename" | tr -d '-' | awk '{print $1}')
if [[ "$FIRST_DATE" != '' ]] && [[ "$FIRST_DATE" -gt $start_range ]] && [[ "$FIRST_DATE" -lt $end_range ]]; then
# Later -directory assignments override earlier ones, so the highest-priority tag (DateTimeOriginal) is listed last
exiftool -api QuickTimeUTC -d %Y/%B '-directory<$DateAcquired/' '-directory<$ModifyDate/' '-directory<$FileModifyDate/' '-directory<$CreateDate/' '-directory<$DateTimeOriginal/' '-FileName=%f%-c.%e' "$filename"
else
echo "Not in the specified time range"
fi
Hi everyone !
Please no bash-shaming! I did my utmost to put everything together and somehow make it work without any prior bash programming knowledge. It took me a lot of effort and time.
While I’m pretty happy with the result, I find the execution time very slow: 16min for 2288 files.
On a big folder with approximately 50,062 files, this would take almost 6 hours!!!
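For reference, here is where that figure comes from. It is a simple linear extrapolation of the measured sample, assuming the time per file stays constant:

```latex
\frac{16 \times 60\ \text{s}}{2288\ \text{files}} \approx 0.42\ \text{s/file},
\qquad
50062\ \text{files} \times 0.42\ \text{s/file} \approx 21\,000\ \text{s} \approx 5.8\ \text{h}
```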
If someone could have a look and give me some easy to understand hints, I would greatly appreciate it.
What am I trying to achieve?
Create a bash script that uses exiftool to extract the date from images in a readable format (20240101). The script compares that date with a start_range and an end_range, so that it sorts only the images from that specific date range
(ex: 2020-01-01 -> 2020-12-30).
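Here is a minimal sketch of that comparison as a reusable bash function. `in_range` is a hypothetical name. Unlike the `-gt`/`-lt` tests used in the scripts in this thread, it includes both boundary days, which is what the "2020-01-01 -> 2020-12-30" example suggests:

```shell
#!/bin/bash
# Hypothetical helper: return 0 (true) if a YYYYMMDD date lies within
# [start, end], boundary days INCLUDED (-ge/-le instead of -gt/-lt).
in_range() {
  local date=$1 start=$2 end=$3
  # Reject anything that is not exactly eight digits, e.g. the "-"
  # that exiftool -T prints for a missing tag, or an empty string.
  [[ $date =~ ^[0-9]{8}$ ]] || return 1
  [[ $date -ge $start && $date -le $end ]]
}

# Example: prints "in range" because the start day itself is included.
if in_range 20200101 20200101 20201230; then
  echo "in range"
fi
```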
Also, some images have lost some EXIF data, so I have to check several date fields in turn:
- DateTimeOriginal
- CreateDate
- FileModifyDate
- DateAcquired
- ModifyDate
The script in question
#!/bin/bash
shopt -s globstar
folder_name=/home/user/Pictures
start_range=20170101
end_range=20180130
for filename in $folder_name/**/*; do
if [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -DateTimeOriginal "$filename") =~ ^[0-9]+$ ]]; then
DateTimeOriginal=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -DateTimeOriginal "$filename")
if [ "$DateTimeOriginal" -gt $start_range ] && [ "$DateTimeOriginal" -lt $end_range ]; then
/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$DateTimeOriginal/' '-FileName=%f%-c.%e' "$filename"
echo "Found a value"
echo "Okay its $(tput setab 22)DateTimeOriginal$(tput sgr0)"
fi
elif [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -CreateDate "$filename") =~ ^[0-9]+$ ]]; then
CreateDate=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -CreateDate "$filename")
if [ "$CreateDate" -gt $start_range ] && [ "$CreateDate" -lt $end_range ]; then
/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$CreateDate/' '-FileName=%f%-c.%e' "$filename"
echo "Found a value"
echo "Okay its $(tput setab 27)CreateDate$(tput sgr0)"
fi
elif [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -FileModifyDate "$filename") =~ ^[0-9]+$ ]]; then
FileModifyDate=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -FileModifyDate "$filename")
if [ "$FileModifyDate" -gt $start_range ] && [ "$FileModifyDate" -lt $end_range ]; then
/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$FileModifyDate/' '-FileName=%f%-c.%e' "$filename"
echo "Found a value"
echo "Okay its $(tput setab 202)FileModifyDate$(tput sgr0)"
fi
elif [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -DateAcquired "$filename") =~ ^[0-9]+$ ]]; then
DateAcquired=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -DateAcquired "$filename")
if [ "$DateAcquired" -gt $start_range ] && [ "$DateAcquired" -lt $end_range ]; then
/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$DateAcquired/' '-FileName=%f%-c.%e' "$filename"
echo "Found a value"
echo "Okay its $(tput setab 172)DateAcquired$(tput sgr0)"
fi
elif [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -ModifyDate "$filename") =~ ^[0-9]+$ ]]; then
ModifyDate=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -ModifyDate "$filename")
if [ "$ModifyDate" -gt $start_range ] && [ "$ModifyDate" -lt $end_range ]; then
/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$ModifyDate/' '-FileName=%f%-c.%e' "$filename"
echo "Found a value"
echo "Okay its $(tput setab 135)ModifyDate$(tput sgr0)"
fi
else
echo "No EXIF field found"
fi
done
Things I have tried
- Reducing the number of if calls
It didn’t improve the execution time much (maybe a few ms?). The syntax is way less readable: I added a lot of ORs (||) to reduce everything to a single if call. It’s not finished; I just gave it a test drive with 2 EXIF fields (DateTimeOriginal and CreateDate) to see if it would improve the time. But meeeh :/.
#!/bin/bash
shopt -s globstar
folder_name=/home/user/Pictures
start_range=20170101
end_range=20201230
for filename in $folder_name/**/*; do
if [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -DateTimeOriginal "$filename") =~ ^[0-9]+$ ]] || [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -CreateDate "$filename") =~ ^[0-9]+$ ]]; then
DateTimeOriginal=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -DateTimeOriginal "$filename")
CreateDate=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -CreateDate "$filename")
if { [ "$DateTimeOriginal" -gt $start_range ] && [ "$DateTimeOriginal" -lt $end_range ]; } || { [ "$CreateDate" -gt $start_range ] && [ "$CreateDate" -lt $end_range ]; }; then
/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$DateTimeOriginal/' '-directory<$CreateDate/' '-FileName=%f%-c.%e' "$filename"
echo "Found a value"
echo "Okay its $(tput setab 22)DateTimeOriginal$(tput sgr0)"
else
echo "FINISH YOUR SYNTAX !!"
fi
fi
done
- Playing around with find
To recursively find the image files in all my folders, I first tried the find command, but that gave me a lot of headaches… When an image file name had spaces in it, the image path broke in strange ways… All the answers I found on the web were gibberish to me, and I couldn’t make them work in my script… I lost over 4 hours on that specific issue alone!
To overcome the hurdle, someone suggested using shopt -s globstar with for filename in $folder_name/**/*, and this works perfectly. But I have no idea whether this could be the culprit of the slow execution time?
- Changing all [ ] into [[ ]]
That also didn’t do the trick.
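The find-and-spaces problem described above can be reproduced in a few lines. This sketch creates a throwaway directory with `mktemp -d` and compares word-splitting the output of `find` against reading NUL-separated paths with `while IFS= read -r -d ''`. The file names are made up for the demo. The loop reads from process substitution (`< <(...)`) rather than a pipe, so the counter is still set after the loop; a piped `while` runs in a subshell.

```shell
#!/bin/bash
# Throwaway directory with two awkward (but legal) file names.
# Assumes the temp directory path itself contains no whitespace.
demo_dir=$(mktemp -d)
touch "$demo_dir/holiday photo.jpg" "$demo_dir/odd"$'\n'"name.jpg"

# 1) Word-splitting find's output breaks paths at every space/newline.
count_split=0
for f in $(find "$demo_dir" -type f); do
  count_split=$((count_split + 1))
done

# 2) NUL-separated paths arrive intact, spaces and newlines included.
count_nul=0
while IFS= read -r -d '' file_path; do
  [[ -f $file_path ]] && count_nul=$((count_nul + 1))
done < <(find "$demo_dir" -type f -print0)

echo "word-split pieces: $count_split"   # 4 pieces for 2 files
echo "NUL-separated:     $count_nul"     # 2
rm -rf -- "$demo_dir"
```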
How to Improve the processing time ?
I have no idea whether it’s my script or the exiftool calls that make it so slow. This isn’t that complicated a script; I mean, it’s a comparison between 2 integers, not hashing complex numbers.
I hope someone could guide me in the right direction :)
Thanks !
I have not tested this, but I have a couple ideas off the top of my head.
#1 - Retrieve all fields with a single exiftool command. e.g.
ALL_DATES=$(exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate "$filename")
Then retrieve individual fields from $ALL_DATES with something like awk. e.g.
echo $ALL_DATES | awk '{print $1}'
will return the first field (DateTimeOriginal), and changing that to '{print $2}' will return the second field (CreateDate).
#2 - Perhaps process multiple files with a single exiftool call. e.g.
exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate ~/Pictures/*
You might compare whether running this exiftool query once takes a significantly different amount of time than running it in a loop. If not, it’s probably simpler to use one call per file.
Edit: I doubt that either find or globbing takes a significant amount of time. However, the issues you have with find and spaces in file names can be worked around with find’s -print0 option. This prints out file paths separated by NUL bytes (i.e. ASCII value 0). You can then loop through them without having to guess whether whitespace is part of the path or a delimiter. A common way of dealing with this is to pipe the output of find into xargs like so:
find ~/Pictures -type f -print0 | xargs -0 -L 1 echo 'File path: '
That will execute echo 'File path: ' <file> for every file in your Pictures folder. It’s a little more complicated, but you can also use a while loop like so:
find ~/Pictures -type f -print0 | while IFS= read -r -d '' file_path; do
echo "Processing: $file_path"
done
Note that when you pass a blank string with read -d '', it reads up to a NUL char, as documented here: https://www.gnu.org/software/bash/manual/bash.html#index-read . I’m not 100% sure if this is true in older versions of Bash or other similar shells.
This is probably the answer.
OP, it’s generally not native bash that’s slow. It’s the overhead of starting a fresh copy of exiftool each time, especially since it’s written in Perl and has to be freshly interpreted/compiled on every call.
Hey thanks for the info :)
As everyone seems to suggest, I need to find a way to call exiftool only once. But that’s out of my league right now :) I’m pretty happy with the 4min improvement from only changing how the script loops through my files.
Hello :) !
Thanks for the nice write-up <3! Not only did it improve the processing speed, it also made my script way more readable <3!
#1 - Retrieve all fields with a single exiftool command. e.g. ALL_DATES=$(exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate "$filename")
I went from 16min for the same sample to:
real 4m47,630s
user 4m16,868s
sys 0m32,487s
That’s like a BIG improvement !!! Thank you very much !!!
I don’t know if I did it the way you had in mind, but here is a snippet of how I changed the script:
#!/bin/bash
shopt -s globstar
folder_name=/home/user/Pictures
start_range=20170101
end_range=20181230
for filename in $folder_name/**/*; do
ALL_DATES=$(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate "$filename")
# echo $ALL_DATES | awk '{print $1}'
if [[ $(echo $ALL_DATES | awk '{print $1}') =~ ^[0-9]+$ ]]; then
if [[ $(echo $ALL_DATES | awk '{print $1}') -gt $start_range ]] && [[ $(echo $ALL_DATES | awk '{print $1}') -lt $end_range ]]; then
/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$DateTimeOriginal/' '-FileName=%f%-c.%e' "$filename"
echo "Okay its $(tput setab 22)DateTimeOriginal$(tput sgr0)"
fi
...
Every file gets processed successfully, but I get A LOT of errors from exiftool. It seems like something is still looping through the files after they have already been moved?? I will try to find out why it throws this error.
Error: File not found - /home/user/Pictures/2018/September/IMG_2078.JPG
Error: File not found - /home/user/Pictures/2018/September/IMG_2079.JPG
Error: File not found - /home/user/Pictures/2018/September/IMG_2080.JPG
Error: File not found - /home/user/Pictures/2018/September/IMG_2081.JPG
Error: File not found - /home/user/Pictures/2018/September/IMG_2082.JPG
Error: File not found - /home/user/Pictures/2018/September/IMG_2083.JPG
Error: File not found - /home/user/Pictures/2018/September/IMG_2084.JPG
Error: File not found - /home/user/Pictures/2018/September/IMG_2085.JPG
Error: File not found - /home/user/Pictures/2018/September/IMG_2086.JPG
Error: File not found - /home/user/Pictures/2018/September/IMG_2087.JPG
#2 - Perhaps process multiple files with a single exiftool call. e.g. exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate ~/Pictures/*. You might compare whether running just this exiftool query once vs running it in a loop takes a significantly different amount of time. If not, it’s probably simpler to use one call per file.
Hummm, I can’t think of where and how I could fit it in the script :/ Will probably have to do a lot of try&break.
Thank you very much for your help and for your insightful edit about find :).
Ah yeah, this suggestion is much better than mine. Reducing the number of calls to exiftool is probably the easiest way to speed things up. Giving it a glob instead of individual files is definitely worth trying, because depending on what’s being run it can sometimes be quite a bit faster.
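One way idea #2 could fit into the script is sketched below, untested against a real picture library. A single recursive exiftool call prints one report line per file, bash picks the files whose first valid date is in range, and a second exiftool call moves only those files. The sketch makes several assumptions. First, that `-FilePath` as the first -T column gives each file's path. Second, that `-@ FILE` reads command-line arguments from a file, one per line; that is my understanding of the exiftool docs. Third, the name `select_in_range` and the file `selected.txt` are hypothetical. Finally, the range here is inclusive, unlike the -gt/-lt scripts above.

```shell
#!/bin/bash
# Sketch of idea #2: ONE exiftool call reads every file, bash filters the
# report, and ONE more exiftool call moves the selected files.
# Assumption: each report line is "path<TAB>date<TAB>date..." with "-"
# for missing tags, e.g. from: exiftool -r -m -T -d '%Y%m%d' -FilePath
#   -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate DIR
start_range=20170101
end_range=20201230

# Read the report on stdin; print the path of every file whose FIRST
# valid (eight-digit) date lies within [start_range, end_range].
select_in_range() {
  local path rest field
  local -a dates
  while IFS=$'\t' read -r path rest; do
    IFS=$'\t' read -r -a dates <<< "$rest"
    for field in "${dates[@]}"; do
      [[ $field =~ ^[0-9]{8}$ ]] || continue   # skip "-" placeholders
      if [[ $field -ge $start_range && $field -le $end_range ]]; then
        printf '%s\n' "$path"
      fi
      break   # only the first valid date counts
    done
  done
}

# Possible usage (needs exiftool):
#   exiftool -r -m -T -d '%Y%m%d' -FilePath -DateTimeOriginal -CreateDate \
#     -FileModifyDate -DateAcquired -ModifyDate /home/user/Pictures \
#     | select_in_range > selected.txt
#   exiftool -api QuickTimeUTC -d %Y/%B '-directory<$DateAcquired/' \
#     '-directory<$ModifyDate/' '-directory<$FileModifyDate/' \
#     '-directory<$CreateDate/' '-directory<$DateTimeOriginal/' \
#     '-FileName=%f%-c.%e' -@ selected.txt
```

The point of this design is that exiftool (and Perl) start twice in total instead of once or twice per file, which is where the thread suspects most of the time goes.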
Hey ha :) Sorry to ping you, I just wanted to give you a little update!
I was maybe a bit too hasty to cheer. The last updated script actually behaved strangely: some if statements were skipped, it looped strangely through my folder, and it sorted many files using the wrong field. It also raised many strange errors that I couldn’t make sense of by looking at the script…
HOWEVER !!
I changed how my script loops through my files according to your find suggestions:
#!/bin/bash
start_range=20160101
end_range=20221212
folder_name=/home/user/Pictures
find ~/Pictures -type f -print0 | while IFS= read -r -d '' file_path; do
image_path=$file_path
ALL_DATES=$(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate "$file_path")
for images in $file_path; do
if [[ $(echo $ALL_DATES | awk '{print $1}') =~ ^[0-9]+$ ]] && [[ $(echo $ALL_DATES | awk '{print $1}') -gt $start_range ]] && [[ $(echo $ALL_DATES | awk '{print $1}') -lt $end_range ]]; then
/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -d %Y/%B '-directory<$DateTimeOriginal/' '-FileName=%f%-c.%e' "$images"
echo "Okay its $(tput setab 22)DateTimeOriginal$(tput sgr0)"
While it doesn’t give me the “BIG improvement” I dreamed of yesterday, it is a non-negligible improvement! Also, there are no errors anymore and the script loops perfectly through all files… So it’s a win :D.
real 11m5,139s
user 9m47,445s
sys 1m16,224s
To further improve the script, I probably need to find a way to make only a single exiftool call, as you suggested, but that’s probably out of my league… I’ve also already lost enough time putting this together without any coding skills, just wrongly using a tool I have no clue about!!!
Thank you very much for your help :))))
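The inner `for images in $file_path` loop in the snippet above is not just unnecessary. Because `$file_path` is unquoted, bash splits it at whitespace, so a file name containing a space would be handed to exiftool as two paths that don't exist. Here is a small demonstration with a made-up path:

```shell
#!/bin/bash
# Hypothetical path containing a space, as find -print0 would deliver it.
file_path='/home/user/Pictures/holiday photo.jpg'

# Unquoted: bash splits the value at the space, so the loop body runs
# twice, once per fragment.
pieces=0
for images in $file_path; do
  printf 'unquoted: %s\n' "$images"
  pieces=$((pieces + 1))
done

# Quoted (or simply using "$file_path" directly): one intact path.
for images in "$file_path"; do
  printf 'quoted:   %s\n' "$images"
done
```

This prints `unquoted: /home/user/Pictures/holiday`, then `unquoted: photo.jpg`, then `quoted:   /home/user/Pictures/holiday photo.jpg`. Using `"$file_path"` directly in the exiftool call avoids the problem.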
Glad it’s working! Couple more quick ideas:
Since you’re looping through the results of find, $file_path will be a single path name, so you don’t need to loop over it with for images in $file_path; anymore.
I think you’re checking each field of the results in its own if statement, e.g. if [[ $(echo $ALL_DATES | awk '{print $1}')... then if [[ $(echo $ALL_DATES | awk '{print $2}')... etc. While I don’t think this is hurting performance significantly, it would make your code easier to read and maintain if you first found the correct date, and then did only one comparison operation on it.
For example,
exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate "$file_path"
returns five columns, which contain either a date or “-”, and it looks like you’re using the first column that contains a valid date. You can try something like this to grab the first date more easily, then just use that from then on:
FIRST_DATE=$(exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate "$file_path" | tr -d '-' | awk '{print $1}')
tr -d '-' will delete all occurrences of ‘-’. That means the result will only contain whitespace and valid dates, so awk '{print $1}' will print the first valid date. Then you can simply have one if statement:
if [[ "$FIRST_DATE" != '' ]] && [[ "$FIRST_DATE" -gt $start_range ]] && [[ "$FIRST_DATE" -lt $end_range ]]; then
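The same "first valid date" logic can also be done with bash builtins if you would rather avoid starting the extra `tr` and `awk` processes for every file. This is a sketch that assumes exiftool's -T output is tab-separated with "-" for missing tags; `first_date` is a hypothetical name. It also skips any column that is not exactly eight digits, instead of passing it on to the numeric comparison.

```shell
#!/bin/bash
# Hypothetical helper: print the first column of an exiftool -T line that
# is a plain YYYYMMDD date; return 1 if there is none.
first_date() {
  local field
  local -a fields
  # -T separates columns with tabs; missing tags show up as "-".
  IFS=$'\t' read -r -a fields <<< "$1"
  for field in "${fields[@]}"; do
    if [[ $field =~ ^[0-9]{8}$ ]]; then
      printf '%s\n' "$field"
      return 0
    fi
  done
  return 1
}

# Possible usage in the script (needs exiftool):
#   FIRST_DATE=$(first_date "$(exiftool -m -d '%Y%m%d' -T -DateTimeOriginal \
#     -CreateDate -FileModifyDate -DateAcquired -ModifyDate "$file_path")")
first_date $'-\t20180915\t20240101\t-\t-'   # prints 20180915
```

When no column matches, `FIRST_DATE` ends up empty, so the existing `[[ "$FIRST_DATE" != '' ]]` check still applies.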
Hope this helps!
Update
I found something interesting! It seems that the FileModifyDate tag is not being processed by the script! After removing all time tags except FileModifyDate, the file is not even processed; it goes directly to the else statement! I’m still digging :D
Hey again!
Thank you very much for sharing your knowledge and ELI !
I gave it a try. I understand what it does (thanks to your concise and easy-to-understand examples) and how it should be implemented, but exiftool seems to behave very strangely (maybe a bug, but I guess it’s a skill issue).
FIRST_DATE=$(exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate "$file_path" | tr -d '-' | awk '{print $1}')
if [[ "$FIRST_DATE" != '' ]] && [[ "$FIRST_DATE" -gt $start_range ]] && [[ "$FIRST_DATE" -lt $end_range ]]; then
/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -d %Y/%B '-directory<$DateTimeOriginal/' '-directory<$CreateDate/' '-directory<$FileModifyDate/' '-directory<$DateAcquired/' '-directory<$ModifyDate/' '-FileName=%f%-c.%e' "$file_path"
As per the exiftool documentation (source: example 12):
'-directory<$DateTimeOriginal/' '-directory<$CreateDate/' '-directory<$FileModifyDate/' '-directory<$DateAcquired/' '-directory<$ModifyDate/'
However, it sometimes skips the DateTimeOriginal field and takes FileModifyDate instead, even if the first one is present. My guess is that exiftool needs more time to process the file correctly, but it’s only a guess! With the for loop and all the elif calls, it works without any issues.
Thanks again for your insightful help :)
Side note: I did a test run with only one time field to see if there is any time gain from fetching only the first valid date. It seems ~2ms slower per file, but I think that without all the elif calls it would really make a difference in the long run!!
Thanks again!!
Off the top of my head I’m not sure why that would be. To troubleshoot, it might help to print the output at every step so you can see if there are any oddities. Something like this perhaps, in place of the FIRST_DATE= line:
echo "$file_path"
EXIF_OUT=$(exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate "$file_path")
echo "$EXIF_OUT"
EXIF_FILTERED=$(echo "$EXIF_OUT" | tr -d '-')
echo "$EXIF_FILTERED"
FIRST_DATE=$(echo "$EXIF_FILTERED" | awk '{print $1}')
echo "$FIRST_DATE"
Heyho :)
Back again! Thank you for helping me troubleshoot!!! It was actually something else… (my reading skills)!
That’s why everything was messed up: exiftool uses the last assignment to set the directory date… I feel quite stupid/bad for adding unnecessary noise to Phil Harvey’s forum :/.
Thanks to you and another user, I greatly improved my script and went from 16min to 1min21s for the exact same batch! He/she suggested using the parallel package alongside the script to make full use of my CPU cores, as well as another way to loop through my files. It’s a mix of both your ideas and it looks so much better! Thank you very much for your help! 😘
#!/bin/bash
if [ $# -eq 0 ]; then
echo "Usage: $0 <filename>"
exit 1
fi
# Concatenate all arguments into one string for the filename, so calling "./script.sh /path/with spaces.jpg" should work without quoting
filename="$*"
start_range=20170101
end_range=20201230
FIRST_DATE=$(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate "$filename" | tr -d '-' | awk '{print $1}')
#echo $FIRST_DATE
if [[ "$FIRST_DATE" != '' ]] && [[ "$FIRST_DATE" -gt $start_range ]] && [[ "$FIRST_DATE" -lt $end_range ]]; then
/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -d %Y/%B '-directory<$DateAcquired/' '-directory<$ModifyDate/' '-directory<$FileModifyDate/' '-directory<$CreateDate/' '-directory<$DateTimeOriginal/' '-FileName=%f%-c.%e' "$filename"
else
echo "Error"
fi
time find /home/user/Pictures/ -type f | parallel ./exif-test.bash