Stop stripping EXIF
I recently noticed that Yande.re appears to be near-unique in stripping EXIF from uploads.

According to https://yande.re/forum/show/20204, this is mainly around for historical reasons and may no longer be needed.

Can we get rid of it?
I think we should.

One of the problems caused by this is the loss of the color profile
I actually thought it was removed already.
Context for those who don't know: https://twitter.com/yande_re/status/764834030097461248/photo/1
I think I heard you say in the channel: "the file is inferior to the original version of the file as it's been stripped of EXIF"

I won't go as far as saying it is inferior unless you give me a reason besides color profiling.
Checkmate said:
I think I heard you say in the channel: "the file is inferior to the original version of the file as it's been stripped of EXIF"

I won't go as far as saying it is inferior unless you give me a reason besides color profiling.
Stripping off the EXIF changes the MD5, which makes it harder to detect duplicates. There can also be useful stuff in the EXIF like the program the artist used to create the work, which some people may be interested in (and this isn't currently preserved in any form on Yande.re).

Personally, I consider it inferior mostly because it differs from the original for no good reason. If stripping the EXIF was beneficial in some way, for example making the file considerably smaller, I'd consider it an alternative to the original. A tradeoff makes sense to me. As it stands though, information is being irreversibly destroyed for no (remaining) apparent reason. The original has something the Yande.re version does not. No matter how small that thing is, it's missing.

Anyhow, colour profiling is a pretty good reason on its own.
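To illustrate the MD5 point, here is a minimal Python sketch. It uses a synthetic byte string laid out like a JPEG (it is not a decodable image, and the `Software=SomePaintTool` payload is a made-up placeholder rather than real EXIF data). Removing the metadata segment changes the digest even though the image data bytes are identical.

```python
import hashlib
import struct


def make_segment(marker: int, payload: bytes) -> bytes:
    """Build a JPEG segment: 0xFF, marker, 2-byte big-endian length, payload."""
    # The length field counts itself (2 bytes) plus the payload.
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


SOI = b"\xff\xd8"  # start of image
EOI = b"\xff\xd9"  # end of image

# Placeholder APP1 segment standing in for EXIF (hypothetical content).
exif_segment = make_segment(0xE1, b"Exif\x00\x00Software=SomePaintTool")
# Placeholder for the compressed image data; identical in both files.
image_data = make_segment(0xDA, b"") + b"\x12\x34\x56\x78"

original = SOI + exif_segment + image_data + EOI
stripped = SOI + image_data + EOI

md5_original = hashlib.md5(original).hexdigest()
md5_stripped = hashlib.md5(stripped).hexdigest()

print(md5_original == md5_stripped)  # False: the files no longer match byte for byte
```

So a lookup by MD5 against the artist's original file would miss the stripped copy, even though both decode to the same pixels.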
jkdfahlkdhfjgakdhfg said:
Personally, I consider it inferior mostly because it differs from the original
We are the original. In other words, we are a scan site, not a pixiv dump.

So apart from the color data, I'd be open to other reasons for this.
fireattack said:
I think we should.

One of the problems caused by this is the loss of the color profile
Is this what causes some images to get inverted colors when they are uploaded?
Checkmate said:
We are the original. In other words, we are a scan site, not a pixiv dump.
You may be the first host but you're not the original. The original is the file as uploaded by the creator. You're given the original but you don't preserve it.

So apart from the color data, I'd be open to other reasons for this.
Scanned images may have useful EXIF information like scan DPI, scanner model, color profiles and other such data. These could be used (for example) by someone who owns something scannable to see if they can upload a better version of something already here. It could also be used to spot issues in scans (wrong color profile, "fast" mode or something like that) so an uploader can correct them.

I believe, though, that the question shouldn't be why we _shouldn't_ strip EXIF but why we _should_.

So far the only reasons I've heard for it are Animepaper problems that aren't a concern anymore and the potential of other sites watermarking scans.

As it stands, it seems we're compromising not only the metadata but the image content itself (due to loss of the color profile) to combat watermarking, and I'm not sure that's a problem we actually have.

A compromise I'd like to propose is a simple "remove EXIF" checkbox on the upload page, defaulting to unchecked, perhaps with a message like "Check this if this image may contain sensitive metadata (your location, watermarks if rehosted from another site). Do not check this if the image has a non-sRGB color profile".
Yeah, DPI is another useful piece of information. Thankfully we don't remove it from PNGs.
blooregardo said:
Is this what causes some images to get inverted colors when they are uploaded?
Yeah, such as post #362405
Anyone else have any thoughts?
exif, ifd, and icc metadata could be useful to keep in the future to ensure proper image rendering

though other metadata like xmp, iptc, comments, and embedded thumbnails are largely useless and can often use ~20-25KB of space when all filled

'jhead -dc -di -dx -dt' would probably do the trick and keep only the more useful metadata which is human readable
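For anyone curious what that kind of selective strip amounts to, here is a rough Python sketch of the idea. It is not how jhead is implemented, and it assumes the usual JPEG layout: comments in COM segments, IPTC in APP13, XMP in APP1 segments starting with the Adobe namespace string, EXIF in the remaining APP1 segments, and ICC in APP2. The embedded thumbnail (`-dt`) lives inside the EXIF segment and is not handled here. `strip_selected` and `make_segment` are hypothetical names, and the sample file is synthetic.

```python
import struct

SOS = 0xDA    # start of scan: compressed image data follows
COM = 0xFE    # JPEG comment
APP1 = 0xE1   # EXIF or XMP, told apart by the payload prefix
APP13 = 0xED  # Photoshop resource block, where IPTC usually lives
XMP_PREFIX = b"http://ns.adobe.com/xap/1.0/\x00"


def make_segment(marker: int, payload: bytes) -> bytes:
    """Demo helper: build one JPEG segment (length counts its own 2 bytes)."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


def strip_selected(jpeg: bytes) -> bytes:
    """Drop comment, IPTC and XMP segments; copy everything else unchanged."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(jpeg[:2])
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == SOS:
            break  # everything from here on is image data
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        end = pos + 2 + length
        payload = jpeg[pos + 4:end]
        is_xmp = marker == APP1 and payload.startswith(XMP_PREFIX)
        if not (marker in (COM, APP13) or is_xmp):
            out += jpeg[pos:end]  # keeps EXIF, ICC and the decoder tables
        pos = end
    out += jpeg[pos:]  # SOS segment, scan data and EOI
    return bytes(out)


# Synthetic sample with placeholder payloads (not a decodable image).
exif = make_segment(APP1, b"Exif\x00\x00placeholder")
icc = make_segment(0xE2, b"ICC_PROFILE\x00\x01\x01placeholder")
sample = (b"\xff\xd8" + make_segment(COM, b"visit my site") + exif
          + make_segment(APP1, XMP_PREFIX + b"<x/>") + icc
          + make_segment(APP13, b"Photoshop 3.0\x00")
          + make_segment(SOS, b"") + b"\x00\x01" + b"\xff\xd9")
cleaned = strip_selected(sample)
```

In the sample, the comment, XMP and APP13 segments are dropped while the EXIF and ICC segments pass through untouched.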
As for jkdfahlkdhfjgakdhfg's opinion that the quality is inferior simply because the md5 does not match the original, I'll dismiss it completely. I also won't bother with the duplicate detection reason.

On to other EXIF data: most of the time, the images are being cleaned and therefore such EXIF data is gone as well. Additionally, many scanners do not embed such data in the first place (unlike cameras).

The only valid argument left would be the ICC color profiling.
Checkmate said:
As for jkdfahlkdhfjgakdhfg's opinion that the quality is inferior simply because the md5 does not match the original, I'll dismiss it completely. I also won't bother with the duplicate detection reason.

On to other EXIF data: most of the time, the images are being cleaned and therefore such EXIF data is gone as well. Additionally, many scanners do not embed such data in the first place (unlike cameras).

The only valid argument left would be the ICC color profiling.
Just curious, what are the pros of stripping exif? Saving space? Or in other words, what's the downside of *not* stripping exif?

I'm asking because the sole reason we started doing that was animepaper, and it's gone already.

Non-matching md5 obviously doesn't mean it's inferior, but wouldn't it be nice to have an identical file if it doesn't do any harm?
Saving space is one factor, avoiding shameless plugs is another, and privacy is another as well.

This topic is a suggestion to stop stripping exif altogether, without a good reason. I won't buy the bs md5 mismatch.

If the only valid argument is ICC color profiling, then it may be spared, but the rest will still get stripped.
Regarding reasons for stripping EXIF, I'd name potentially unexpected, inconsistent rendering across different software/platforms due to different handling of metadata. For images without EXIF, you can just check once and be sure it will show up fine everywhere, barring bad displays. For images with EXIF, however, you might be experiencing this.
Checkmate said:
This topic is a suggestion to stop stripping exif altogether, without a good reason. I won't buy the bs md5 mismatch.
I'll admit I started with a shitty reason, but at the time it was keeping EXIF for my shitty reason vs. stripping EXIF for what seemed to be no reason at all. I apologise if this offended you in some way.

To enumerate the reasons as they are now, in favour of removing EXIF:
- Space
- Privacy
- Advertising (shameless plug)

In favour of keeping it (or a subset):
- Rendering (ICC, orientation?)
- Printing/quality (DPI)?

I'm highly dubious about the given reasons for stripping though. Are these real problems? Why weren't they brought up earlier?

I definitely don't buy space. I ran a test on a few thousand unstripped JPEGs I have and the jhead command used by Moebooru removes less than 1% of space. Do you see different savings?
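A test like that can be approximated without running jhead at all. The Python sketch below counts the bytes held in metadata segments and reports them as a fraction of the total. It assumes that APP1 (EXIF/XMP), APP13 (IPTC) and COM are what a strip would remove, which may not match the exact flags Moebooru uses. `metadata_bytes` and `savings_ratio` are hypothetical names, and the sample file is synthetic.

```python
import struct

SOS = 0xDA  # start of scan: compressed image data follows
# Assumption: these are the segments a strip would remove.
METADATA_MARKERS = {0xE1, 0xED, 0xFE}  # APP1, APP13, COM


def metadata_bytes(jpeg: bytes) -> int:
    """Count the bytes held in metadata segments before the image data starts."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    total = 0
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF and jpeg[pos + 1] != SOS:
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if jpeg[pos + 1] in METADATA_MARKERS:
            total += 2 + length  # marker bytes plus the length-prefixed body
        pos += 2 + length
    return total


def savings_ratio(files: list[bytes]) -> float:
    """Fraction of all bytes across `files` that stripping would free."""
    return sum(metadata_bytes(f) for f in files) / sum(len(f) for f in files)


# Synthetic 1000-byte file: 100 bytes of APP1 metadata, the rest image data.
app1 = b"\xff\xe1" + struct.pack(">H", 98) + b"\x00" * 96
sample = b"\xff\xd8" + app1 + b"\xff\xda\x00\x02" + b"\x00" * 892 + b"\xff\xd9"
print(savings_ratio([sample]))  # 0.1
```

On real files one would read each JPEG with `open(path, "rb").read()` and pass the list in. The 10% in the synthetic sample is only there to make the arithmetic easy to check.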

Who thinks people are going to read EXIF and see their shameless plug? Isn't their name on the post a shameless plug already?

If "most of the time, the images are being cleaned and therefore such EXIF data is gone as well. Additionally, many scanners do not embed such data in the first place (unlike cameras)", which EXIF tags have privacy concerns?

Your tone and the fact that these weren't brought up before makes me feel like reasons are being made up out of anger because I offended you. If I'm wrong, I'm sorry.
jkdfahlkdhfjgakdhfg said:
Who thinks people are going to read EXIF and see their shameless plug? Isn't their name on the post a shameless plug already?
I remain neutral at all times when making decisions like this. Privacy and advertising were real problems before, which is why such a measure was implemented in the first place. Shameless plug refers to people putting their sites into the EXIF. There were many before; you don't see any now because they were all cleared.

In case you haven't noticed, we don't accept self-made art, i.e. uploads by the person who made the work.

jkdfahlkdhfjgakdhfg said:
If "most of the time, the images are being cleaned and therefore such EXIF data is gone as well. Additionally, many scanners do not embed such data in the first place (unlike cameras)", which EXIF tags have privacy concerns?
I'm starting to think that you are trolling me. You know the EXIF data is added after the images are cleaned: the useful info, such as the scanner, is non-existent, and the useless EXIF data is added after that (shameless plug).

jkdfahlkdhfjgakdhfg said:
Your tone and the fact that these weren't brought up before makes me feel like reasons are being made up out of anger because I offended you. If I'm wrong, I'm sorry.
Regarding printing/quality, I only brought that up after you did and after I'd done some fact checking: EXIF data for printing/quality is absent from literally all of my scans and the raw scans provided to me by others.

jkdfahlkdhfjgakdhfg said:
In favour of keeping it (or a subset):
- Rendering (ICC, orientation?)
lol Orientation, don't get me started on that. That aside, the only valid reason left to stop stripping the EXIF data altogether is the ICC profile. That won't fly here.
Those all make sense. I'm sorry for my accusations. I was a little frustrated and my temper got a bit out of hand.

It sounds like a lot of the things you're bringing up were issues in the past and the only reason I haven't noticed them is because the work you did to fix them is working.

Sorry for being so unreasonable.
As the final word: While we won't stop the practice of stripping EXIF, I'll do something about keeping the ICC profile intact.
Changes are applied: the ICC profile won't be stripped.
Checkmate said:
Changes are applied: the ICC profile won't be stripped.
A mistake was made and ICC is still being stripped, assuming you're running this commit from 3 days ago: https://github.com/moebooru/moebooru/commit/b29af548d4f16da9cbee59db2a684de934dfd1b2

As mentioned above, you'd need to use 'jhead -dc -di -dx -dt', since adding -du to the end will strip ICC.
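One way to confirm whether the profile survives is to check a processed file for an ICC segment. The Python sketch below assumes the profile is embedded the usual way, in APP2 segments whose payload starts with the `ICC_PROFILE` identifier. `has_icc_profile` is a hypothetical helper and the two sample files are synthetic.

```python
import struct

SOS = 0xDA   # start of scan: compressed image data follows
APP2 = 0xE2  # segment type normally used for embedded ICC profiles
ICC_PREFIX = b"ICC_PROFILE\x00"


def has_icc_profile(jpeg: bytes) -> bool:
    """Return True if any APP2 segment before the image data carries ICC data."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF and jpeg[pos + 1] != SOS:
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        if jpeg[pos + 1] == APP2 and payload.startswith(ICC_PREFIX):
            return True
        pos += 2 + length
    return False


# Synthetic samples with a placeholder profile body (not a real ICC profile).
icc_payload = ICC_PREFIX + b"\x01\x01" + b"placeholder"
icc_segment = b"\xff\xe2" + struct.pack(">H", len(icc_payload) + 2) + icc_payload
tail = b"\xff\xda\x00\x02" + b"\x00\x01" + b"\xff\xd9"

with_icc = b"\xff\xd8" + icc_segment + tail
without_icc = b"\xff\xd8" + tail

print(has_icc_profile(with_icc), has_icc_profile(without_icc))  # True False
```

Running a check like this on the same upload before and after processing would show whether the flags in use still drop the profile.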