Instead of rewriting files unconditionally, you can test them for equality first. If they are equal, you can simply do nothing. Reading a file is usually much faster than writing one, so in this use case a FileEquals method can significantly improve performance.
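The "compare first, write only if different" idea can be sketched as follows. This is a minimal illustration, not the FileEquals method referred to above: the class name ConditionalWriter and the method WriteIfChanged are hypothetical, the new content is assumed to be already in memory as a byte array, and the span-based SequenceEqual assumes .NET Core 2.1 or later.

```csharp
using System;
using System.IO;

public static class ConditionalWriter
{
    // Hypothetical helper: writes newContent to path only when the file is
    // missing or its bytes differ. Returns true if a write actually happened.
    public static bool WriteIfChanged(string path, byte[] newContent)
    {
        if (File.Exists(path))
        {
            // Cheap check first: a different length always means different content.
            if (new FileInfo(path).Length == newContent.Length)
            {
                byte[] existing = File.ReadAllBytes(path);
                if (existing.AsSpan().SequenceEqual(newContent))
                {
                    return false; // identical content, skip the write
                }
            }
        }

        File.WriteAllBytes(path, newContent);
        return true;
    }
}
```

Because the existing file is read into memory in one piece, this sketch only suits files that comfortably fit in RAM.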
We looked at an implementation of a file content comparison method. You can guess whether two files are equal by testing their dates and lengths, but you cannot know that every byte is equal unless you test every byte. Hash computations can give virtually unique file identifiers, but that is not advantageous here, because hashing still has to read both files in full. I'm not an expert on streams, and I'm well aware that I could easily shoot myself in the foot as far as memory usage is concerned. I sped up the "memcmp" by using an Int64 compare in a loop over the chunks read from the stream.
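A chunked comparison with an Int64 inner loop might look like the sketch below. It is not the original poster's code: the names FileComparer, FilesAreEqual and ChunkSize are hypothetical, the 64 KiB chunk size is an arbitrary assumption, and it relies on BinaryReader.ReadBytes, which returns fewer bytes than requested only when the end of the stream is reached.

```csharp
using System;
using System.IO;

public static class FileComparer
{
    // Assumed chunk size; any reasonably large multiple of 8 would do.
    private const int ChunkSize = 64 * 1024;

    public static bool FilesAreEqual(string pathA, string pathB)
    {
        var a = new FileInfo(pathA);
        var b = new FileInfo(pathB);

        // Different lengths can never be equal, and this check reads no data.
        if (a.Length != b.Length) return false;

        using (var readerA = new BinaryReader(a.OpenRead()))
        using (var readerB = new BinaryReader(b.OpenRead()))
        {
            while (true)
            {
                byte[] chunkA = readerA.ReadBytes(ChunkSize);
                byte[] chunkB = readerB.ReadBytes(ChunkSize);

                if (chunkA.Length != chunkB.Length) return false;
                if (chunkA.Length == 0) return true; // both streams exhausted

                // Compare 8 bytes at a time, stopping at the first difference.
                int i = 0;
                for (; i + sizeof(long) <= chunkA.Length; i += sizeof(long))
                {
                    if (BitConverter.ToInt64(chunkA, i) != BitConverter.ToInt64(chunkB, i))
                        return false;
                }

                // Compare any remaining tail bytes one by one.
                for (; i < chunkA.Length; i++)
                {
                    if (chunkA[i] != chunkB[i]) return false;
                }
            }
        }
    }
}
```

ReadBytes allocates a fresh array on every call, so this sketch trades some garbage-collector pressure for simplicity. Memory use stays bounded by the chunk size rather than the file size.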
The accepted answer had an error that was pointed out but never corrected: Stream.Read calls are not guaranteed to return all the bytes requested. BinaryReader.ReadBytes calls, by contrast, return as many bytes as requested unless the end of the stream is reached first. If you have to compare the entire file contents, one neat trick I've seen is reading the bytes in strides equal to the bitness of the CPU. For example, on a 32-bit PC, read 4 bytes at a time and compare them as Int32 values.
I have found that it works well to compare the lengths first, without reading any data, and only then compare the byte sequences that were read.
Yet another answer, derived from chsh's: MD5 with using statements and shortcuts for the same file, a file that does not exist, and differing lengths.
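An MD5-based check with those three shortcuts could be sketched like this. It is an illustration of the idea rather than the answer's actual code; the names HashComparer and FilesMatchByMd5 are hypothetical, and treating a missing file as "not equal" is an assumption.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class HashComparer
{
    public static bool FilesMatchByMd5(string pathA, string pathB)
    {
        // Shortcut 1: a missing file is treated as "not equal" (assumption).
        if (!File.Exists(pathA) || !File.Exists(pathB)) return false;

        // Shortcut 2: the same path is trivially equal to itself.
        if (string.Equals(Path.GetFullPath(pathA), Path.GetFullPath(pathB),
                          StringComparison.Ordinal))
            return true;

        // Shortcut 3: differing lengths mean differing content.
        if (new FileInfo(pathA).Length != new FileInfo(pathB).Length) return false;

        using (var md5 = MD5.Create())
        using (var streamA = File.OpenRead(pathA))
        using (var streamB = File.OpenRead(pathB))
        {
            byte[] hashA = md5.ComputeHash(streamA);
            byte[] hashB = md5.ComputeHash(streamB);
            return hashA.SequenceEqual(hashB);
        }
    }
}
```

Equal hashes make equality extremely likely but do not prove it, and both files are still read in full, so this tends to pay off mainly when one hash can be computed ahead of time or on another machine.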
How to compare 2 files fast using .NET? Asked 12 years, 4 months ago. Active 3 months ago.
Would a checksum comparison such as CRC be faster? Are there any .NET libraries that can generate a checksum for a file? (Robin Rodricks; edited by John Saunders)
The answer's code survives only in fragments here. It first returns false when the two Length values differ, and returns true when first.FullName and second.FullName are the same path (compared with string.Equals and a StringComparison). It then derives an iteration count with Math.Ceiling from first.Length and compares the files chunk by chunk with BitConverter.ToInt64. A second variant compares the ComputeHash results for first and second. (Andrew Arnott)
You made my life easier. Thank you. (anindis)
Glad it helped so many years on though!
Note that FileStream.Read may actually read fewer bytes than the requested number. You should use StreamReader.ReadBlock instead.
In the Int64 version, when the stream length is not a multiple of sizeof(Int64), the last iteration compares the unfilled bytes using the previous iteration's fill, which should also be equal, so it's fine. Also, if the stream length is less than sizeof(Int64), the unfilled bytes are 0, since C# zero-initializes arrays. IMO, the code should probably comment on these oddities.
A checksum comparison will most likely be slower than a byte-by-byte comparison.
(Reed Copsey)
Make sure to take into account where your files are located. If you're comparing local files to a backup halfway across the world, or over a network with horrible bandwidth, you may be better off hashing first and sending a checksum over the network instead of a stream of bytes to compare.
I thought of using a precomputed hash, but do you think I can reasonably assume that if two hashes (e.g. MD5) are equal, the two files are equal, and avoid a further byte-by-byte comparison?
Another answer compares the two byte arrays returned by File.ReadAllBytes with SequenceEqual, taking either file paths or FileInfo objects (via FullName). Unlike some other posted answers, this is conclusively correct for any kind of file: binary, text, media, executable, etc. (Glenn Slayden)
That is why I would rather go for a StreamReader with a buffer.
In addition to Reed Copsey's answer: the worst case is where the two files are identical. To be complete: the other big gain is stopping as soon as the bytes at one position differ.
Henk: I thought this was too obvious :-) (dtb)
Good point on adding this. It was obvious to me, so I didn't include it, but it's good to mention.
Another answer (Lars) calls Read(buffer2, 0, bufferSize) on each stream and then checks count1 against the other stream's count. In general that check is not reliable: Read can return less than the count you have provided, for various reasons.
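One common way to cope with short reads is to keep calling Read until the requested number of bytes has arrived or the stream ends. A minimal sketch, assuming a helper named ReadFully (hypothetical, not part of any answer in this thread):

```csharp
using System;
using System.IO;

public static class StreamHelpers
{
    // Hypothetical helper: reads until 'count' bytes are in the buffer or the
    // stream ends. Returns the number of bytes actually read, which is less
    // than 'count' only at the end of the stream.
    public static int ReadFully(Stream stream, byte[] buffer, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = stream.Read(buffer, total, count - total);
            if (n == 0) break; // end of stream
            total += n;
        }
        return total;
    }
}
```

With both buffers filled this way, comparing the two returned counts becomes meaningful again: they can differ only when the streams have different lengths.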
Edit: This method would not work for comparing binary files! On .NET 4, the answer compares the lines returned by File.ReadLines(path1) with those returned by File.ReadLines(path2). (Sam Harwell)
Wouldn't you also need to store both files in memory?
Note that File also has the function ReadAllBytes, which can be used with SequenceEqual as well, so use that instead, as it works on all files. And as RandomInsano said, this is stored in memory, so while it's perfectly fine for small files I would be careful using it with large files.
DaedalusAlpha: It returns an enumerable, so the lines will be loaded on demand and not stored in memory the whole time. ReadAllBytes, on the other hand, does return the whole file as an array. (Glenn Slayden)
(Guffa)
Use a larger hash and you can get the odds of a false positive to well below the odds the computer erred while doing the test.
I disagree about the hash time vs seek time.