Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!

 



Forgot your password?
typodupeerror
×
Microsoft IT

Microsoft Fixes the Excel Feature That Was Wrecking Scientific Data (theverge.com) 83

An anonymous reader shares a report: In 2020, scientists decided just to rework the alphanumeric symbols they used to represent genes rather than try to deal with an Excel feature that was interpreting their names as dates and (un)helpfully reformatting them automatically. Last week, a member of the Excel team posted that the company is rolling out an update on Windows and macOS to fix that. Excel's automatic conversions are intended to make it easier and faster to input certain types of commonly entered data -- numbers and dates, for instance. But for scientists using quick shorthand to make things legible, it could ruin published, peer-reviewed data, as a 2016 study found.

Microsoft detailed the update in a blog post last week, adding a checkbox labeled "Convert continuous letters and numbers to a date." You can probably guess what that toggles. The update builds on the Automatic Data Conversions settings the company added last year, which included the option for Excel to warn you when it's about to get extra helpful and let you load your file without automatic conversion so you can ensure nothing will be screwed up by it.

This discussion has been archived. No new comments can be posted.

Microsoft Fixes the Excel Feature That Was Wrecking Scientific Data

Comments Filter:
  • At this point (Score:3, Insightful)

    by grasshoppa ( 657393 ) on Monday October 23, 2023 @12:08PM (#63945825) Homepage

    At this point, anyone using excel for anything db related and critical deserves whatever they get.

    It's been DECADES dealing with that clusterfuck nightmare. I have no sympathy.

    • Re: (Score:1, Offtopic)

      Agreed.

      Stop using shitty tools -- a spreadsheet is not a database no matter how much you abuse it.

      Does anyone know if OpenOffice / LibreOffice is susceptible to this bug?

      • The article doesn’t say anything about Excel being used for a database. Rather auto correct is interpreting data incorrectly.

        • by Rei ( 128717 ) on Monday October 23, 2023 @12:53PM (#63946029) Homepage

          Q: What do Excel and incels have in common?

          A: Both have trouble understanding when something isn't a date.

        • If you follow the links back to the originating stories, it's clear the scientists were using Excel as a database.

          A story that has been repeated innumerable times.
          • But that was not the problem. The problem was an overzealous auto correct changing something to a date that wasn't. The exact same problem would have happened in a word processor.
            • by jbengt ( 874751 )
              Yes, and you can customize/turn off a lot of autocorrect settings, but you can't turn off the automatic make-it-a-date incorrections.
              If aware, you can add ' while typing to make it text (though you might have intended a number or something else), but that doesn't help when copying and pasting from a another file like .csv, .pdf, or .docx. You used to get a dialog when cutting and pasting that allowed you to format columns before the data was entered, but I haven't seen that lately.
              The weirdest auto"corre
            • It' not autocorrect, it's the cell contents parser. I unfortunately run into this a fair bit when I receive database extracts for analysis. Frequently the randomly-generated unique identifiers used as primary keys are misinterpreted as dates. Something like 3mAy57tRZ gets mangled into "May 3rd, 1957"; example only, from memory, actual way Microsoft fucks things up may vary.

              This an be prevented by explicitly setting a column as Text, but if anyone else failed to do so before the file got you, you're screwed.

          • by deKernel ( 65640 )

            No, no it is not clear at all. Neither of the Verge articles ever mentions the word database. In addition, they never indicate just how much data was being stored in the spreadsheets/CVS files.

            • They don't use the word "database" but when you read what they are doing it's clear they are using the spreadsheet as a database.
        • My attitude is that whether you are using Excel as a database or to do maths on many numbers, Excel is not the right tool. I don't think it was ever meant to be used as a database, those functions were added because people were using it as a database. But that doesn't make it a good choice for database work. As for number-crunching it is not the right tool for another reason: you are mixing the data to be crunched with the instructions on how to crunch the numbers. I don't think that is a good solution.
          • by TWX ( 665546 )

            It may not be the right tool, but it's been adapted to be the de-facto tool all over the place, including in scientific, engineering, and database, because it can essentially display two-dimensional data in a very easy to manipulate, visual way, and can turn around and generate graphs based on processing the data, producing the pretty pictures that laypersons like seeing when they are in a position to affect the budget for a given program or simply want to see the results.

            It's also probably the most popular

      • Does anyone know if OpenOffice / LibreOffice is susceptible to this bug?

        Can't speak to this one in particular, but if it's anything like LibreOffice's other mirroring of Excel's "helpful" conversion features, it probably is.

        The fastest fix seems to be "Save As CSV" then "Fix it in Notepad++"

      • by dfghjk ( 711126 )

        A spreadsheet is literally a database. No matter how much you abuse the language.

      • Re:At this point (Score:5, Insightful)

        by Shakrai ( 717556 ) on Monday October 23, 2023 @04:59PM (#63947041) Journal

        Stop using shitty tools -- a spreadsheet is not a database no matter how much you abuse it.

        You do know that Google Sheets has a QUERY function [google.com] where you can perform SQL like searches, right?

        You can't replace a full relational database with Excel/Sheets but you absolutely can and millions of people do work with data in a spreadsheet as you would in a relational database. It's not the right choice for an application, it's probably not the right choice for collaboration beyond a very small team, but if you just need an easy way to collect, sort, and work with some data, oftentimes a spreadsheet is a perfectly acceptable solution. It's trivial to move from a spreadsheet into a real database when/if you outgrow it too.

        Honestly, there's like 20 comments below yours dumping on Excel. Do any of you actually work in the real world? You expect non-programmers to sit down and learn enough SQL to replace Excel's role in collecting and organizing data sets? Why would they do that? I'm in the tech business and I have no desire to do that. That's a job for a software developer/engineer. I just need a way to sort some data and run simplistic searches against it. I can do that with Excel. I can do it with Google Sheets. I work with dozens of systems that will output CSVs. I can import those into Sheets/Excel and work with the data or I can write a whole application to do the same with MySQL/MSSQL/Oracle. The vast majority of the time the former gets the job done in a fraction of the time....

        • by AmiMoJo ( 196126 )

          Relational databases usually aren't the right tool for working with large amounts of data either. There are better types of DB that are suited to particular types of data, like time-series.

      • Has Open/Libre become usable in recent years? Last time I tried it pretty much saturated a system that could run Excel (or Google Sheets) fine.

    • by tlhIngan ( 30335 )

      It's not just being misused as a database, it's being misused as a data analysis utility as well.

    • I'm not a microsoft apologist, quite the opposite, but Excel and it's imitators (google sheets) are fiendishly useful tools. Excel in particular can be used to do a number of things that you don't expect from a spreadsheet (live feed dashboards, remember the flight simulator built into excel?). With the addition of python support, I suspect we'll see it being used even more by the high-end computing/scientific communities. While their may be better tools to do a thing, if you know excel and can use it,
    • At this point, anyone using excel for anything db related and critical deserves whatever they get.

      It's been DECADES dealing with that clusterfuck nightmare. I have no sympathy.

      As I said in another comment, it isn't that they're choosing to use excel, it's that the data is circulated in csv format, double clicking on a csv file in Windows will open in excel by default, modify one or two values, save, close, and now you have a silently corrupted csv file.

      • Thanks for the explanation. I was wondering what's so hard about formatting a cell to be the correct data type? But I have encountered a similar problem in the past with libreoffice, when importing csv. Something like 123E4 will be turned into a number by default; now I make sure that every column is "text" unless it's really not text. I guess for those who don't know to look for this behavior, it's very easy to "corrupt" a csv file.

    • A fix was never needed, although it's nice that we have an option to turn these conversions off now. But does that stick with the spreadsheet or just the installation of Excel and when you send it to your colleagues they'll be dealing with the auto-conversion until they uncheck the option? Based on the screenshot in the article it appears to just be for that instance of Excel and not the spreadsheet.

      They could have and should have just formatted the cells as 'text'. It would not have tried to interpret i

      • If they did it right, the option not to autocorrect to dates would have been stored with the spreadsheet, so that whoever worked with it next wouldn't have to worry about it. Of course, this is Microsoft we're talking about, so the odds are that the option is stored in the program's config file, or, more likely in the Registry, meaning that everybody has to remember to set it themselves.
        • Once stored as a spreadsheet (xlsx) this doesn't happen. The problem is when the data imported - it is under-specified (there is nothing in a .csv that says what datatype a field is) so heuristics are used.

          You could try to implement a hard rule against data loss, but that would require remembering the original string for every number in a huge spreadsheet, e.g. storing 3.14159 and "3.14159" just in case that wasn't intended as a number.

          • by jbengt ( 874751 )
            A .csv file is a text file, every field in a .csv file is a string. Some strings can be interpreted as numbers, dates, etc. without much ambiguity. Others not so much. Excel autocorrects some formats to dates without a way to turn that behaviour off. Even strings like "2-1/2" get interpreted as dates, when that's obviously not intended as a date.
            • by Shakrai ( 717556 )

              If you're importing from a CSV it's pretty trivial [microsoft.com] to define how you want the columns to be treated. Slight bit of extra work to get the text import dialog rather than allowing Excel to make it's best (occasionally wrong) guess as to format, but not onerous, few extra clicks in your workflow.

    • 4years? The problem has literally existed since Excels inception.

      The number of times I've had customers complain about data being corrupted and it turned out that Excel converted product sku into either dates, or scientific notation and silently managed shit is innumerable.

  • Scientists should not be relying on Excel for calculations or even data recording. Excel, in typical MS fashion, does all sorts of stuff behind your back to make your life "easier," while being an opaque black box that it is impossible to troubleshoot. Even the handling of significant digits is scientifically wrong.

    • by guruevi ( 827432 )

      Excel still emulates the Pentium bug just because it has to be backwards compatible. Good luck with that science paper.

  • Why would you use excel for such data in a first place? There are so many other methods doing that better. Even CSV seems to me like a better option, not to mention something like HDF5.
    • HDF5 looks amazing. I was unfamiliar. Thank you.

      I think the short answer to "why?" is educational hegemony.

      MS Office is often available on academic computers, and when a biology undergrad, it was what I and the rest of my cohort were expected to use. Granted, that was a moment ago, but I can’t imagine the prevalence of Excel in education having changed all that much. What would schools replace it with? LibreOffice? No, I can’t imagine that, either.

      I get that CSV is an option in Excel, but it is

      • by guruevi ( 827432 )

        But even so, there are so many better things than Excel that you should be learning, even as an undergrad. Things like GraphPad, R-Studio, SPSS, STATA, ELN...

        Enough "spreadsheet-like" programs to do anything from very simple to very complicated.

    • Re:Excel? (Score:5, Informative)

      by eastlight_jim ( 1070084 ) on Monday October 23, 2023 @01:52PM (#63946359)

      Why would you use excel for such data in a first place? There are so many other methods doing that better. Even CSV seems to me like a better option, not to mention something like HDF5.

      It isn't that they're choosing to use excel, it's that the data is circulated in csv format, double clicking on a csv file in Windows will open in excel by default, modify one or two values, save, close, and now you have a silently corrupted csv file.

    • by ceoyoyo ( 59147 )

      It probably was CSV. Anyway, a spreadsheet is a convenient way of entering tabular data. You *should* be able to tell your spreadsheet program not to fuck with your data entry. Random labs don't have money to pay somebody to make them a nice data entry GUI with HDF5 backing, nor keep them on retainer to modify it for every random grad student project that needs a tweak.

  • So now it has a different conversion issue.

  • by MpVpRb ( 1423381 ) on Monday October 23, 2023 @12:25PM (#63945885)

    ..seem to do exactly the wrong thing every time
    I want to control everything explicitly

    • With numbers and dates I agree completely. It always seems to want to convert things to a date when I don't want it. Sometimes I want to just type February 23 and not have it convert to 2/23/23. Also, it doesn't seem to 'see' converted dates the same as 'normally input' dates, so when you sort it you end up with two sets of dates from start to finish even after you select them all and format it the 'same' way. You literally have to use excel text-to-columns or other feature to get it to format all the dat
    • by eepok ( 545733 )

      In modern computing "smart" isn't intended to mean "correct", but instead "thinking for the user".

      Sometimes it's good, such as assuming that typing "11/25" is the date "November 25" of the current year and automatically setting the cell format appropriately and attributing the correct Excel decimal date value ("45255"). Other times it's bad, such as assuming that typing "11/25" is a date when it's actually a fraction with the decimal value value of "0.44".

      One of the most annoying instances of "thinking for

      • Yes! When I'm searching for images from Warner Brothers movies, I want images from Warner Brothers, not MGM, Fox or Universal. When I specify toon, I don't want images of live people, but in both cases, that's what I get.
      • by Shakrai ( 717556 )

        Other times it's bad, such as assuming that typing "11/25" is a date when it's actually a fraction with the decimal value value of "0.44".

        Not to nitpick your example but if you desire the fraction outcome the solution to your problem is to enter it as =11/25 so the program knows not to treat it as a parsable text field that's certainly going to be interpreted as a date. If you lead with = in Excel you get a mathematical operation, the default output is decimal, but you can also format the cell to show a fraction, or a percentage (44%) for that matter.

        This is how every spreadsheet I've worked with in the last three decades has functioned. T

        • by eepok ( 545733 )

          I don't consider that a nitpick at all. It's simply a fact that you can avoid a lot of the "smart solution" issues by avoiding their triggers in the first place (like preformatting columns). In fact, it's the lack of training around good procedure that instigates the creation of this "smart solutions". Another way to put it is that we would rather program a computer to guess what humans want than teach people how to do the work well.

          Enter the rush for AI...

          • by Shakrai ( 717556 )

            The "smart solution" is the "smart solution" because most of the time it's correct. Most of the time you absentmindedly type 11/25 into Excel you really meant November 25th.

            I'm not sure AI is going to fix this one because I'm not convinced "AI" is all that intelligent or represents anything more than re-branding of the standard if-then-else framework that computers have operated under forever. It all smells like snake oil to me.

    • by Pascoea ( 968200 )
      Ugh. You're not kidding. I do a lot of data migration work. Most of the time it involves handing data over to an end user, then incorporating it into the Translation process once it's been cleaned up. Excel is really the only option here. It's super awesome for things like zip codes (it LOVES dropping leading zeros), part numbers that look vaguely like a date are now a date, anything that's a number longer than 10-ish digits (Tax Exempt numbers are a great example) getting represented scientifically, deal
      • Not all strings of digits are numbers, and shouldn't be treated as such. So store them as text, you're never going to add or multiply zip-codes.

        • by Pascoea ( 968200 )

          So store them as text

          Explain to the end user who's doing the data cleansing what that means, and how to do it. Good luck.

          Have you ever read the sentence "There was a quote mark in front of the zip codes. Zip codes shouldn't have those there so I removed them." in an e-mail talking about a data migration initiative? Users are dumb.

          • Users are dumb.

            Not always. A data entry drone probably wasn't ever told why there's a quotation mark in front of every zip code, so it's not unreasonable for him to think it's a mistake and remove it. Yes, you'd think that seeing that every zip code is that way should have told him that it's supposed to be there, but people like him aren't paid to think, just to do what they're told. You don't need to tell him why it's there, just that it's supposed to be there and hope he remembers.
            • by Pascoea ( 968200 )

              hope he remembers

              That's the point of the argument, though. The argument is that they shouldn't have to remember. They shouldn't have to remember to tell the person they assigned to the task to. The ask is pretty simple: Excel needs a workbook-wide "quit trying to be helpful" switch. Basically format everything as "text" in a way that the end-user isn't able to change it.

              Not always.

              Of course not. I have a lot of great users I work with. Even the ones that do dumb things aren't actually dumb. They don't know they shouldn't do somethin

              • I agree with you that Excel shouldn't be converting everything with numbers into dates, or make other inappropriate conversions without warning you, or giving you an easy way to reverse the conversions. However, the fact that it shouldn't is no reason to refuse to use workarounds on the grounds that "I shouldn't have to."
                • by Pascoea ( 968200 )
                  I'm not refusing to use anything. I'm working with the tools I have available, maybe not using it 100% as intended, and whining because it intentionally makes my life difficult. It's like a lot of modern technology: They are trying to dumb it down to the lowest common denominator at the cost of getting in the way of more advanced users.
              • by Shakrai ( 717556 )

                Excel needs a workbook-wide "quit trying to be helpful" switch.

                If you already have a workbook why do you need that switch? Presumably you set the column/cell formatting to the appropriate values?

                Conversely, if you're using CSVs (my assumption when reading your thread) how would that option help you? CSV is going to retain any workbook specific settings so now you're back to explaining things to your end user.

            • Excel provides ways of locking the formatting while allowing editing of the contents (using Protect Workbook), so you won't see or need to use quotes. If your "data entry drone" is still somehow futzing with the formatting you've got bigger issues.

              • Hey, that wasn't my example to start with, I was just responding to a comment. Besides, I don't use either Excel or Windows; I've been using Linux for almost 20 years now, along with LibreOffice.
    • ..seem to do exactly the wrong thing every time I want to control everything explicitly

      OMG! Microsoft, please stop automatically selecting bullets, and tabs next to the thing I selected. If I wanted them, I would have selected them!

    • I have noticed that too. It seems their thinking is: They should want to do this

      Rather than: Let's see how it is really used and make the behavior conform.

      (part of your message is missing. i found it in the subject, but then i had no idea what your message was going to be about. *shrug* subverting expectation is fun. Microsoft does it, why shouldn't you?)

  • MS went through great pains to get every PC whereever I studied running some version of Windows by donations (which really meant everyone trained there was trained to use Windows and helping to proliferate software running on it as a de facto industry standard).

    That included Excel. If hundreds of science students and post grads from my alma mater alone used Excel then how the hell did this not get recognised as needing a patch a decade earlier?

  • UPCs too (Score:5, Interesting)

    by pi_rules ( 123171 ) on Monday October 23, 2023 @12:36PM (#63945941)

    UPCs are a nightmare there too. Plenty of vendors don't know how to put a 12 digit number into a cell w/out it being formatted like a number, so they put dashes or other punctuation in them to prevent the formatting from happening.

    Real nice when you're importing data from them.

    • Any cell value that starts with a single quote (') will not be auto-formatted. Not useful for imported data but handy for data entry.
      • Even with imported data there are plenty of easy ways to reformat it either in-line or as a separate step. sed and awk can be great pre and post processing tools for things like this; you can even build in some data validation to hash valid records and reject invalid records.

  • Microsoftâ(TM)s attitude of making Office outwit the user isnâ(TM)t new. I remember when adding a âfile:///â URL that it would automatically convert to âc:\â(TM). No idea if it is fixed, since I rarely use Excel these days.

    Mind you, phones are just as bad with autocorrect these days.

  • I can't even get it to stop converting strings of numbers in 2.83834909E9 format. I change it to text cell formatting and it just says "yeah but that's a number now though" and changes it back. AND it only does this on the web-based version. Local Excel works fine.

    If you're not using LibreOffice for scientific stuff, you're wasting your grant money. Then again I can't stop LibreOffice from importing ghost single quote characters in money-related CSVs without this ridiculous roundabout hack.
  • I love Excel, it's Microsoft's best software every by far (and yes, I've tried the alternatives, they are not up to par), but that auto-formatting specifically for detecting dates is ridiculous. It is so permissive, it "works" with slashes, hyphens etc, even if you only have two numbers with a hyphen (like how I'd write a range as title on a column) it will figure out a way it can possibly be interpreted as a date (if your number is 13 it could be a month, otherwise it could be a year, no worries any 2 digi

    • by jbengt ( 874751 )

      I know you can turn it off, but you shouldn't have to.

      If you know it can be turned off, please tell how. Because up 'til now I've been unable to find a way to, and googling gets me posts saying that it cannot be turned off. (Perhaps this update will allow it to be turned off, not just for .csv imports)

    • Nah, Lotus 1-2-3 was much better.

      That aside, for headings and things just start with a single quote and it is treated as text. There are a lot of easy enough ways to work within the way Excel does things, but it takes an awareness of what is happening. Some data I needed to break every segment into a separate column and re-assemble dates where needed, or add industry standard formatting if that was the issue.

      The bigger issue is that Excel quickly becomes the wrong tool for the job.

  • by Lavandera ( 7308312 ) on Monday October 23, 2023 @01:18PM (#63946159)

    I had similar problem with Excel in different field.

    I switched to Libreoffice and problem was solved...

  • We have some processes where our faculty or staff want particular data given to them in a handful of Excel spreadsheets. Invariably I have to manually declare several individual columns to be text in order to work around these sorts of "helpful" Excel features.

    This is another situation where I really wish "you have to opt in" was the standard - I'd just as soon have all that turned off by default.

  • ...on how many users have workflows this will break.

    • by TWX ( 665546 )

      It's a Microsoft product. Every few years something breaks workflow regardless.

    • by Entrope ( 68843 )

      Unlike a lot of Microsoft breakages, in this case, you have to opt in to Microsoft not silently reformatting your data.

  • If only there were ways to force Excel to not format certain data as dates when you set the spreadsheet up or import the data...Oh wait, there are. And it's not even complicated, nor does it take too many clicks.

    People who complain about Excel are either using the web-version only (ew, stop that) or are just too stupid to be using any software, really.

  • Scientists have been lazy on that front for decades. Even with Excel trashing their data, they kept using it. it is really amazing, and I wonder why Microsoft feels the need to fix the issue now.
  • Actually, there are a lot of functions in Excel, even more than you can imagine. But to fully utilize it, you need to go through a whole course on it. It's not something you pick up overnight. Most people just use the most basic and fundamental functions during work or studies. I didn't think about this much before, but then I received assignment from https://edubirdie.com/research-papers-writing-services [edubirdie.com] for my college. And I noticed that this writing service was using many formulas I didn't know about bef

Dynamically binding, you realize the magic. Statically binding, you see only the hierarchy.

Working...