html - Python df.to_excel() storing numbers as text in excel. How to store as Value?

ID : 274470


Tags : python, html, excel, pandas, dataframe





Top 5 Answers for html - Python df.to_excel() storing numbers as text in excel. How to store as Value?

Answer 1 (100 votes)

In addition to the other solutions, where the string data is converted to numbers when creating or using the DataFrame, you can also do it with options to the xlsxwriter engine:

```python
import pandas as pd

# Versions of Pandas >= 1.3.0:
writer = pd.ExcelWriter('output.xlsx',
                        engine='xlsxwriter',
                        engine_kwargs={'options': {'strings_to_numbers': True}})

# Versions of Pandas < 1.3.0:
writer = pd.ExcelWriter('output.xlsx',
                        engine='xlsxwriter',
                        options={'strings_to_numbers': True})
```

From the docs:

strings_to_numbers: Enable the worksheet.write() method to convert strings to numbers, where possible, using float() in order to avoid an Excel warning about "Numbers Stored as Text".
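
A minimal end-to-end sketch of the `strings_to_numbers` option, assuming pandas >= 1.3.0 with XlsxWriter installed, plus openpyxl, which is used here only to read the file back and inspect the stored cell types. The DataFrame, its column names and the file name are made up for illustration. A value that `float()` cannot parse, such as `'-'`, is still written as text.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical data: prices that arrived as strings, e.g. from pd.read_html.
df = pd.DataFrame({"item": ["a", "b", "c"], "price": ["1.5", "2", "-"]})
print(df.dtypes["price"])  # a text dtype, not a numeric one

path = os.path.join(tempfile.mkdtemp(), "output.xlsx")

# engine_kwargs is forwarded to xlsxwriter.Workbook(), so this becomes
# Workbook(path, options={"strings_to_numbers": True}).
with pd.ExcelWriter(
    path,
    engine="xlsxwriter",
    engine_kwargs={"options": {"strings_to_numbers": True}},
) as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False)

# Read the raw cells back: data_type "n" is a number, "s" is a string.
ws = load_workbook(path)["Sheet1"]
price_cells = [ws.cell(row=r, column=2) for r in range(2, 5)]
print([(c.value, c.data_type) for c in price_cells])
```

Only the Excel file changes here; the DataFrame itself still holds strings, so this option is mainly useful when you just need the exported workbook to be clean.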

Answer 2 (84 votes)

Consider converting numeric columns to floats, since pd.read_html reads web data as strings (i.e., object dtype). Before converting to floats, though, you need to replace hyphens with NaNs:

```python
import pandas as pd
import numpy as np

dfs = pd.read_html('https://www.google.com/finance?q=NASDAQ%3AGOOGL' +
                   '&fstype=ii&ei=9YBMWIiaLo29e83Rr9AM', flavor='html5lib')
xlWriter = pd.ExcelWriter('Output.xlsx', engine='xlsxwriter')

for i, df in enumerate(dfs):
    for col in df.columns[1:]:                  # UPDATE ONLY NUMERIC COLS
        df.loc[df[col] == '-', col] = np.nan    # REPLACE HYPHEN WITH NaNs
        df[col] = df[col].astype(float)         # CONVERT TO FLOAT

    df.to_excel(xlWriter, sheet_name='Sheet{}'.format(i))

# close() writes the file; ExcelWriter.save() was deprecated and removed in pandas 2.0
xlWriter.close()
```
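
The per-column loop can be tried without network access on a small DataFrame that mimics one table from pd.read_html: the first column holds labels, and the other columns hold numbers as text, with '-' for missing values. The data and column names below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one table returned by pd.read_html.
df = pd.DataFrame({
    "label": ["Revenue", "Cost"],
    "2015": ["100.5", "-"],
    "2016": ["-", "80"],
})

for col in df.columns[1:]:                    # skip the label column
    df.loc[df[col] == "-", col] = np.nan      # hyphen placeholder -> NaN
    df[col] = df[col].astype(float)           # remaining text -> float64

print(df.dtypes)
```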

Answer 3 (73 votes)

That is probably because the columns where the warning appears have the data type object rather than a numeric type such as int or float.

To check the data type of each column of the DataFrame, use dtypes:

print(df.dtypes) 

In my case, the column that was stored as object instead of a numeric type was PRECO_ES:

[Screenshot: DF dtypes]
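
When a DataFrame has many columns, it can help to list only the non-numeric ones instead of scanning the whole dtypes output. A minimal sketch on hypothetical data (the QTD column and the values are made up); it uses pd.api.types.is_numeric_dtype rather than testing for object directly, since newer pandas versions may give text columns a dedicated string dtype.

```python
import pandas as pd

# Hypothetical DataFrame: PRECO_ES arrived as text, QTD is already numeric.
df = pd.DataFrame({"PRECO_ES": ["3.99", "4.10"], "QTD": [10, 20]})

print(df.dtypes)

# Columns that are not numeric are the candidates for Excel's
# "Number Stored as Text" warning if they contain numeric-looking strings.
text_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
print(text_cols)
```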

Since the decimal places matter in my case, I converted the column to float using astype, as follows:

df['PRECO_ES'] = df['PRECO_ES'].astype(float) 

Checking the data types again gives the following:

[Screenshot: DF column changed to float]
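
The same conversion can be reproduced on a small, made-up sample to watch the column's dtype change from text to float64 (the values are hypothetical; only the column name comes from the answer):

```python
import pandas as pd

# Hypothetical sample mirroring the answer's PRECO_ES column.
df = pd.DataFrame({"PRECO_ES": ["3.99", "4.10", "12.5"]})
print(df["PRECO_ES"].dtype)   # text, not a number

df["PRECO_ES"] = df["PRECO_ES"].astype(float)
print(df["PRECO_ES"].dtype)   # float64
```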

Then all you have to do is export the DataFrame to Excel:

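```python
# Export the DataFrame (df) to Excel. Writing .xls required the xlwt package,
# which pandas 2.0 no longer supports, so write .xlsx instead.
xlsxFile = "Preco20102019.xlsx"
df.to_excel(xlsxFile)

# Export the DataFrame (df) to CSV
csvFile = "Preco20102019.csv"
df.to_csv(csvFile)
```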

If I then open the Excel file, the warning no longer appears, because the values are stored as numbers rather than as text.

[Screenshot: Excel file without the warning]
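
Instead of opening Excel, you can check the exported cells programmatically. A minimal sketch, assuming openpyxl is installed; the data and file name are hypothetical. openpyxl reports each cell's stored type as data_type, where "n" means number and "s" means string.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical data, converted with astype(float) as in the answer.
df = pd.DataFrame({"PRECO_ES": ["3.99", "4.10"]})
df["PRECO_ES"] = df["PRECO_ES"].astype(float)

path = os.path.join(tempfile.mkdtemp(), "Preco.xlsx")
df.to_excel(path, index=False)

# Skip the header row and collect the stored type of each data cell.
ws = load_workbook(path).active
types = [ws.cell(row=r, column=1).data_type for r in range(2, ws.max_row + 1)]
print(types)
```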

Answer 4 (63 votes)

Did you verify that the columns you're exporting are actually numbers in Python (int or float)?

Alternatively, you can convert the text fields into numbers in excel using the =VALUE() function.
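
If you want the =VALUE() helper column in the generated file itself, one option is to write the formula with openpyxl. This is a minimal sketch, assuming openpyxl is installed; the cell addresses, value and file name are hypothetical. openpyxl stores the formula but does not evaluate it, so the numeric result only appears once Excel recalculates the workbook.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

wb = Workbook()
ws = wb.active
ws["A1"] = "42.5"            # a number stored as text (hypothetical value)
ws["B1"] = "=VALUE(A1)"      # Excel converts A1 to a number when it recalculates

path = os.path.join(tempfile.mkdtemp(), "value_demo.xlsx")
wb.save(path)

# Reload to confirm what was stored: A1 is text, B1 is a formula.
ws2 = load_workbook(path).active
print(ws2["A1"].data_type, ws2["B1"].data_type, ws2["B1"].value)
```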

Answer 5 (53 votes)

Since pandas 0.19, you can pass the na_values argument to pd.read_html, which lets pandas correctly infer the float type for your price columns.

Here's how that would look:

```python
import pandas as pd

dfs = pd.read_html(
    'https://www.google.com/finance?q=NASDAQ%3AGOOGL&fstype=ii&ei=9YBMWIiaLo29e83Rr9AM',
    flavor='html5lib',
    index_col='\nIn Millions of USD (except for per share items)\n',
    na_values='-'
)

xlWriter = pd.ExcelWriter('Output.xlsx', engine='xlsxwriter')
for i, df in enumerate(dfs):
    df.to_excel(xlWriter, sheet_name='Sheet{}'.format(i))
xlWriter.close()  # ExcelWriter.save() was removed in pandas 2.0
```
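
The effect of na_values can be checked offline by feeding pd.read_html a small hypothetical table instead of the finance page. This sketch assumes an HTML parser backend (lxml, or BeautifulSoup4 with html5lib) is installed; the table contents are made up. The literal HTML is wrapped in StringIO because newer pandas versions deprecate passing raw HTML strings.

```python
from io import StringIO

import pandas as pd

# Hypothetical table; '-' marks missing values, as on the original page.
html = """
<table>
  <tr><th>Item</th><th>Q1</th><th>Q2</th></tr>
  <tr><td>Revenue</td><td>100.5</td><td>-</td></tr>
  <tr><td>Cost</td><td>-</td><td>80</td></tr>
</table>
"""

# With '-' treated as NaN, both value columns can be inferred as float64.
df = pd.read_html(StringIO(html), index_col=0, na_values=["-"])[0]
print(df.dtypes)
```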

Alternatively (if you don't have pandas 0.19 yet), I'd use a simpler version of @Parfait's solution:

```python
import pandas as pd

dfs = pd.read_html(
    'https://www.google.com/finance?q=NASDAQ%3AGOOGL&fstype=ii&ei=9YBMWIiaLo29e83Rr9AM',
    flavor='html5lib',
    index_col='\nIn Millions of USD (except for per share items)\n'
)

xlWriter = pd.ExcelWriter('Output.xlsx', engine='xlsxwriter')
for i, df in enumerate(dfs):
    # Replace '-' with NaN, then convert every data column to float
    df.mask(df == '-').astype(float).to_excel(xlWriter, sheet_name='Sheet{}'.format(i))
xlWriter.close()  # ExcelWriter.save() was removed in pandas 2.0
```

This second solution only works if you define the index column correctly in .read_html; it will fail with a ValueError if any of the data columns contains a value that cannot be converted to a float.
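
If some data columns may contain stray values, one more tolerant option (a different technique from the answer's .astype(float)) is pd.to_numeric with errors="coerce", which turns anything unparsable into NaN instead of raising. A minimal sketch on hypothetical data, with the labels already in the index as the answer's index_col would produce:

```python
import pandas as pd

# Hypothetical table: numbers as text, '-' placeholders and one stray footnote value.
df = pd.DataFrame(
    {"Q1": ["100.5", "-", "n/a*"], "Q2": ["-", "80", "12"]},
    index=["Revenue", "Cost", "Margin"],
)

# mask + astype(float), as in the answer, raises on anything float() can't parse.
try:
    df.mask(df == "-").astype(float)
except ValueError as exc:
    print("astype failed:", exc)

# pd.to_numeric(errors="coerce") turns every unparsable value into NaN instead.
converted = df.apply(pd.to_numeric, errors="coerce")
print(converted.dtypes)
```

The trade-off is that coercion silently hides values you might want to inspect, so it is worth checking converted.isna() against the original data.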
