Functionality to support NaN in integer series will be available in v0.24 upwards. There's information on this in the v0.24 "What's New" section, and more details under Nullable Integer Data Type.
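As a rough sketch of what this looks like in v0.24 or later, the nullable integer type can be requested by name. The values below are made up for illustration; note the capital "I" in "Int64", which distinguishes the pandas extension dtype from NumPy's int64.

```python
import numpy as np
import pandas as pd

# "Int64" (capital I) is the pandas nullable integer dtype, not NumPy's int64.
s = pd.Series([1, 2, 3, np.nan], dtype="Int64")

print(s.dtype)         # the series keeps an integer dtype despite the missing value
print(s.isna().sum())  # number of missing entries
print(s.sum())         # missing values are skipped by default
```

How the missing value is displayed depends on the pandas version, so no printed output is shown here.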
Pandas v0.23 and earlier
In general, it's best to work with float series where possible, even when the series is upcast from int to float due to the inclusion of NaN values. This enables vectorised NumPy-based calculations where Python-level loops would otherwise be needed.
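A small sketch of the difference, using made-up data: the float series keeps a NumPy float64 dtype, so arithmetic and reductions run as vectorised operations, while the object version stores generic Python objects.

```python
import numpy as np
import pandas as pd

s_float = pd.Series([1, 2, 3, np.nan])  # upcast to float64 because of the NaN
s_obj = s_float.astype(object)          # same values held as Python objects

print(s_float.dtype, s_obj.dtype)

doubled = s_float * 2    # vectorised arithmetic; NaN propagates
total = s_float.sum()    # NaN is skipped by default
print(doubled.tolist(), total)
```

On a series this small the speed difference is negligible; it tends to matter as the data grows.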
The docs do suggest: "One possibility is to use dtype=object arrays instead." For example:
    import numpy as np
    import pandas as pd

    s = pd.Series([1, 2, 3, np.nan])
    print(s.astype(object))

    0      1
    1      2
    2      3
    3    NaN
    dtype: object
For cosmetic reasons, e.g. output to a file, this may be preferable.
Pandas v0.23 and earlier: background
NaN is considered a float. The docs currently (as of v0.23) specify the reason why integer series are upcast to float:
In the absence of high performance NA support being built into NumPy from the ground up, the primary casualty is the ability to represent NAs in integer arrays.
This trade-off is made largely for memory and performance reasons, and also so that the resulting Series continues to be “numeric”.
The docs also provide rules for upcasting due to NaN inclusion:
Typeclass    Promotion dtype for storing NAs
floating     no change
object       no change
integer      cast to float64
boolean      cast to object
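These promotion rules can be checked with a short sketch. It assumes the default NumPy-backed dtypes and uses reindex with a label that does not exist as one convenient way to introduce a NaN; the sample values are invented.

```python
import pandas as pd

# One small series per typeclass, using the default NumPy-backed dtypes.
examples = {
    "floating": pd.Series([1.5, 2.5]),
    "object": pd.Series(["a", "b"], dtype=object),
    "integer": pd.Series([1, 2]),
    "boolean": pd.Series([True, False]),
}

# Reindexing with the extra label 2 adds a missing value to each series.
promoted = {name: str(s.reindex([0, 1, 2]).dtype) for name, s in examples.items()}

for name, dtype in promoted.items():
    print(name, "->", dtype)
```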