
Not filling NaN values in dataframe

Let's say I have the following df:

      quantity#1    taxsubtotal#1    taxrate#1    quantity#2    taxsubtotal#2    taxrate#2
--  ------------  ---------------  -----------  ------------  ---------------  -----------
 0           nan             1.05           21           nan              nan          nan
 2             1             2.1            21             1              1.8            9
 6             1             0               0           nan              nan          nan
13             1             0.9             9             1              1.8            9
21             1            23.4             9             1              2.7            9

I don't want the NaN values to be written into the columns of a new df:

df3 = pd.DataFrame({
'InvoiceLine1':"""
    <cbc:ID>1</cbc:ID>
    <cbc:InvoicedQuantity unitCode="ZZ">"""+dftaxitems1['quantity#1'].astype(str)+"""</cbc:InvoicedQuantity>
        <cbc:TaxAmount currencyID="EUR">"""+dftaxitems1['taxsubtotal#1'].astype(str)+"""</cbc:TaxAmount>
          <cbc:Percent>"""+dftaxitems1['taxrate#1'].astype(str)+"""</cbc:Percent>""",
'InvoiceLine2':"""
    <cbc:ID>2</cbc:ID>
    <cbc:InvoicedQuantity unitCode="ZZ">"""+dftaxitems1['quantity#2'].astype(str)+"""</cbc:InvoicedQuantity>
        <cbc:TaxAmount currencyID="EUR">"""+dftaxitems1['taxsubtotal#2'].astype(str)+"""</cbc:TaxAmount>
          <cbc:Percent>"""+dftaxitems1['taxrate#2'].astype(str)+"""</cbc:Percent>""",
})

Checking the type of the NaN value:

type(dftaxitems['quantity#2'][0])
numpy.float64

I'm getting the following output:

    InvoiceLine1                                       InvoiceLine2
0   \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
2   \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
6   \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
13  \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
21  \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...

Desired output:

    InvoiceLine1                                       InvoiceLine2
0   \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... 
2   \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
6   \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... 
13  \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
21  \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...

df3.fillna('') did not work!

What could help here? :)

I've tried converting all the missing values to np.nan so that they can be reliably removed in the new df.

Please help!

First convert the values to strings, then replace the empty strings with missing values:

df = df.astype(str).replace('', np.nan)

Then remove the later .astype(str) calls, such as the one in dftaxitems1['quantity#1'].astype(str), so that the missing values are not turned back into strings.
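Note that the replace('', np.nan) call above only catches empty strings. If the column holds real float NaN values (the question reports numpy.float64), a variation along these lines may be needed. This is only a sketch: the sample Series is made up for illustration, and it relies on Series.where to keep NaN wherever the source value was missing, whatever text astype(str) produced for it.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: a float column with real NaN values, as in the question.
rate = pd.Series([np.nan, 9.0, np.nan], index=[0, 2, 6], name="taxrate#2")

# Convert to text, but put NaN back wherever the source value was missing.
rate_text = rate.astype(str).where(rate.notna())

# String concatenation propagates NaN, so the fragment is missing for those rows.
fragment = "<cbc:Percent>" + rate_text + "</cbc:Percent>"

# Now there is actually something for fillna to fill.
print(fragment.fillna("").tolist())
# ['', '<cbc:Percent>9.0</cbc:Percent>', '']
```

With the NaN preserved up to this point, fillna('') on the final frame has missing values to work on.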

Test:

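import numpy as np
import pandas as pd

# The empty string stands in for a missing quantity
dftaxitems1 = pd.DataFrame({'quantity#1': ['', 1.0, 1.0, 1.0, 1.0]})
# Convert to strings, then turn the empty strings into NaN
dftaxitems1 = dftaxitems1.astype(str).replace('', np.nan)

# No .astype(str) here, so NaN propagates through the concatenation
s = """<cbc:InvoicedQuantity unitCode="ZZ">"""+dftaxitems1['quantity#1']+"""</cbc:InvoicedQuantity>"""

print(s)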
0                                                  NaN
1    <cbc:InvoicedQuantity unitCode="ZZ">1.0</cbc:I...
2    <cbc:InvoicedQuantity unitCode="ZZ">1.0</cbc:I...
3    <cbc:InvoicedQuantity unitCode="ZZ">1.0</cbc:I...
4    <cbc:InvoicedQuantity unitCode="ZZ">1.0</cbc:I...
Name: quantity#1, dtype: object
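In the desired output, row 0 keeps InvoiceLine1 even though quantity#1 is NaN there, so plain NaN propagation may blank more than intended. One possible reading is that a line should be empty only when all three of its fields are missing. Below is a rough sketch under that assumption. The invoice_line helper is hypothetical (it is not part of pandas or of the question), and the data is copied from the question's table.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's table (index labels 0, 2, 6, 13, 21).
dftaxitems1 = pd.DataFrame(
    {
        "quantity#1": [np.nan, 1, 1, 1, 1],
        "taxsubtotal#1": [1.05, 2.1, 0, 0.9, 23.4],
        "taxrate#1": [21, 21, 0, 9, 9],
        "quantity#2": [np.nan, 1, np.nan, 1, 1],
        "taxsubtotal#2": [np.nan, 1.8, np.nan, 1.8, 2.7],
        "taxrate#2": [np.nan, 9, np.nan, 9, 9],
    },
    index=[0, 2, 6, 13, 21],
)


def invoice_line(df, n):
    """Hypothetical helper: XML text for line n, '' where all its fields are NaN."""
    cols = [f"quantity#{n}", f"taxsubtotal#{n}", f"taxrate#{n}"]
    # map(str) turns every value into text, so "+" works on numeric columns
    qty, tax, rate = (df[c].map(str) for c in cols)
    xml = (
        f"\n    <cbc:ID>{n}</cbc:ID>"
        + '\n    <cbc:InvoicedQuantity unitCode="ZZ">' + qty + "</cbc:InvoicedQuantity>"
        + '\n        <cbc:TaxAmount currencyID="EUR">' + tax + "</cbc:TaxAmount>"
        + "\n          <cbc:Percent>" + rate + "</cbc:Percent>"
    )
    # Keep the text only for rows where at least one field has a value
    has_data = df[cols].notna().any(axis=1)
    return xml.where(has_data, "")


df3 = pd.DataFrame({f"InvoiceLine{n}": invoice_line(dftaxitems1, n) for n in (1, 2)})
print(df3["InvoiceLine2"].eq("").tolist())
# [True, False, True, False, False]
```

A line with only some fields missing (row 0 of InvoiceLine1) still contains the text nan for the missing field, just as in the original code. Whether that is acceptable depends on what the XML consumer expects.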
