
Error converting an Oracle ldr file to a pandas DataFrame

I'm trying to convert an Oracle ldr file into a pandas DataFrame to get some insights.

My txt file contains the following example string, where the '|' character is the field separator and '{EOL}' is the line delimiter.

countries.txt (file)

"USA"|"111"| {EOL}"ITALY"|"222"| {EOL}"MEXICO"|"333"|{EOL}

What I already tried was:

new_element = []
new_element2 = []
csvfile = open(r'd:\countries.txt')  # raw string so the backslash is kept literally
for line in csvfile:
    new_lines = line.split('|{EOL}')
for i in new_lines:
    new_element.append(i)
for j in new_element:
    new_element2.append(j.replace('"', '\'').replace('|', ','))
del new_element2[3]  # drop the empty piece left after the last {EOL}

After running the commands above, we get:

new_element2
["'USA','111'", "'ITALY','222'", "'MEXICO','333'"]

As the next step, we are trying to convert this list to a DataFrame with the following command:

pd_element = pd.DataFrame(new_element2,columns=['a','b'])

but at this point an error message appears:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
D:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_blocks(blocks, axes)
   1650                 blocks = [make_block(values=blocks[0],
-> 1651                                      placement=slice(0, len(axes[0])))]
   1652 

D:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py in make_block(values, placement, klass, ndim, dtype, fastpath)
   3094 
-> 3095     return klass(values, ndim=ndim, placement=placement)
   3096 

D:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py in __init__(self, values, placement, ndim)
   2630         super(ObjectBlock, self).__init__(values, ndim=ndim,
-> 2631                                           placement=placement)
   2632 

D:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py in __init__(self, values, placement, ndim)
     86                 'Wrong number of items passed {val}, placement implies '
---> 87                 '{mgr}'.format(val=len(self.values), mgr=len(self.mgr_locs)))
     88 

ValueError: Wrong number of items passed 1, placement implies 2

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-446-cec97633cd5d> in <module>
----> 1 pd_element = pd.DataFrame(new_outro,columns=['a','b'])

D:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    449                 else:
    450                     mgr = init_ndarray(data, index, columns, dtype=dtype,
--> 451                                        copy=copy)
    452             else:
    453                 mgr = init_dict({}, index, columns, dtype=dtype)

D:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in init_ndarray(values, index, columns, dtype, copy)
    165         values = maybe_infer_to_datetimelike(values)
    166 
--> 167     return create_block_manager_from_blocks([values], [columns, index])
    168 
    169 

D:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_blocks(blocks, axes)
   1658         blocks = [getattr(b, 'values', b) for b in blocks]
   1659         tot_items = sum(b.shape[0] for b in blocks)
-> 1660         construction_error(tot_items, blocks[0].shape[1:], axes, e)
   1661 
   1662 

D:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in construction_error(tot_items, block_shape, axes, e)
   1689         raise ValueError("Empty data passed with indices specified.")
   1690     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 1691         passed, implied))
   1692 
   1693 

ValueError: Shape of passed values is (3, 1), indices imply (3, 2)

After some searching, we found this tutorial, which we believe describes the same problem we have, along with its solution: https://datatofish.com/list-to-dataframe/

from pandas import DataFrame

People_List = [['Jon','Smith',21],['Mark','Brown',38],['Maria','Lee',42],['Jill','Jones',28],['Jack','Ford',55]]

df = DataFrame(People_List, columns=['First_Name','Last_Name','Age'])
print(df)

It works fine!!!

We observe that the problem is the same: we have a list and want to convert it into a DataFrame. We understand that part of our problem arises while converting the ldr file into a list but, unfortunately, we cannot understand why.
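One way to see the "why": each element of new_element2 is a single string such as "'USA','111'", so pandas treats it as one value per row, which is the (3, 1) in the error, while the tutorial's list holds one inner list per row. A minimal sketch, using only the standard library and a hypothetical helper named split_row (not part of the original code), that turns each string into a two-item list:

```python
# The list produced by the first attempt: three strings, one per record.
new_element2 = ["'USA','111'", "'ITALY','222'", "'MEXICO','333'"]


def split_row(row):
    """Hypothetical helper: split one record string into its fields."""
    # Assumes no field contains a comma of its own.
    return [field.strip("'") for field in row.split(',')]


# One inner list per record, one item per column.
rows = [split_row(row) for row in new_element2]
print(rows)
```

Passing rows instead of new_element2 to pd.DataFrame(rows, columns=['a','b']) should then line up with the two column names, since every row now carries two items.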

If someone can point us to a site with some kind of manual that would help us improve our understanding of what is happening, we would be grateful.

I found a simple way to solve my problem, following the answer:

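import re
from pandas import DataFrame

head = ['field1', 'field2']
new_element2 = []
with open(r'd:\countries.txt') as csvfile:
    for line in csvfile:
        # Each record ends with '|{EOL}'; skip the empty piece after the last one
        for j in line.split('|{EOL}'):
            if j.strip():
                # Remove the double quotes, then split the record into its fields
                new_element2.append(re.sub('"', '', j).split('|'))
df = DataFrame(new_element2, columns=head)
df.to_parquet('file.parquet')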

In my case, I was working with a sizable amount of data, so I first needed to convert it into a DataFrame and then persist it as a Parquet file.
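As a rough sketch of the same idea applied to the whole file content at once, the parsing step could be isolated in a small function. The name parse_ldr is hypothetical, and the code assumes that no field contains '|' or '{EOL}' itself and that stray whitespace around a record can be ignored:

```python
def parse_ldr(text):
    """Hypothetical helper: split '{EOL}'-terminated, '|'-separated records
    into a list of field lists."""
    rows = []
    for record in text.split('{EOL}'):
        record = record.strip()
        if not record:
            continue  # skip the empty piece after the final {EOL}
        if record.endswith('|'):
            record = record[:-1]  # drop the '|' that closes the last field
        # Split into fields and remove surrounding whitespace and double quotes.
        rows.append([field.strip().strip('"') for field in record.split('|')])
    return rows


# Sample content copied from countries.txt in the question.
sample = '"USA"|"111"| {EOL}"ITALY"|"222"| {EOL}"MEXICO"|"333"|{EOL}'
rows = parse_ldr(sample)
print(rows)
```

The resulting list of lists can be handed to DataFrame(rows, columns=head) as in the code above. Keeping the parsing separate makes it easier to check the row shape before building the DataFrame.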
