I'm trying to convert an Oracle ldr file into a pandas DataFrame to get some insights.
My txt file contains the following example string, where the '|' character is the field separator and '{EOL}' is the line delimiter.
countries.txt (file)
"USA"|"111"| {EOL}"ITALY"|"222"| {EOL}"MEXICO"|"333"|{EOL}
What I already tried was:
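new_element = []
new_element2 = []
# Raw string so the backslash in the Windows path is not read as an escape
with open(r'd:\countries.txt') as csvfile:
    for line in csvfile:
        # Split the physical line into records on the '|{EOL}' delimiter
        new_lines = line.split('|{EOL}')
        for i in new_lines:
            new_element.append(i)
for j in new_element:
    # Swap double quotes for single quotes and '|' for ','
    new_element2.append(j.replace('"', '\'').replace('|', ','))
# Drop the leftover piece that follows the final '|{EOL}'
del new_element2[3]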
After executing the previous commands, we got this:
new_element2
["'USA','111'", "'ITALY','222'", "'MEXICO','333'"]
As the next step, we are trying to convert this list to a DataFrame, using the following command:
pd_element = pd.DataFrame(new_element2,columns=['a','b'])
but at this point we get an error message:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
D:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_blocks(blocks, axes)
1650 blocks = [make_block(values=blocks[0],
-> 1651 placement=slice(0, len(axes[0])))]
1652
D:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py in make_block(values, placement, klass, ndim, dtype, fastpath)
3094
-> 3095 return klass(values, ndim=ndim, placement=placement)
3096
D:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py in __init__(self, values, placement, ndim)
2630 super(ObjectBlock, self).__init__(values, ndim=ndim,
-> 2631 placement=placement)
2632
D:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py in __init__(self, values, placement, ndim)
86 'Wrong number of items passed {val}, placement implies '
---> 87 '{mgr}'.format(val=len(self.values), mgr=len(self.mgr_locs)))
88
ValueError: Wrong number of items passed 1, placement implies 2
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-446-cec97633cd5d> in <module>
----> 1 pd_element = pd.DataFrame(new_outro,columns=['a','b'])
D:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
449 else:
450 mgr = init_ndarray(data, index, columns, dtype=dtype,
--> 451 copy=copy)
452 else:
453 mgr = init_dict({}, index, columns, dtype=dtype)
D:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in init_ndarray(values, index, columns, dtype, copy)
165 values = maybe_infer_to_datetimelike(values)
166
--> 167 return create_block_manager_from_blocks([values], [columns, index])
168
169
D:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_blocks(blocks, axes)
1658 blocks = [getattr(b, 'values', b) for b in blocks]
1659 tot_items = sum(b.shape[0] for b in blocks)
-> 1660 construction_error(tot_items, blocks[0].shape[1:], axes, e)
1661
1662
D:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in construction_error(tot_items, block_shape, axes, e)
1689 raise ValueError("Empty data passed with indices specified.")
1690 raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 1691 passed, implied))
1692
1693
ValueError: Shape of passed values is (3, 1), indices imply (3, 2)
After some searching, we found this tutorial, which we believe covers the same problem we have and its solution: https://datatofish.com/list-to-dataframe/
from pandas import DataFrame
People_List = [['Jon','Smith',21],['Mark','Brown',38],['Maria','Lee',42],['Jill','Jones',28],['Jack','Ford',55]]
df = DataFrame (People_List,columns=['First_Name','Last_Name','Age'])
print (df)
It works fine!!!
We can see that the problem looks the same: we have a list and want to convert it into a DataFrame. We understand that part of our problem arises while converting the ldr file into a list but, unfortunately, we cannot understand why.
If someone can point us to a site with some kind of manual that would help us improve our understanding of what is happening, we would be grateful.
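A minimal sketch of what the "(3, 1) vs (3, 2)" message seems to be pointing at, assuming the list looks exactly like the `new_element2` output shown earlier: each element is one string (one value per row), while the tutorial passes one inner list per row. The names `flat`, `nested` and `df_ok` are made up for this illustration.

```python
import pandas as pd

# What the loop in the question produces: three strings, i.e. one value per row
flat = ["'USA','111'", "'ITALY','222'", "'MEXICO','333'"]

# What the tutorial passes in: one inner list per row, one item per column
nested = [s.replace("'", "").split(",") for s in flat]

error_message = None
try:
    # One value per row cannot fill two named columns
    pd.DataFrame(flat, columns=['a', 'b'])
except ValueError as exc:
    error_message = str(exc)

# Two values per row match the two column names
df_ok = pd.DataFrame(nested, columns=['a', 'b'])
print(error_message)
print(df_ok)
```

So the quotes and commas inside each string do not create columns; the string has to be split into separate Python values first.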
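I found a simple way to solve my problem, following the answer:
import re
from pandas import DataFrame

head = ['field1', 'field2']
new_element = []
new_element2 = []
with open(r'd:\countries.txt') as csvfile:
    for line in csvfile:
        new_lines = line.split('|{EOL}')
        # Keep only non-empty records (the piece after the last delimiter is empty)
        new_element.extend(i for i in new_lines if i.strip())
for j in new_element:
    # Remove the double quotes, then split each record into its fields
    new_element2.append(re.sub('"', '', j).split('|'))
# Pass the names as columns; the second positional argument would be the index
df = DataFrame(new_element2, columns=head)
df.to_parquet('file.parquet')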
In my case, I was working with a sizeable amount of data that first had to be converted into a DataFrame and then persisted as a Parquet file.
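As an alternative sketch (not the approach from the answer), the custom delimiter could be replaced with newlines so that `pandas.read_csv` does the quote and separator handling. This assumes the whole file fits in memory and that every record ends with a trailing '|'; the `raw` string below is the sample from the question, standing in for the file contents.

```python
import io
import pandas as pd

# Sample content copied from countries.txt in the question (one physical line)
raw = '"USA"|"111"| {EOL}"ITALY"|"222"| {EOL}"MEXICO"|"333"|{EOL}'

# Turn the custom record delimiter into ordinary newlines
text = raw.replace('{EOL}', '\n')

# sep='|' splits the fields and the default quotechar strips the double quotes.
# usecols=[0, 1] ignores the empty third field created by the trailing '|'.
# dtype=str keeps codes such as "111" as text instead of integers.
df = pd.read_csv(io.StringIO(text), sep='|', header=None, usecols=[0, 1], dtype=str)
df.columns = ['field1', 'field2']
print(df)
```

For a real file, `raw` would come from reading the file into a string first; for very large files a chunked approach would probably be needed instead.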