I apologize in advance if this question seems slightly naive. I am still learning about the interplay between pandas and numpy.
I have a pandas DataFrame that I am trying to convert into an array for analysis using scikit-learn. I have tried df.values and df.to_records() to convert it, but for some reason, it changes the shape during the conversion.
These are the first few lines of the DataFrame (df) in pandas:
Index Code1 Code2 Code3
0 99285 5921 5921
1 99284 NaN 5921
2 99284 NaN 4660
3 99285 42789 42789
4 99284 92321 92321
5 99283 NaN 92321
...
[94 rows x 3 columns]
However, if I call df.values, I get the following result, which, as far as I understand, is not an array, since arrays are lists of tuples:
[['99285' '5921' '5921']
['99284' nan '5921']
['99284' nan '4660']
['99285' '42789' '42789']
['99284' '92321' '92321']
['99283' nan '92321']
...
If I call df.to_records(), I get the following result, which is an array, but not of the right shape:
[(0, '99285', '5921', '5921') (1, '99284', nan, '5921')
(2, '99284', nan, '4660') (3, '99285', '42789', '42789')
(4, '99284', '92321', '92321') (5, '99283', nan, '92321')
...
>>> df.to_records().shape
(94,)
Can someone help me understand what I need to do to get an array with a shape of (94, 3)?
Important note: the columns are all strings (and need to stay strings), not ints, if that helps.
In fact, df.values does return a numpy.ndarray. It only looks like a list of lists because of the way it prints. You can check this with type(df.values), or by looking at its shape: df.values.shape == (94, 3) (the index is not included).
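A quick way to confirm this is to build a small stand-in frame and inspect df.values directly. The sketch below assumes the first six rows of the question's table as sample data (the real frame has 94 rows), with the codes stored as strings and the missing ones as NaN. df.to_numpy(), which newer pandas versions recommend over .values, returns the same kind of array here.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumed): the first six rows of the question's table,
# codes kept as strings, missing codes as NaN.
df = pd.DataFrame(
    {
        "Code1": ["99285", "99284", "99284", "99285", "99284", "99283"],
        "Code2": ["5921", np.nan, np.nan, "42789", "92321", np.nan],
        "Code3": ["5921", "5921", "4660", "42789", "92321", "92321"],
    }
)

arr = df.values  # df.to_numpy() gives an equivalent array

print(type(arr))        # <class 'numpy.ndarray'>
print(arr.shape)        # (6, 3): rows x columns, index not included
print(arr.dtype)        # object: each cell is still a Python str (or NaN)
print(type(arr[0, 0]))  # <class 'str'>
```

Because the array has dtype object, the codes are not converted to numbers; they stay the same str objects that were in the DataFrame.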
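However, df.to_records() does not return a plain 2-D array, but a numpy.core.records.recarray: a subclass of numpy.ndarray whose elements are whole records (rows) rather than individual cells. You can see that it is a recarray by doing
type(df.to_records())
or by noticing that its dtype is a structured dtype with one named field per column (plus one for the index):
df.to_records().dtype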
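The difference in shape follows from that structured dtype. The sketch below assumes the first three rows of the question's table as stand-in data and shows that the result is a one-dimensional recarray with one element per row; the extra field named 'index' comes from the default index=True of DataFrame.to_records.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumed): the first three rows of the question's table.
df = pd.DataFrame(
    {
        "Code1": ["99285", "99284", "99284"],
        "Code2": ["5921", np.nan, np.nan],
        "Code3": ["5921", "5921", "4660"],
    }
)

rec = df.to_records()

print(isinstance(rec, np.recarray))  # True
print(isinstance(rec, np.ndarray))   # True: recarray subclasses ndarray
print(rec.shape)                     # (3,): one element per row
print(rec.dtype.names)               # ('index', 'Code1', 'Code2', 'Code3')
```

So the shape (94,) for the full frame is not a lost dimension: the columns have moved from the shape into the dtype's named fields.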
The shape of df.to_records() just indicates how many records there are, in your case 94. Record arrays behave differently from normal numpy arrays. For example, try
df.to_records()['Code1']
df.to_records().Code1
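To see how that behaves, the sketch below (again assuming the first three rows of the question's table as stand-in data) accesses a field both by key and by attribute, and shows that a single element is a whole row. The final lines are one possible way, not the only one, to rebuild a 2-D object array from the records; for the question's goal, df.values alone is simpler.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumed): the first three rows of the question's table.
df = pd.DataFrame(
    {
        "Code1": ["99285", "99284", "99284"],
        "Code2": ["5921", np.nan, np.nan],
        "Code3": ["5921", "5921", "4660"],
    }
)

rec = df.to_records()

# Indexing with a field name returns one whole column as a 1-D array.
by_key = rec["Code1"]
# recarray also exposes each field as an attribute; the name is
# case-sensitive, so it is rec.Code1, not rec.code1.
by_attr = rec.Code1

print(by_key.shape)                   # (3,)
print(list(by_key) == list(by_attr))  # True

# A single element is a whole row (one record), not one cell.
first_row = rec[0]
print(first_row["Code1"])             # 99285

# Dropping the index field and stacking the records as tuples gives a
# 2-D object array again, with the strings (and NaN) left unchanged.
grid = np.array(df.to_records(index=False).tolist(), dtype=object)
print(grid.shape)                     # (3, 3)
```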