I have a .csv file that looks like this:
1, 1 2 3 4 5
3, 2 3 4 5 6
2, 5 6 5 4 8
5, 5 4 8 6 2
...
How can I get the first column
a = [1 3 2 5 ...]
and the matrix
b = [ 1 2 3 4 5
2 3 4 5 6
5 6 5 4 8
5 4 8 6 2 ]
as integer NumPy arrays? I have tried
data = np.asarray(pd.read_csv('Data.csv'))
but it only makes things worse...
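I think you need:
import numpy as np
import pandas as pd
df = pd.read_csv('Data.csv', header=None)  # the file has no header row
first_col = np.array(df.iloc[:, 0])        # first column (all rows)
df_array = np.array(df.iloc[:, 1:])        # every remaining column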
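With the default comma separator, everything after the comma arrives as a single space-separated string per row, so it still has to be split into numbers. The following is a minimal sketch of one way to do that, assuming the layout shown in the question; the inline sample string stands in for Data.csv, and the names sample, a and b are only illustrative.

```python
import io

import numpy as np
import pandas as pd

# Inline stand-in for Data.csv (assumed layout: "<int>, <space-separated ints>")
sample = io.StringIO(
    "1, 1 2 3 4 5\n"
    "3, 2 3 4 5 6\n"
    "2, 5 6 5 4 8\n"
    "5, 5 4 8 6 2\n"
)

df = pd.read_csv(sample, header=None)

# Column 0 is parsed as integers directly.
a = df.iloc[:, 0].to_numpy()

# Column 1 holds strings such as " 1 2 3 4 5"; split on whitespace,
# expand into separate columns, then convert to integers.
b = df.iloc[:, 1].str.split(expand=True).astype(int).to_numpy()

print(a)
print(b)
```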
pandas supports multiple delimiters through a regular expression if you pass engine='python' to pd.read_csv. You can try something like this:
df = pd.read_csv('Data.csv', header=None, sep=' |, ',
engine='python', dtype=int)
Then retrieve your data as follows:
a = df.iloc[:, 0].values
b = df.iloc[:, 1:].values
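For a version that can be run without the file, here is a small sketch of the same idea on an inline sample. The pattern r',\s*|\s+' is my own variant of the separator (an assumption, not part of the answer above); it treats a comma with optional spaces, or any run of whitespace, as one delimiter.

```python
import io

import pandas as pd

# Inline stand-in for Data.csv (assumed layout from the question)
sample = io.StringIO(
    "1, 1 2 3 4 5\n"
    "3, 2 3 4 5 6\n"
    "2, 5 6 5 4 8\n"
    "5, 5 4 8 6 2\n"
)

# A regex separator requires the slower Python parsing engine.
df = pd.read_csv(sample, header=None, sep=r",\s*|\s+",
                 engine="python", dtype=int)

a = df.iloc[:, 0].to_numpy()   # first column
b = df.iloc[:, 1:].to_numpy()  # remaining five columns as a 2-D array

print(a)
print(b)
```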
A pure NumPy approach would be to use np.loadtxt(), reading both columns as strings and then converting them to the proper type:
In [70]: col1, col2 = np.loadtxt('test.csv', dtype=str, delimiter=',', unpack=True)
In [71]: col1 = col1.astype(int)
In [72]: col2 = np.vstack(np.char.split(col2)).astype(int)
Result:
In [73]: col1
Out[73]: array([1, 3, 2, 5])
In [74]: col2
Out[74]:
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[5, 6, 5, 4, 8],
[5, 4, 8, 6, 2]])
Note that before col2 is converted to an array of integers, it is an array of strings like the following:
In [76]: col2
Out[76]:
array([' 1 2 3 4 5', ' 2 3 4 5 6', ' 5 6 5 4 8', ' 5 4 8 6 2'],
dtype='<U10')
If you want the values separated but kept as strings, skip vstack() and astype() in the last step. In that case you'll get:
In [77]: np.char.split(col2)
Out[77]:
array([['1', '2', '3', '4', '5'], ['2', '3', '4', '5', '6'],
['5', '6', '5', '4', '8'], ['5', '4', '8', '6', '2']], dtype=object)
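As an alternative to splitting strings after loading, the comma can be turned into whitespace before NumPy parses the line. This is a different technique from the one above, sketched under the assumption that every row has the same number of values; the inline string stands in for test.csv and the variable names are illustrative.

```python
import io

import numpy as np

# Inline stand-in for test.csv (assumed layout from the question)
raw = (
    "1, 1 2 3 4 5\n"
    "3, 2 3 4 5 6\n"
    "2, 5 6 5 4 8\n"
    "5, 5 4 8 6 2\n"
)

# Replace the comma with a space so that every field is whitespace-separated;
# np.loadtxt accepts a generator of lines and splits on whitespace by default.
lines = (line.replace(",", " ") for line in io.StringIO(raw))
data = np.loadtxt(lines, dtype=int)

col1 = data[:, 0]   # first column
col2 = data[:, 1:]  # remaining columns as a 2-D integer array

print(col1)
print(col2)
```

With a real file, the generator can wrap an open file object instead of io.StringIO.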