Looping through multiple values in a column
Can someone tell me how to loop through multiple values in a DataFrame column?
Example:
col1 col2
High street qwe.723,qwe.2,qwe.17,qwe.1000,qwe.23
Must street qwe.34,qwe.17,qwe.1000,qwe.23
I want the following output:
High street
qwe.723
High street
qwe.2
High street
qwe.17
High street
qwe.1000
High street
qwe.23
Must street
qwe.34
Must street
qwe.17
Must street
qwe.1000
Must street
qwe.23
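My attempt:
with open('file.txt', 'r') as lines:
    for line in lines:
        # Each row is "col1<TAB>col2"
        fields = line.strip().split('\t')
        vals = fields[1].split(',')
        for val in vals:
            # sep='\n' puts the street and the value on separate lines without stray spaces
            print(fields[0], val, sep='\n')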
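The same loop can also be written as a small generator, which makes it easy to try without `file.txt`. This is only a sketch: `expand_rows` and the `sample` list are hypothetical names, and the input is assumed to be tab-separated with no header row.

```python
def expand_rows(lines, sep='\t'):
    """Yield (col1, value) pairs, one per comma-separated value in col2."""
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        name, values = line.split(sep, 1)
        for value in values.split(','):
            yield name, value

# In-memory stand-in for file.txt (assumed tab-separated, no header row)
sample = [
    'High street\tqwe.723,qwe.2\n',
    'Must street\tqwe.34\n',
]

for name, value in expand_rows(sample):
    print(name)
    print(value)
```

Passing an open file object instead of `sample` should work the same way, since both yield one line at a time.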
Try this:
In [136]: df
Out[136]:
col1 col2
0 High street qwe.723,qwe.2,qwe.17,qwe.1000,qwe.23
1 Must street qwe.34,qwe.17,qwe.1000,qwe.23
In [137]: df.set_index('col1').col2.str.split(',', expand=True).stack().reset_index(level=1, drop=1).to_frame('col2').reset_index().stack()
...:
Out[137]:
0 col1 High street
col2 qwe.723
1 col1 High street
col2 qwe.2
2 col1 High street
col2 qwe.17
3 col1 High street
col2 qwe.1000
4 col1 High street
col2 qwe.23
5 col1 Must street
col2 qwe.34
6 col1 Must street
col2 qwe.17
7 col1 Must street
col2 qwe.1000
8 col1 Must street
col2 qwe.23
dtype: object
I'm sure there must be a better way to do this...
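On pandas 0.25 or later, `DataFrame.explode` may be one such shorter route. A minimal sketch, assuming the same two-row frame as in the question (rebuilt here so the snippet is self-contained; `long_df` and `interleaved` are illustrative names):

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    'col1': ['High street', 'Must street'],
    'col2': ['qwe.723,qwe.2,qwe.17,qwe.1000,qwe.23',
             'qwe.34,qwe.17,qwe.1000,qwe.23'],
})

# Split each string into a list, then give every list element its own row
long_df = (df.assign(col2=df['col2'].str.split(','))
             .explode('col2')
             .reset_index(drop=True))

# Read the two columns row by row to interleave col1 and col2
interleaved = long_df.to_numpy().ravel().tolist()
print('\n'.join(interleaved))
```

Here `long_df` keeps one `(col1, col2)` pair per row, which is often the more convenient shape if further processing is needed before printing.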
Another one:
(df.set_index('col1')
   .col2.str.split(',', expand=True)
   .stack()
   .reset_index(level=-1, drop=True)
   .to_csv('output.txt', sep='\n'))
Since I've been playing with cytoolz and numpy, here is a version using both. Super fast!
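import cytoolz
import numpy as np
# Split each col2 string into a list of values (np.char is the public alias of np.core.defchararray)
c2 = np.char.split(df.col2.values.astype('str'), ',')
# Repeat each col1 entry once per value in the matching list
col1 = df.col1.values.repeat([len(c) for c in c2.tolist()])
col2 = list(cytoolz.concat(c2))
# Stack as two rows, then read column by column to interleave col1 and col2
np.stack([col1, col2]).ravel('F')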
array(['High street', 'qwe.723', 'High street', 'qwe.2', 'High street',
'qwe.17', 'High street', 'qwe.1000', 'High street', 'qwe.23',
'Must street', 'qwe.34', 'Must street', 'qwe.17', 'Must street',
'qwe.1000', 'Must street', 'qwe.23'], dtype=object)
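The interleaving comes from the last line: stacking the two arrays as rows and then reading the result column by column (Fortran order) alternates their elements. A small sketch with made-up arrays, using only NumPy:

```python
import numpy as np

# Two equal-length arrays (illustrative values, not from the question)
a = np.array(['A', 'A', 'B'], dtype=object)
b = np.array(['x', 'y', 'z'], dtype=object)

# Shape (2, 3): the first row is a, the second row is b
stacked = np.stack([a, b])

# Column-major traversal visits a[0], b[0], a[1], b[1], ...
interleaved = stacked.ravel('F')
print(interleaved.tolist())  # ['A', 'x', 'A', 'y', 'B', 'z']
```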
Timing test