Iterate over pandas dataframe rows as pure text
I would like to read a dataframe (which contains tweets) row by row in order to analyze the text.
import csv
import pandas as pd
df = pd.read_csv('tweets2.csv')
df.head()
for row in df.iterrows():
    print(row)
The code I wrote does not do the job, because each "row" also includes the index. Instead, I want just the plain text, which I will process further.
You could use df.values:
for row in df.values:
    print(row)
Example:
df = pd.DataFrame({'Col1': [1, 2, 3, 4, 5], 'Col2' : ['a', 'b', 'c', 'd', 'e']})
print(df)
   Col1 Col2
0     1    a
1     2    b
2     3    c
3     4    d
4     5    e
for row in df.values:
    print(row)
[1 'a']
[2 'b']
[3 'c']
[4 'd']
[5 'e']
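If only the tweet text is needed, iterating over that single column yields the cell values directly, with no index attached. The following is a minimal sketch, assuming the text lives in a column named `text`. That column name and the inline sample data are illustrative assumptions, since the question does not show the layout of tweets2.csv.

```python
import pandas as pd

# Stand-in for pd.read_csv('tweets2.csv'); the column name 'text' is assumed.
df = pd.DataFrame({'text': ['first tweet', 'second tweet', 'third tweet']})

# Iterating over one column yields its values, not (index, value) pairs.
texts = []
for text in df['text']:
    texts.append(text)
    print(text)
```

Each `text` here is an ordinary Python string, so it can be passed straight to further text processing.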
iterrows yields (index, Series) pairs.
So you could unpack them in the for loop:
for i, row in df.iterrows():
    print(row)
If you don't use i, you should rename it to _ to signal that it is unused.
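Because each unpacked `row` is a Series, a single field can be read by its column label. A small sketch using a shortened version of the Col1/Col2 example frame (the `letters` list is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3], 'Col2': ['a', 'b', 'c']})

letters = []
for _, row in df.iterrows():
    # row is a pandas Series, so values are looked up by column label.
    letters.append(row['Col2'])

print(letters)
```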
Using iterrows, each row is a Series. As shown by @cᴏʟᴅsᴘᴇᴇᴅ, an alternative is using values:
for row in df.values:
    print(row)
With this method each row is a numpy array (so labeling is lost).
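To make the loss of labels concrete, here is a short sketch on the same kind of two-column frame. With `values`, a field has to be picked by position rather than by column name (the variable names below are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3], 'Col2': ['a', 'b', 'c']})

row_types = set()
letters_by_position = []
for row in df.values:
    # row is a numpy.ndarray: there is no row['Col2'], only positions.
    row_types.add(type(row))
    letters_by_position.append(row[1])

print(row_types)
print(letters_by_position)
```

If the column order of the CSV ever changes, positional access like `row[1]` silently picks a different field, which is the practical cost of dropping the labels.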