Iterating over pandas DataFrame rows as plain text

Question

I would like to read a dataframe (which contains tweets) row by row in order to analyze the text.

import pandas as pd

df = pd.read_csv('tweets2.csv')
df.head()

# Each item from iterrows() is an (index, Series) tuple
for row in df.iterrows():
    print(row)

The code I wrote does not do the job, since each "row" also includes the index. Instead, I want the plain text, which I will process further.

Answer 1

You could use df.values:

for row in df.values:
    print(row)

Example:

df = pd.DataFrame({'Col1': [1, 2, 3, 4, 5], 'Col2' : ['a', 'b', 'c', 'd', 'e']})

print(df)

   Col1 Col2
0     1    a
1     2    b
2     3    c
3     4    d
4     5    e

for row in df.values:
    print(row)

[1 'a']
[2 'b']
[3 'c']
[4 'd']
[5 'e']
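
If only the tweet text is needed, iterating over a single column avoids both the index and the other fields. The following is a minimal sketch, assuming the text lives in a column named 'text' (a hypothetical name, since the question does not show the columns of tweets2.csv) and using a small in-memory frame as a stand-in for the CSV:

```python
import pandas as pd

# Stand-in data; the column names 'user' and 'text' are assumptions.
df = pd.DataFrame({
    'user': ['alice', 'bob'],
    'text': ['first tweet', 'second tweet'],
})

# Iterating over a Series yields its values only, without the index.
texts = []
for text in df['text']:
    texts.append(text)
    print(text)
```

Each item in the loop is a plain Python string here, so it can be passed directly to whatever text processing comes next.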

Answer 2

iterrows yields (index, Series) pairs.

So you could unpack them in the for loop:

for i, row in df.iterrows():
    print(row)

If you don't use i, you should rename it to _.

Using iterrows, each row is a Series. As shown by @cᴏʟᴅsᴘᴇᴇᴅ, an alternative is using values:

for row in df.values:
    print(row)

With this method each row is a numpy array (so labeling is lost).
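
The difference between the two approaches can be seen in a short sketch that reuses a shortened version of the example frame from Answer 1: a Series produced by iterrows can be indexed by column label, while an array from values only supports positional access.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3], 'Col2': ['a', 'b', 'c']})

# iterrows: each row is a Series, so columns are reachable by label.
labels = [row['Col2'] for _, row in df.iterrows()]

# values: each row is a numpy array, so only positions are available.
positions = [row[1] for row in df.values]

print(labels)
print(positions)
```

Both lists hold the same strings; the only difference is whether the column is addressed by name or by position.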

2 answers

Answer 1
1, accepted, 2017-07-24 08:39:53

Answer 2
0, 2017-07-24 08:41:49
