
Iterate over pandas dataframe rows as pure text

I would like to read a dataframe (which contains tweets) row by row in order to analyze the text.

import pandas as pd

df = pd.read_csv('tweets2.csv')
df.head()

# iterrows() yields (index, Series) tuples, so each row prints with its index
for row in df.iterrows():
    print(row)

The code I wrote does not do the job, because each "row" also includes the index. Instead, I want just the plain text, which I will process further.

You could use df.values:

for row in df.values:
    print(row)

Example:

df = pd.DataFrame({'Col1': [1, 2, 3, 4, 5], 'Col2': ['a', 'b', 'c', 'd', 'e']})

print(df)

   Col1 Col2
0     1    a
1     2    b
2     3    c
3     4    d
4     5    e

for row in df.values:
    print(row)

[1 'a']
[2 'b']
[3 'c']
[4 'd']
[5 'e']
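
If only the tweet text is needed, iterating over a single column yields the values directly, with no index attached. The following is a minimal sketch, assuming the text lives in a column named `text` (a hypothetical name, since the question does not show the CSV header) and using a small inline frame in place of tweets2.csv:

```python
import pandas as pd

# Stand-in for pd.read_csv('tweets2.csv'); the 'user' and 'text'
# column names are assumptions made for illustration only.
df = pd.DataFrame({
    'user': ['ann', 'bob'],
    'text': ['hello world', 'pandas is handy'],
})

# Iterating over one column yields its values, without the index.
texts = [text for text in df['text']]

for text in texts:
    print(text)
```

Each `text` here is a plain Python string, so it can be passed straight to whatever text processing comes next.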

iterrows yields (index, Series) pairs.

So you could unpack them in the for loop:

for i, row in df.iterrows():
    print(row)

If you don't use i, you should change it to _.
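
Because each unpacked row is a Series, a single field can be pulled out by its column label. A small sketch along those lines, reusing the Col1/Col2 example frame from the other answer (the `letters` list is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3], 'Col2': ['a', 'b', 'c']})

letters = []
# _ discards the index; row is a Series keyed by column name.
for _, row in df.iterrows():
    letters.append(row['Col2'])

print(letters)
```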

Using iterrows, each row is a Series. As shown by @cᴏʟᴅsᴘᴇᴇᴅ, an alternative is using values:

for row in df.values:
    print(row)

With this method each row is a numpy array (so labeling is lost).
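
The difference between the two row types can be checked directly. The sketch below is only an illustration on a two-row frame; it uses `DataFrame.to_numpy()`, which the pandas documentation recommends over `.values`, and the variable names are made up for the example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2], 'Col2': ['a', 'b']})

# First row via iterrows: a Series that keeps the column labels.
_, first_series = next(df.iterrows())

# First row via to_numpy(): a plain array, addressable by position only.
first_array = df.to_numpy()[0]

print(type(first_series).__name__, first_series['Col2'])
print(type(first_array).__name__, first_array[1])
```

With the Series, `row['Col2']` works; with the array, the same value is only reachable as `row[1]`, so the code has to know the column order.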
