
Find the longest word in each row of a Pandas DataFrame

I am new to Pandas and I am trying to get the biggest string for every row in a DataFrame.

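import pandas as pd
import sqlite3

# The database path is not given in the original post; 'authors.db' is a placeholder.
conn = sqlite3.connect('authors.db')
authors = pd.read_sql('select * from authors', conn)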

authors['name']
...
12       KRISHNAN RAJALAKSHMI
13                        J O
14                      TSIPE
15                    NURRIZA
16                HATICE OZEL
17                   D ROMERO
18                  LLIBERTAT
19                        E F
20               JASMEET KAUR
...

What I expect is to get back the biggest string in each authors['name'] row:

...
12                RAJALAKSHMI
13                          J
14                      TSIPE
15                    NURRIZA
16                     HATICE
17                     ROMERO
18                  LLIBERTAT
19                          E
20                    JASMEET
...

I tried to split the string by spaces and apply(max) but it's not working. It seems that pandas is not applying max to each row.

authors['name'].str.split().apply(max)

# or
authors['name'].str.split().apply(lambda x: max(x))

# or

def get_max(x):
    y = max(x)
    print(y)  # y is the biggest string in each row
    return y

authors['name'].str.split().apply(get_max)

# Still results in:

...
12       KRISHNAN RAJALAKSHMI
13                        J O
14                      TSIPE
15                    NURRIZA
16                HATICE OZEL
17                   D ROMERO
18                  LLIBERTAT
19                        E F
20               JASMEET KAUR
...

When you apply max to the split Series without a key, Python compares the words in each list alphabetically, so it returns the word that sorts last rather than the longest one. You might instead try something like

authors['name'].apply(lambda x: max(x.split(), key=len))

For each row, this creates a list of the whitespace-separated substrings and returns the longest one, using the string length as the key. If several substrings share the maximum length, max returns the first of them.
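As a minimal, self-contained sketch of the approach above: the Series below is rebuilt by hand from the names shown in the question (the variable names `names` and `longest` are illustrative, not from the original post), so it runs without the SQLite database.

```python
import pandas as pd

# Sample names copied from the question; built by hand instead of read from SQL.
names = pd.Series(
    ["KRISHNAN RAJALAKSHMI", "J O", "TSIPE", "NURRIZA", "HATICE OZEL",
     "D ROMERO", "LLIBERTAT", "E F", "JASMEET KAUR"],
    index=range(12, 21),
)

# Split each name on whitespace and keep the longest word.
# On ties, max() returns the first of the longest words ("J" for "J O").
longest = names.apply(lambda s: max(s.split(), key=len))
print(longest)
```

The result keeps the original index (12 to 20), so it can be assigned straight back to a column of the same DataFrame.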

Also note that authors['name'].apply(lambda x: max(x.split())) runs without key=len, but it returns the alphabetically last word in each row, not the longest. authors['name'].str.split().max() does not work either, since max() there is a pandas Series method that reduces the whole column to a single value instead of operating on each row.
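To see why the key matters, here is a small sketch comparing the two calls on one name from the question. The variable names are illustrative only.

```python
import pandas as pd

words = "HATICE OZEL".split()

# Without a key, strings are compared character by character ("O" > "H").
by_alphabet = max(words)
# With key=len, the longest word is returned.
by_length = max(words, key=len)
print(by_alphabet, by_length)

# The same difference shows up when the functions are applied row by row.
s = pd.Series(["HATICE OZEL", "D ROMERO"])
rows_alphabet = s.str.split().apply(max)
rows_length = s.str.split().apply(lambda x: max(x, key=len))
print(rows_alphabet.tolist(), rows_length.tolist())
```

For "D ROMERO" both calls happen to agree, which is why the missing key can go unnoticed on some rows.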

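You are not assigning the result back to the column. apply returns a new Series and leaves the original unchanged.

Try this function:

def getName(df):
    # Overwrite the 'name' column with the longest word of each row.
    df['name'] = df['name'].apply(lambda x: max(x.split(), key=len))

And then you just have to call:

getName(authors)

Note that I reassign each value of df['name'] in this code.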

Output:

    name
0   RAJALAKSHMI
1   J
2   TSIPE
3   NURRIZA
4   HATICE
5   ROMERO
6   LLIBERTAT
7   E
8   JASMEET

The main problem in your code is that you weren't reassigning the values in each row.
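The reassignment point can be checked with a short sketch. It assumes a small hand-built DataFrame with a 'name' column standing in for the real authors table; `result`, `unchanged`, and `updated` are illustrative names.

```python
import pandas as pd

# Stand-in for the authors table, using three names from the question.
authors = pd.DataFrame({"name": ["KRISHNAN RAJALAKSHMI", "J O", "HATICE OZEL"]})

# apply() returns a new Series; the column is untouched until it is assigned back.
result = authors["name"].apply(lambda s: max(s.split(), key=len))
unchanged = authors["name"].tolist()

# Assigning the result back is what actually changes the DataFrame.
authors["name"] = result
updated = authors["name"].tolist()
print(unchanged, updated)
```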
