For a dictionary:
d = {
    "a": [1],
    "b": 2,
    "c": [[7, 8, 9], ["a", "b", "c"], [9, 10, 11]],
    "d": None,
}
I want to achieve this dataframe:
pd.DataFrame({"a": [[1], [1], [1]],
              "b": [2, 2, 2],
              "c": [[7, 8, 9], ["a", "b", "c"], [9, 10, 11]],
              "d": [None, None, None]})
     a  b            c     d
0  [1]  2    [7, 8, 9]  None
1  [1]  2    [a, b, c]  None
2  [1]  2  [9, 10, 11]  None
Basically, each column should repeat its value until it reaches the length of the longest column.
In R, I know I can create a data frame with NA marking the rows I want to fill and then use tidyr::fill. Is there something similar in Python?
df = data.frame(
  a = c("a", NA, NA),
  b = c(1, 2, 3)
)
tidyr::fill(df, a)
  a b
1 a 1
2 a 2
3 a 3
Here is an example of a possible solution:
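import numpy as np

d = {
    "a": [1],
    "b": 2,
    "c": [[7, 8, 9], ["a", "b", "c"], [9, 10, 11]],
    "d": None,
}
# Length of the longest list; non-list values count as length 1.
max_len = max(len(l) if isinstance(l, list) else 1 for l in d.values())
for key in d.keys():
    if isinstance(d[key], list):
        # Shorter lists: np.repeat repeats each element max_len times,
        # so [1] becomes [1, 1, 1] rather than [[1], [1], [1]].
        if len(d[key]) != max_len:
            d[key] = np.repeat(d[key], max_len).tolist()
    else:
        # Scalars (including None): repeat the value max_len times.
        d[key] = np.repeat(np.array(d[key]), max_len).tolist()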
Result:
{
'a': [1, 1, 1],
'b': [2, 2, 2],
'c': [[7, 8, 9], ['a', 'b', 'c'], [9, 10, 11]],
'd': [None, None, None]
}
But obviously it works only in this particular case, where all columns but one have a single element. To solve the task in general, one must also specify how columns of different lengths should be handled: should the whole column be repeated and the remainder trimmed on the last iteration, should only the first or last value be repeated, or is some other approach needed?
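One way to make that choice explicit is sketched below. It assumes a simple policy that is not taken from the question: scalars and one-element lists are repeated, full-length lists are kept, and any other length is rejected. The name `broadcast_columns` is made up for illustration, and "a" is written as [[1]] so that each cell holds the list [1] instead of the number 1.

```python
import pandas as pd


def broadcast_columns(d):
    """Hypothetical helper: repeat scalars and one-element lists up to the longest list."""
    n = max((len(v) for v in d.values() if isinstance(v, list)), default=1)
    out = {}
    for key, value in d.items():
        if not isinstance(value, list):
            out[key] = [value] * n      # scalar (including None): repeat n times
        elif len(value) == n:
            out[key] = value            # already full length: keep as is
        elif len(value) == 1:
            out[key] = value * n        # single element: repeat that element
        else:
            # Assumed policy: refuse to guess how to stretch other lengths.
            raise ValueError(f"column {key!r} has length {len(value)}, expected 1 or {n}")
    return out


d = {
    "a": [[1]],  # wrapped so that the cell value is the list [1]
    "b": 2,
    "c": [[7, 8, 9], ["a", "b", "c"], [9, 10, 11]],
    "d": None,
}
result = broadcast_columns(d)
df = pd.DataFrame(result)
```

Plain list multiplication is used here instead of np.repeat because np.repeat flattens nested lists. A different policy (cycling the column, or repeating only the last value) would only change the branches inside the loop.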
It is easy to do with datar:
>>> from datar.tibble import tibble
>>> from datar.base import NA, c
>>> from datar.tidyr import fill
>>>
>>> d = {
... "a": [[1]], # in order to get [1] as element
... "b": 2,
... "c": [[7, 8, 9], ["a", "b", "c"], [9, 10, 11]],
... "d": [None],
... }
>>>
>>> df = tibble(d)
>>> df
     a  b            c     d
0  [1]  2    [7, 8, 9]  None
1  [1]  2    [a, b, c]  None
2  [1]  2  [9, 10, 11]  None
>>> df = tibble(
... a = c("a", NA, NA),
... b = c(1, 2, 3)
... )
>>>
>>> fill(df, "a")
   a  b
0  a  1
1  a  2
2  a  3
I am the author of the package. Feel free to submit issues if you have any questions.
Your R code can pretty much be translated into Python. It's unclear whether you are able to change the dictionary to a format similar to your R example, but if you can:
d = {
    "a": [[1], None, None],
    "b": [2, None, None],
    "c": [[7, 8, 9], ["a", "b", "c"], [9, 10, 11]],
    "d": [None, None, None],
}
pd.DataFrame(d).ffill()
returns
     a    b            c     d
0  [1]  2.0    [7, 8, 9]  None
1  [1]  2.0    [a, b, c]  None
2  [1]  2.0  [9, 10, 11]  None
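If writing the None padding by hand is not practical, it could probably be generated from the original dictionary. The sketch below assumes that "a" is given as [[1]] so that the cell holds a list, and `pad_with_none` is a hypothetical helper name, not a pandas function. Note that "b" comes back as 2.0 because the None padding is read as NaN, which forces a float column, so the sketch casts it back after filling.

```python
import pandas as pd


def pad_with_none(d):
    """Hypothetical helper: pad every column with None up to the longest list."""
    n = max((len(v) for v in d.values() if isinstance(v, list)), default=1)
    padded = {}
    for key, value in d.items():
        column = list(value) if isinstance(value, list) else [value]
        padded[key] = column + [None] * (n - len(column))
    return padded


d = {
    "a": [[1]],  # wrapped so that the cell value is the list [1]
    "b": 2,
    "c": [[7, 8, 9], ["a", "b", "c"], [9, 10, 11]],
    "d": None,
}

padded = pad_with_none(d)
df = pd.DataFrame(padded).ffill()
# "b" was read as [2, NaN, NaN], so it is float after filling; cast it back.
df["b"] = df["b"].astype("int64")
```

The cast is only safe when the column has no missing values left after ffill; a column that starts with None would still contain NaN.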